fix: resolve stdio transport race condition for concurrent tool calls #529

ezynda3 · 2025-08-05T02:42:47Z

Description

This PR fixes a critical race condition in the stdio transport that occurs when multiple concurrent tool calls are made. The issue causes "bad file descriptor" errors and permanent stdin corruption after 2-4 concurrent calls.

Fixes #528

Type of Change

Bug fix (non-breaking change that fixes an issue)
Tests only (no functional changes)

Checklist

My code follows the code style of this project
I have performed a self-review of my own code
I have added tests that prove my fix is effective or that my feature works
I have updated the documentation accordingly

Additional Information

Root Cause

The stdio transport was spawning a new goroutine for each tools/call request to avoid blocking on sampling requests. However, multiple goroutines were accessing the same bufio.Reader on stdin, which is not thread-safe. This led to corrupted reads and permanent stdin failure.

Solution

Replaced the goroutine-per-request pattern with a worker pool architecture:

Fixed pool of worker goroutines (default: 5, configurable)
Buffered queue for tool calls (default: 100 capacity, configurable)
Mutex protection for concurrent writes
Graceful shutdown with proper cleanup

Testing

Added comprehensive test for concurrent tool calls
All existing tests pass
Race detector confirms no race conditions
Maintains full backward compatibility

🤖 Generated with opencode

Summary by CodeRabbit

New Features
- Improved concurrent processing of "tools/call" requests with a configurable worker pool and queue limits.
- Added options to customize worker thread count and queue size for tool call handling.
Bug Fixes
- Enhanced output handling to prevent issues from simultaneous writes during concurrent processing.
Tests
- Added tests verifying concurrent tool call handling, response correctness, and configuration option boundaries.

Fixes #528 by replacing goroutine-per-request pattern with a worker pool to prevent multiple goroutines from accessing the non-thread-safe bufio.Reader. - Add worker pool pattern with configurable size (default: 5 workers) - Add buffered queue for tool calls (default: 100 capacity) - Protect concurrent writes with mutex - Maintain backward compatibility while fixing the race condition - Add comprehensive test for concurrent tool calls 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

coderabbitai · 2025-08-05T02:42:53Z

Walkthrough

A worker pool and bounded queue were introduced in the StdioServer to handle "tools/call" JSON-RPC messages concurrently, replacing unbounded goroutine spawning. Output writes are now serialized with a mutex for thread safety. Tests were added to verify correct concurrent tool call handling and response integrity.

Changes

Cohort / File(s)	Change Summary
StdioServer concurrency and worker pool `server/stdio.go`	Added a worker pool and bounded queue for "tools/call" message processing; introduced concurrency-safe output serialization with a mutex; new configuration options for pool and queue size; refactored message processing and server lifecycle for safe concurrent handling.
Concurrent tool call and configuration tests `server/stdio_test.go`	Added tests to validate concurrent tool call handling, ensuring correct response count, error absence, and per-tool invocation under concurrent load; added tests for configuration option bounds on worker pool and queue sizes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Assessment against linked issues

Objective	Addressed	Explanation
Eliminate race condition from concurrent goroutine use on bufio.Reader (`server/stdio.go`) (#528)	✅
Ensure only one goroutine reads from stdin at a time (#528)	✅
Serialize output writes to prevent concurrent write races (#528)	✅
Add tests to verify concurrent tool call handling and absence of race conditions (#528)	✅

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes detected.

Suggested reviewers

rwjblue-glean
pottekkat
dugenkui03

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee38c4 and f8ffa62.

📒 Files selected for processing (1)

server/stdio_test.go (2 hunks)

🧰 Additional context used

🧠 Learnings (9)

📓 Common learnings

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

📚 Learning: in the mcptest package, prefer returning errors from helper functions rather than calling t.fatalf()...

Learnt from: octo
PR: mark3labs/mcp-go#149
File: mcptest/mcptest.go:0-0
Timestamp: 2025-04-21T21:26:32.945Z
Learning: In the mcptest package, prefer returning errors from helper functions rather than calling t.Fatalf() directly, giving callers flexibility in how to handle errors.

Applied to files:

server/stdio_test.go

📚 Learning: in go mcp server, servertool.tool field is only used for tool listing and indexing, not for tool exe...

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

Applied to files:

server/stdio_test.go

📚 Learning: in the mark3labs/mcp-go project, the mcpserver.capabilities field is a struct value (servercapabilit...

Learnt from: ezynda3
PR: mark3labs/mcp-go#461
File: server/sampling.go:22-26
Timestamp: 2025-06-30T07:13:17.052Z
Learning: In the mark3labs/mcp-go project, the MCPServer.capabilities field is a struct value (serverCapabilities), not a pointer, so it cannot be nil and doesn't require nil checking. Only pointer fields within the capabilities struct should be checked for nil.

Applied to files:

server/stdio_test.go

📚 Learning: the tool struct in the mark3labs/mcp-go project should handle both inputschema and rawinputschema co...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:0-0
Timestamp: 2025-03-04T07:00:57.111Z
Learning: The Tool struct in the mark3labs/mcp-go project should handle both InputSchema and RawInputSchema consistently between MarshalJSON and UnmarshalJSON methods, even though the tools response from MCP server typically doesn't contain rawInputSchema.

Applied to files:

server/stdio_test.go

📚 Learning: tool responses from the mcp server shouldn't contain rawinputschema, which is why the unmarshaljson ...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:107-137
Timestamp: 2025-03-04T06:59:43.882Z
Learning: Tool responses from the MCP server shouldn't contain RawInputSchema, which is why the UnmarshalJSON method for the Tool struct is implemented to handle only the structured InputSchema format.

Applied to files:

server/stdio_test.go

📚 Learning: the tool struct in mark3labs/mcp-go handles both inputschema and rawinputschema formats. when unmars...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:0-0
Timestamp: 2025-03-04T07:00:57.111Z
Learning: The Tool struct in mark3labs/mcp-go handles both InputSchema and RawInputSchema formats. When unmarshaling JSON, it first tries to parse into a structured ToolInputSchema format, and if that fails, it falls back to using the raw schema format, providing symmetry with the MarshalJSON method.

Applied to files:

server/stdio_test.go

📚 Learning: in the mcp-go library, the getpromptparams.arguments field is of type map[string]string, not map[str...

Learnt from: lariel-fernandes
PR: mark3labs/mcp-go#428
File: www/docs/pages/servers/prompts.mdx:218-234
Timestamp: 2025-06-20T20:39:51.870Z
Learning: In the mcp-go library, the GetPromptParams.Arguments field is of type map[string]string, not map[string]interface{}, so direct string access without type assertions is safe and correct.

Applied to files:

server/stdio_test.go

📚 Learning: in mcp-go project, the maintainer prefers keeping builder pattern apis simple without excessive vali...

Learnt from: davidleitw
PR: mark3labs/mcp-go#451
File: mcp/tools.go:1192-1217
Timestamp: 2025-06-26T09:38:18.629Z
Learning: In mcp-go project, the maintainer prefers keeping builder pattern APIs simple without excessive validation for edge cases. The WithOutput* functions are designed to assume correct usage rather than defensive programming, following the principle of API simplicity over comprehensive validation.

Applied to files:

server/stdio_test.go

🔇 Additional comments (8)

server/stdio_test.go (8)

7-7: LGTM! Import additions are justified.

The new imports (fmt, sync, time) are necessary for the concurrent tool call testing functionality.

Also applies to: 11-11, 13-13

274-303: LGTM! Well-designed concurrent test setup.

The test properly sets up concurrent tool execution with:

Thread-safe call tracking using sync.Map

Multiple tools to distribute load

Proper closure capture for tool names

Realistic work simulation

387-400: LGTM! Improved timeout and control flow.

The timeout has been appropriately reduced to 2 seconds and the control flow now uses idiomatic labeled break instead of goto.

406-422: LGTM! Comprehensive response validation.

The response validation properly:

Verifies all expected responses were received

Checks for errors in responses

Handles JSON unmarshaling with proper error checking

Validates response structure

424-432: LGTM! Proper validation of tool execution distribution.

The call count validation correctly uses sync.Map.Range to verify all tools were invoked during concurrent execution.

448-459: LGTM! Correct bounds testing for worker pool size.

The test now properly expects the maximum value (100) when exceeding bounds and correctly validates in-range values.

461-474: LGTM! Correct bounds testing for queue size.

The test now properly expects the maximum value (10000) when exceeding bounds and correctly validates in-range values.

475-485: LGTM! Proper edge case testing for invalid inputs.

The test correctly verifies that zero and negative values fall back to appropriate defaults (worker pool size 5, queue size 100).

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-stdio-race-condition

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

- Add error checking for Write operations in concurrent tool calls test - Ensures all error return values are properly handled 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

server/stdio.go (2)

68-83: Consider adding upper bounds validation

While the options check for positive values, consider adding upper bounds to prevent excessive resource usage.

 // WithWorkerPoolSize sets the number of workers for processing tool calls
 func WithWorkerPoolSize(size int) StdioOption {
 	return func(s *StdioServer) {
-		if size > 0 {
+		if size > 0 && size <= 100 {
 			s.workerPoolSize = size
+		} else if size > 100 {
+			s.errLogger.Printf("Worker pool size %d exceeds maximum (100), using default", size)
 		}
 	}
 }

 // WithQueueSize sets the size of the tool call queue
 func WithQueueSize(size int) StdioOption {
 	return func(s *StdioServer) {
-		if size > 0 {
+		if size > 0 && size <= 10000 {
 			s.queueSize = size
+		} else if size > 10000 {
+			s.errLogger.Printf("Queue size %d exceeds maximum (10000), using default", size)
 		}
 	}
 }

319-340: Consider adding context to write errors

The worker correctly processes tool calls but could provide more context when write errors occur.

 			// Process the tool call
 			response := s.server.HandleMessage(work.ctx, work.message)
 			if response != nil {
 				if err := s.writeResponse(response, work.writer); err != nil {
-					s.errLogger.Printf("Error writing tool response: %v", err)
+					// Extract method name for better error context
+					var req struct {
+						Method string `json:"method"`
+						ID     any    `json:"id"`
+					}
+					json.Unmarshal(work.message, &req)
+					s.errLogger.Printf("Error writing response for %s (id: %v): %v", req.Method, req.ID, err)
 				}
 			}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6da5cd1 and a6bb31e.

📒 Files selected for processing (2)

server/stdio.go (8 hunks)
server/stdio_test.go (2 hunks)

🧰 Additional context used

🧠 Learnings (8)

📓 Common learnings

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

📚 Learning: in go mcp server, servertool.tool field is only used for tool listing and indexing, not for tool exe...

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

Applied to files:

server/stdio_test.go
server/stdio.go

📚 Learning: in the mcptest package, prefer returning errors from helper functions rather than calling t.fatalf()...

Learnt from: octo
PR: mark3labs/mcp-go#149
File: mcptest/mcptest.go:0-0
Timestamp: 2025-04-21T21:26:32.945Z
Learning: In the mcptest package, prefer returning errors from helper functions rather than calling t.Fatalf() directly, giving callers flexibility in how to handle errors.

Applied to files:

server/stdio_test.go

📚 Learning: in the mark3labs/mcp-go project, the mcpserver.capabilities field is a struct value (servercapabilit...

Learnt from: ezynda3
PR: mark3labs/mcp-go#461
File: server/sampling.go:22-26
Timestamp: 2025-06-30T07:13:17.052Z
Learning: In the mark3labs/mcp-go project, the MCPServer.capabilities field is a struct value (serverCapabilities), not a pointer, so it cannot be nil and doesn't require nil checking. Only pointer fields within the capabilities struct should be checked for nil.

Applied to files:

server/stdio_test.go
server/stdio.go

📚 Learning: in mcp-go project, the maintainer prefers keeping builder pattern apis simple without excessive vali...

Learnt from: davidleitw
PR: mark3labs/mcp-go#451
File: mcp/tools.go:1192-1217
Timestamp: 2025-06-26T09:38:18.629Z
Learning: In mcp-go project, the maintainer prefers keeping builder pattern APIs simple without excessive validation for edge cases. The WithOutput* functions are designed to assume correct usage rather than defensive programming, following the principle of API simplicity over comprehensive validation.

Applied to files:

server/stdio_test.go

📚 Learning: in the mcp-go library, the getpromptparams.arguments field is of type map[string]string, not map[str...

Learnt from: lariel-fernandes
PR: mark3labs/mcp-go#428
File: www/docs/pages/servers/prompts.mdx:218-234
Timestamp: 2025-06-20T20:39:51.870Z
Learning: In the mcp-go library, the GetPromptParams.Arguments field is of type map[string]string, not map[string]interface{}, so direct string access without type assertions is safe and correct.

Applied to files:

server/stdio_test.go

📚 Learning: the tool struct in the mark3labs/mcp-go project should handle both inputschema and rawinputschema co...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:0-0
Timestamp: 2025-03-04T07:00:57.111Z
Learning: The Tool struct in the mark3labs/mcp-go project should handle both InputSchema and RawInputSchema consistently between MarshalJSON and UnmarshalJSON methods, even though the tools response from MCP server typically doesn't contain rawInputSchema.

Applied to files:

server/stdio_test.go

📚 Learning: tool responses from the mcp server shouldn't contain rawinputschema, which is why the unmarshaljson ...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:107-137
Timestamp: 2025-03-04T06:59:43.882Z
Learning: Tool responses from the MCP server shouldn't contain RawInputSchema, which is why the UnmarshalJSON method for the Tool struct is implemented to handle only the structured InputSchema format.

Applied to files:

server/stdio_test.go
server/stdio.go

🔇 Additional comments (5)

server/stdio_test.go (1)

280-297: Remove redundant synchronization

The sync.Map is already thread-safe for concurrent operations. The additional mutex is unnecessary and adds complexity.

-		// Track tool call executions
-		var callCount sync.Map
-		var mu sync.Mutex
+		// Track tool call executions (sync.Map is already thread-safe)
+		var callCount sync.Map

 		// Create server with test tools
 		mcpServer := NewMCPServer("test", "1.0.0")

 		// Add multiple tools that simulate work and track concurrent execution
 		for i := 0; i < 5; i++ {
 			toolName := fmt.Sprintf("test_tool_%d", i)
 			mcpServer.AddTool(
 				mcp.NewTool(toolName),
 				func(name string) func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
 					return func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
 						// Track concurrent executions
-						mu.Lock()
 						count, _ := callCount.LoadOrStore(name, 0)
 						callCount.Store(name, count.(int)+1)
-						mu.Unlock()

⛔ Skipped due to learnings

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

Learnt from: octo
PR: mark3labs/mcp-go#149
File: mcptest/mcptest.go:0-0
Timestamp: 2025-04-21T21:26:32.945Z
Learning: In the mcptest package, prefer returning errors from helper functions rather than calling t.Fatalf() directly, giving callers flexibility in how to handle errors.

Learnt from: ezynda3
PR: mark3labs/mcp-go#461
File: server/sampling.go:22-26
Timestamp: 2025-06-30T07:13:17.052Z
Learning: In the mark3labs/mcp-go project, the MCPServer.capabilities field is a struct value (serverCapabilities), not a pointer, so it cannot be nil and doesn't require nil checking. Only pointer fields within the capabilities struct should be checked for nil.

Learnt from: tylergannon
PR: mark3labs/mcp-go#62
File: server/callbacks.go:48-70
Timestamp: 2025-03-21T06:11:18.927Z
Learning: The callbacks in MCPServer are designed to be configured only during initialization via the WithCallbacks option and not modified during operation, which eliminates the need for concurrency protection.

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:107-137
Timestamp: 2025-03-04T06:59:43.882Z
Learning: Tool responses from the MCP server shouldn't contain RawInputSchema, which is why the UnmarshalJSON method for the Tool struct is implemented to handle only the structured InputSchema format.

server/stdio.go (4)

33-39: LGTM! Well-structured concurrency control

The addition of worker pool, bounded queue, and write mutex properly addresses the race condition issue. The design with configurable pool and queue sizes provides good flexibility.

376-412: Excellent implementation of graceful shutdown

The initialization of the queue, worker pool startup, and graceful shutdown sequence properly ensures all pending work is completed before exit. This correctly addresses the race condition issue.

549-558: Write synchronization properly implemented

The mutex protection for concurrent writes correctly prevents the race condition that was causing "bad file descriptor" errors.

456-457: Use error-level logging for queue overflow

Queue overflow is an important operational issue that should be logged at error level, not just Printf.

 		default:
 			// Queue is full, process synchronously as fallback
-			s.errLogger.Printf("Tool call queue full, processing synchronously")
+			s.errLogger.Printf("ERROR: Tool call queue full (capacity: %d), falling back to synchronous processing", s.queueSize)

Likely an incorrect or invalid review comment.

server/stdio_test.go

sync.Map is already thread-safe and doesn't require additional mutex protection. This simplifies the test code while maintaining the same functionality. 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

…dation - Replace goto statements with labeled break for more idiomatic Go - Reduce test timeout from 5s to 2s for faster test execution - Add upper bounds validation for worker pool size (max 100) and queue size (max 10000) - Add tests to verify configuration bounds are respected - Log warnings when configuration values exceed maximum limits Addresses remaining CodeRabbit review feedback. 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

dugenkui03 · 2025-08-06T02:49:36Z

server/stdio.go

+			s.workerPoolSize = size
+		} else if size > 100 {
+			s.errLogger.Printf("Worker pool size %d exceeds maximum (100), using default: %d", size, s.workerPoolSize)
+		}


When the size exceeds the maximum limit, falling back to the default value (100) is not appropriate, as the user clearly expects a larger queue size. maximum seems to be a better choice.

Fixed: 0ee38c4

dugenkui03 · 2025-08-06T02:56:32Z

server/stdio.go

+			s.queueSize = size
+		} else if size > 10000 {
+			s.errLogger.Printf("Queue size %d exceeds maximum (10000), using default: %d", size, s.queueSize)
+		}


Using maximum value would be better (instead of default value)

Fixed: 0ee38c4

When WithWorkerPoolSize or WithQueueSize receive values exceeding their maximum limits, use the maximum allowed value instead of falling back to the default. This better aligns with user expectations when they request large pool/queue sizes. 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

ezynda3

Fixed: 0ee38c4

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

server/stdio_test.go (2)

337-344: Fix error handling in initialization.

The code ignores errors from JSON marshaling and doesn't properly handle scanner failures during initialization.

Apply this diff to add proper error handling:

-		requestBytes, _ := json.Marshal(initRequest)
-		if _, err := stdinWriter.Write(append(requestBytes, '\n')); err != nil {
-			t.Fatalf("Failed to write init request: %v", err)
-		}
+		requestBytes, err := json.Marshal(initRequest)
+		if err != nil {
+			t.Fatalf("Failed to marshal init request: %v", err)
+		}
+		if _, err := stdinWriter.Write(append(requestBytes, '\n')); err != nil {
+			t.Fatalf("Failed to write init request: %v", err)
+		}

 		// Read init response
 		scanner := bufio.NewScanner(stdoutReader)
-		scanner.Scan()
+		if !scanner.Scan() {
+			t.Fatal("Failed to read init response")
+		}

365-368: Add error handling in tool request goroutine.

The goroutine sending tool requests ignores JSON marshaling errors, which could lead to silent test failures.

Apply this diff to add proper error handling:

-				requestBytes, _ := json.Marshal(toolRequest)
-				if _, err := stdinWriter.Write(append(requestBytes, '\n')); err != nil {
-					t.Errorf("Failed to write tool request %d: %v", id, err)
-				}
+				requestBytes, err := json.Marshal(toolRequest)
+				if err != nil {
+					t.Errorf("Failed to marshal tool request %d: %v", id, err)
+					return
+				}
+				if _, err := stdinWriter.Write(append(requestBytes, '\n')); err != nil {
+					t.Errorf("Failed to write tool request %d: %v", id, err)
+					return
+				}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47b4408 and 0ee38c4.

📒 Files selected for processing (2)

server/stdio.go (8 hunks)
server/stdio_test.go (2 hunks)

🧰 Additional context used

🧠 Learnings (9)

📓 Common learnings

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

📚 Learning: in go mcp server, servertool.tool field is only used for tool listing and indexing, not for tool exe...

Learnt from: floatingIce91
PR: mark3labs/mcp-go#401
File: server/server.go:1082-1092
Timestamp: 2025-06-23T11:10:42.948Z
Learning: In Go MCP server, ServerTool.Tool field is only used for tool listing and indexing, not for tool execution or middleware. During handleToolCall, only the Handler field is used, so dynamic tools don't need the Tool field populated.

Applied to files:

server/stdio.go
server/stdio_test.go

📚 Learning: in the mark3labs/mcp-go project, the mcpserver.capabilities field is a struct value (servercapabilit...

Learnt from: ezynda3
PR: mark3labs/mcp-go#461
File: server/sampling.go:22-26
Timestamp: 2025-06-30T07:13:17.052Z
Learning: In the mark3labs/mcp-go project, the MCPServer.capabilities field is a struct value (serverCapabilities), not a pointer, so it cannot be nil and doesn't require nil checking. Only pointer fields within the capabilities struct should be checked for nil.

Applied to files:

server/stdio.go
server/stdio_test.go

📚 Learning: tool responses from the mcp server shouldn't contain rawinputschema, which is why the unmarshaljson ...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:107-137
Timestamp: 2025-03-04T06:59:43.882Z
Learning: Tool responses from the MCP server shouldn't contain RawInputSchema, which is why the UnmarshalJSON method for the Tool struct is implemented to handle only the structured InputSchema format.

Applied to files:

server/stdio.go
server/stdio_test.go

📚 Learning: in the mcptest package, prefer returning errors from helper functions rather than calling t.fatalf()...

Learnt from: octo
PR: mark3labs/mcp-go#149
File: mcptest/mcptest.go:0-0
Timestamp: 2025-04-21T21:26:32.945Z
Learning: In the mcptest package, prefer returning errors from helper functions rather than calling t.Fatalf() directly, giving callers flexibility in how to handle errors.

Applied to files:

server/stdio_test.go

📚 Learning: the tool struct in the mark3labs/mcp-go project should handle both inputschema and rawinputschema co...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:0-0
Timestamp: 2025-03-04T07:00:57.111Z
Learning: The Tool struct in the mark3labs/mcp-go project should handle both InputSchema and RawInputSchema consistently between MarshalJSON and UnmarshalJSON methods, even though the tools response from MCP server typically doesn't contain rawInputSchema.

Applied to files:

server/stdio_test.go

📚 Learning: the tool struct in mark3labs/mcp-go handles both inputschema and rawinputschema formats. when unmars...

Learnt from: xinwo
PR: mark3labs/mcp-go#35
File: mcp/tools.go:0-0
Timestamp: 2025-03-04T07:00:57.111Z
Learning: The Tool struct in mark3labs/mcp-go handles both InputSchema and RawInputSchema formats. When unmarshaling JSON, it first tries to parse into a structured ToolInputSchema format, and if that fails, it falls back to using the raw schema format, providing symmetry with the MarshalJSON method.

Applied to files:

server/stdio_test.go

📚 Learning: in mcp-go project, the maintainer prefers keeping builder pattern apis simple without excessive vali...

Learnt from: davidleitw
PR: mark3labs/mcp-go#451
File: mcp/tools.go:1192-1217
Timestamp: 2025-06-26T09:38:18.629Z
Learning: In mcp-go project, the maintainer prefers keeping builder pattern APIs simple without excessive validation for edge cases. The WithOutput* functions are designed to assume correct usage rather than defensive programming, following the principle of API simplicity over comprehensive validation.

Applied to files:

server/stdio_test.go

📚 Learning: in the mcp-go library, the getpromptparams.arguments field is of type map[string]string, not map[str...

Learnt from: lariel-fernandes
PR: mark3labs/mcp-go#428
File: www/docs/pages/servers/prompts.mdx:218-234
Timestamp: 2025-06-20T20:39:51.870Z
Learning: In the mcp-go library, the GetPromptParams.Arguments field is of type map[string]string, not map[string]interface{}, so direct string access without type assertions is safe and correct.

Applied to files:

server/stdio_test.go

🧬 Code Graph Analysis (1)

server/stdio_test.go (4)

server/server.go (1)

NewMCPServer (299-325)

mcp/tools.go (2)

NewTool (658-680)

CallToolResult (40-51)

mcp/utils.go (1)

NewToolResultText (245-254)

server/stdio.go (3)

NewStdioServer (253-264)

WithWorkerPoolSize (68-78)

WithQueueSize (81-91)

🪛 GitHub Actions: go

server/stdio_test.go

[error] 451-451: Test failure: Expected default worker pool size 5, got 100

[error] 465-465: Test failure: Expected default queue size 100, got 10000

🔇 Additional comments (13)

server/stdio_test.go (5)

7-13: LGTM: Import additions are appropriate.

The new imports (fmt, sync, time) are correctly added to support the concurrent tool call testing functionality.

286-303: Well-designed concurrent tool setup.

The test creates multiple tools with proper closure capture and thread-safe counting using sync.Map. The simulation of work with time.Sleep effectively tests the concurrent processing capabilities.

389-400: Good use of labeled break for control flow.

The use of collectLoop label with break collectLoop is more idiomatic than the previous goto approach and improves code readability.

455-459: LGTM: Valid configuration values are properly tested.

The test correctly verifies that valid values within bounds are accepted and stored properly.

476-484: LGTM: Zero and negative value handling is properly tested.

The test correctly verifies that invalid values (zero and negative) revert to default values as expected.

server/stdio.go (8)

32-46: Well-designed concurrent architecture.

The addition of worker pool fields (toolCallQueue, workerWg, workerPoolSize, queueSize) and write protection (writeMu) properly addresses the race condition described in the PR objectives. The toolCallWork struct cleanly encapsulates queued work items.

67-78: LGTM: Worker pool size configuration with proper bounds.

The configuration function correctly enforces the maximum limit and uses the maximum value when exceeded, which aligns with user expectations for larger pool sizes.

80-91: LGTM: Queue size configuration with proper bounds.

The configuration function correctly enforces the maximum limit and uses the maximum value when exceeded, which is appropriate for queue sizing.

326-348: Excellent worker implementation with proper cleanup.

The worker function correctly:

Uses defer s.workerWg.Done() for proper WaitGroup cleanup

Handles channel closure with the ok check

Processes work items through the existing HandleMessage infrastructure

Respects context cancellation

This design maintains the existing MCP server functionality while adding concurrency safety.

384-421: Robust worker pool lifecycle management.

The Listen function properly:

Initializes the buffered queue with the configured size

Starts the worker pool before processing messages

Gracefully shuts down workers by closing the queue and waiting for completion

The sequential shutdown approach (close queue → wait for workers → return) ensures all queued work is completed before termination.

452-472: Smart queuing strategy with graceful degradation.

The implementation correctly:

Identifies tool calls that need concurrent processing

Uses non-blocking channel send with select and default case

Falls back to synchronous processing when the queue is full

Logs queue overflow situations for monitoring

This approach prevents blocking while maintaining system stability under high load.

557-567: Essential write protection for concurrent safety.

The addition of writeMu.Lock() protection correctly addresses the concurrent write issues described in the PR objectives. This ensures that multiple workers cannot corrupt the output stream when writing responses simultaneously.

261-262: LGTM: Reasonable default values for worker pool configuration.

The default values (5 workers, 100 queue size) provide a good balance between concurrency and resource usage for typical MCP server workloads.

server/stdio_test.go

Update test expectations to match the new behavior where WithWorkerPoolSize and WithQueueSize use maximum values instead of defaults when the requested size exceeds limits. 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

fix: address golangci-lint errcheck warnings

ebb6e4f

- Add error checking for Write operations in concurrent tool calls test - Ensures all error return values are properly handled 🤖 Generated with [opencode](https://opencode.ai) Co-Authored-By: opencode <noreply@opencode.ai>

coderabbitai bot reviewed Aug 5, 2025

View reviewed changes

server/stdio_test.go Outdated Show resolved Hide resolved

server/stdio_test.go Outdated Show resolved Hide resolved

server/stdio_test.go Outdated Show resolved Hide resolved

ezynda3 and others added 2 commits August 5, 2025 05:57

ezynda3 assigned dugenkui03, pottekkat and rwjblue-glean Aug 5, 2025

dugenkui03 reviewed Aug 6, 2025

View reviewed changes

ezynda3 commented Aug 6, 2025

View reviewed changes

coderabbitai bot reviewed Aug 6, 2025

View reviewed changes

server/stdio_test.go Outdated Show resolved Hide resolved

server/stdio_test.go Outdated Show resolved Hide resolved

dugenkui03 approved these changes Aug 6, 2025

View reviewed changes

ezynda3 merged commit 9393526 into main Aug 6, 2025
4 checks passed

pottekkat deleted the fix-stdio-race-condition branch August 7, 2025 18:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: resolve stdio transport race condition for concurrent tool calls #529

fix: resolve stdio transport race condition for concurrent tool calls #529

Uh oh!

ezynda3 commented Aug 5, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Aug 5, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dugenkui03 Aug 6, 2025

Uh oh!

ezynda3 Aug 6, 2025

Uh oh!

dugenkui03 Aug 6, 2025

Uh oh!

ezynda3 Aug 6, 2025

Uh oh!

ezynda3 left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fix: resolve stdio transport race condition for concurrent tool calls #529

fix: resolve stdio transport race condition for concurrent tool calls #529

Uh oh!

Conversation

ezynda3 commented Aug 5, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Checklist

Additional Information

Root Cause

Solution

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Assessment against linked issues

Assessment against linked issues: Out-of-scope changes

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dugenkui03 Aug 6, 2025

Choose a reason for hiding this comment

Uh oh!

ezynda3 Aug 6, 2025

Choose a reason for hiding this comment

Uh oh!

dugenkui03 Aug 6, 2025

Choose a reason for hiding this comment

Uh oh!

ezynda3 Aug 6, 2025

Choose a reason for hiding this comment

Uh oh!

ezynda3 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ezynda3 commented Aug 5, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 5, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)