An MCP (Model Context Protocol) server that implements a cognitive reasoning tool inspired by Google DeepMind's Inner Monologue research. This server enables Large Language Models to engage in private, structured self-reflection and multi-step reasoning before generating responses - simulating the natural human process of "thinking before speaking." By providing a silent workspace for internal reasoning, debugging logic, and approach verification, it significantly improves response quality, reduces errors, and enhances problem-solving capabilities across complex coding, mathematical, and analytical tasks.
This project is inspired by the research paper "Inner Monologue: Embodied Reasoning through Planning with Language Models" by Robotics at Google (now Google DeepMind). The paper demonstrates how LLMs can leverage environment feedback to form an inner monologue that allows them to more richly process and plan in complex scenarios.
While the original research focused on robotic control and embodied AI, this MCP server adapts the concept for general-purpose LLM reasoning, providing a tool that allows language models to:
- Process internal thoughts privately before responding
- Break down complex problems step-by-step
- Reflect on potential approaches and outcomes
- Engage in structured self-reasoning and verification
- ๐คซ Silent Processing: Thoughts are processed internally without affecting external output
- ๐งฉ Structured Reasoning: Supports complex, multi-step reasoning processes
- ๐ Flexible Input: Accepts any form of internal reasoning, from debugging to mathematical problem-solving
- ๐ MCP Integration: Seamlessly integrates with Claude and other MCP-compatible clients
The inner monologue tool provides significant benefits for LLM reasoning quality and efficiency:
- Enhanced Problem Decomposition: Breaking complex problems into manageable steps
- Error Prevention: Catching logical inconsistencies before generating responses
- Solution Verification: Testing approaches mentally before implementation
- Context Retention: Maintaining reasoning chains across multi-step problems
- Reduced Response Iterations: Fewer back-and-forth corrections needed
- Improved Accuracy: Higher quality initial responses through internal verification
- Better Planning: More structured approach to complex tasks
- Memory Optimization: Efficient use of context window for reasoning
Scenario | Without Inner Monologue | With Inner Monologue | Improvement |
---|---|---|---|
Code Debugging | 3-4 iterations | 1-2 iterations | ~50% faster |
Mathematical Problems | 60% accuracy | 85% accuracy | +25% accuracy |
Complex Planning | Basic structure | Detailed breakdown | +40% completeness |
Install the package globally via npm:
npm install -g inner-monologue-mcp
Add the server to your Claude Desktop configuration file:
MacOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%/Claude/claude_desktop_config.json
{
"mcpServers": {
"inner-monologue": {
"command": "npx",
"args": ["inner-monologue-mcp"]
}
}
}
- Open Cursor Settings (Cmd/Ctrl + ,)
- Navigate to "Extensions" โ "MCP Servers"
- Add a new server configuration:
{
"name": "inner-monologue",
"command": "npx",
"args": ["inner-monologue-mcp"]
}
Install the MCP extension for VS Code, then add to your settings.json:
{
"mcp.servers": {
"inner-monologue": {
"command": "npx",
"args": ["inner-monologue-mcp"],
"env": {}
}
}
}
A tool for internal reasoning and reflection that processes thoughts without producing visible output.
Parameters:
thought
(string): A line of reasoning, mental check, intuition breakdown, or problem-solving step
Use Cases:
- Debugging complex logic by walking through step-by-step reasoning
- Mathematical problem-solving with intermediate steps
- Evaluating multiple approaches before committing to a solution
- Reflecting on potential edge cases or failure modes
- Planning complex tasks by breaking them into components
{
"name": "inner-monologue",
"arguments": {
"thought": "Let me think through this algorithm step by step. The user wants to sort an array, but they mentioned efficiency is crucial. A simple bubble sort would be O(nยฒ) which might be too slow for large datasets. Quick sort averages O(n log n) but has worst-case O(nยฒ). Merge sort guarantees O(n log n) but uses more memory. Given they emphasized efficiency and didn't mention memory constraints, I should probably recommend merge sort or a hybrid approach like Timsort."
}
}
The tool returns a brief confirmation of the thought process:
{
"content": [
{
"type": "text",
"text": "Thought: Let me think through this algorithm step by step..."
}
]
}
Debugging Scenario:
{
"thought": "This function is returning undefined when I expect an object. Let me trace through the execution: the input parameter looks correct, the validation passes, but then... ah, I see the issue. The async function isn't being awaited, so it's returning a Promise instead of the resolved value. I need to add await before the function call."
}
Problem-Solving Scenario:
{
"thought": "The user wants to implement authentication, but they haven't specified their tech stack. I should ask about: 1) Frontend framework (React, Vue, etc.), 2) Backend technology (Node.js, Python, etc.), 3) Database preference, 4) Authentication method (JWT, sessions, OAuth). I'll start with the most common scenario but offer alternatives."
}
{
"name": "inner-monologue",
"arguments": {
"thought": "Looking at this React component, I notice several potential issues: 1) The useEffect has no dependency array, so it runs on every render - that's a performance issue. 2) The state update in the effect could cause an infinite loop. 3) The component isn't handling loading states. 4) No error boundaries for the async operation. I should suggest adding dependencies to useEffect, implementing proper loading/error states, and wrapping async calls in try-catch blocks."
}
}
{
"name": "inner-monologue",
"arguments": {
"thought": "The user is asking about graph traversal. Let me think through the options: DFS uses a stack (or recursion), good for pathfinding and topological sorting, O(V+E) time complexity. BFS uses a queue, better for shortest path in unweighted graphs, also O(V+E). For their specific case - finding all connected components - DFS would be more intuitive and memory efficient. I should explain both but recommend DFS with a clear example."
}
}
{
"name": "inner-monologue",
"arguments": {
"thought": "They want to design a chat application. Key considerations: 1) Real-time messaging (WebSockets vs Server-Sent Events), 2) Message persistence (database choice), 3) User authentication, 4) Scalability (horizontal vs vertical), 5) Message ordering and delivery guarantees. For a simple start: Node.js + Socket.io + MongoDB + JWT auth. For scale: consider message queues (Redis/RabbitMQ), microservices, and CDN for media. I'll present a progressive architecture approach."
}
}
{
"name": "inner-monologue",
"arguments": {
"thought": "This is a calculus optimization problem. I need to: 1) Define the function to optimize, 2) Find the constraints, 3) Take the derivative, 4) Set derivative to zero, 5) Check second derivative for max/min, 6) Verify the solution makes sense in context. The problem asks for maximum area with fixed perimeter - classic problem. Area = length ร width, Perimeter = 2(length + width) = constant. I can substitute to get a single-variable function."
}
}
{
"name": "inner-monologue",
"arguments": {
"thought": "For this REST API design, I need to consider: 1) Resource naming (nouns, not verbs), 2) HTTP methods semantics (GET for retrieval, POST for creation, PUT for updates, DELETE for removal), 3) Status codes (200, 201, 400, 404, 500), 4) Request/response formats (JSON), 5) Authentication (JWT tokens), 6) Rate limiting, 7) Versioning strategy. The user's current design mixes verbs in URLs and uses GET for mutations - I need to address these issues."
}
}
The key philosophical difference lies in perspective and voice. Inner Monologue enables LLMs to engage in first-person internal dialogue - thinking as "I need to consider..." or "Let me think through this..." - simulating authentic human-like self-reflection. In contrast, the Think Tool provides third-person analytical documentation - "The model should analyze..." or "This problem requires..." - creating explicit reasoning records.
Aspect | Inner Monologue | Think Tool |
---|---|---|
Perspective | ๐ง First-person internal voice ("I think...") | ๐ Third-person analysis ("The problem requires...") |
Purpose | ๐คซ Private self-reflection and reasoning | ๐ญ Documented thinking process |
Voice Style | ๐ฃ๏ธ Natural inner dialogue | ๐ฏ Structured analytical language |
Both tools can work together effectively:
- Inner Monologue for initial private reasoning
- Think Tool for documenting final thought process
- Result in well-reasoned, documented responses
npm run build
npm run dev
npm test
- Node.js v18.x or higher
- npm or yarn package manager
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Clone your fork:
git clone https://github.com/your-username/inner-monologue-mcp.git
- Install dependencies:
npm install
- Make your changes
- Build the project:
npm run build
- Test your changes
- Submit a pull request
- Follow the existing code style
- Add tests for new functionality
- Update documentation as needed
- Ensure all tests pass before submitting
This project is licensed under the MIT License - see the LICENSE file for details.
- Original research by Robotics at Google (now Google DeepMind) on Inner Monologue for embodied reasoning
- The Model Context Protocol team for the excellent MCP framework
- The broader AI research community working on reasoning and planning