
A repository showcasing integration of specialized computer vision tools with language models through MCP.


omidsrezai/cv-mcp-tools


CV MCP Tools

A collection of Model Context Protocol (MCP) servers and services that integrate specialized computer vision capabilities with language models. This repository demonstrates how to build modular CV tools that can be easily composed and orchestrated through MCP.

🔧 Components

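MCP Servers

  • object_detection_mcp - YOLO object detection MCP server
  • ocr_imagen_mcp - Combined OCR + image generation MCP server

Standalone Services

  • image_generator_server - FLUX image generation service, run as a Docker container on port 6070
  • ocr_server - OCR service, run as a Docker container on ports 6080 and 6081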

🚀 Quick Start

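Prerequisites

  • uv, to run the MCP servers
  • Docker with buildx, to build the standalone services; they are started with --gpus all, so an NVIDIA GPU and the NVIDIA Container Toolkit are required
  • A MinIO instance reachable by the object detection server (see MINIO_URL in the configuration below)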

Running MCP Servers

# Object Detection
cd object_detection_mcp
uv run object_detector.py

# OCR + Image Generation  
cd ocr_imagen_mcp
uv run ocr_imagen.py

Running Standalone Services

# Image Generator
docker buildx build -t flux-schnell -f image_generator_server/Dockerfile .
docker run --gpus all -p 6070:6070 flux-schnell

# OCR Server
docker buildx build -t ocr-server -f ocr_server/Dockerfile .
docker run --gpus all -p 6080:6080 -p 6081:6081 ocr-server
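After starting the containers, it can help to confirm that the published ports are actually accepting connections before wiring them into the MCP servers. The sketch below is a hypothetical helper, not part of the repository: it only performs a plain TCP connect to the ports used in the docker run commands above (6070, 6080, 6081) and does not assume anything about the HTTP routes or protocols those services expose.

```python
import socket

# Ports published by the docker run commands above.
# The labels are descriptive only; the role of port 6081 is not assumed here.
SERVICE_PORTS = {
    "flux-schnell (image generator)": 6070,
    "ocr-server": 6080,
    "ocr-server (second port)": 6081,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolved hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service label to whether its port currently accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

A successful connect only shows that something is listening; a model that is still loading inside the container may accept connections before it can serve requests.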

🔗 Integration with Claude Desktop

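Add an entry like the following to your Claude Desktop configuration file (claude_desktop_config.json), replacing /path/to/object_detection_mcp and the MinIO credentials with your own values: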

{
    "mcpServers": {
        "object_detection": {
            "command": "uv",
            "args": ["--directory", "/path/to/object_detection_mcp", "run", "object_detector.py"],
            "env": {
                "YOLO_MODEL_NAME": "yolo11m.pt",
                "YOLO_CONF_THRESHOLD": "0.45",
                "MINIO_URL": "localhost:9000",
                "MINIO_ACCESS_KEY": "your-key",
                "MINIO_SECRET_KEY": "your-secret"
            }
        }
    }
}
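If the configuration file already lists other MCP servers, editing it by hand risks clobbering them. The following is a minimal sketch, assuming only the JSON layout shown above: it adds (or replaces) the object_detection entry under mcpServers while leaving every other entry untouched. The helper names are hypothetical, and the path to your configuration file is passed in rather than assumed.

```python
import json
from pathlib import Path


def add_object_detection_server(config: dict, server_dir: str, env: dict[str, str]) -> dict:
    """Return a copy of a Claude Desktop config with the object_detection entry added or replaced."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    servers["object_detection"] = {
        "command": "uv",
        "args": ["--directory", server_dir, "run", "object_detector.py"],
        "env": dict(env),
    }
    return updated


def update_config_file(path: Path, server_dir: str, env: dict[str, str]) -> None:
    """Read the config at `path` (or start empty), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_object_detection_server(config, server_dir, env)
    path.write_text(json.dumps(merged, indent=4))
```

For example, update_config_file(Path("claude_desktop_config.json"), "/path/to/object_detection_mcp", {"YOLO_MODEL_NAME": "yolo11m.pt", "YOLO_CONF_THRESHOLD": "0.45"}) writes the same structure as the snippet above. Note that all env values are strings, matching the JSON example.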

📁 Repository Structure

cv-mcp-tools/
├── object_detection_mcp/     # YOLO object detection MCP server
├── ocr_imagen_mcp/           # Combined OCR + image generation MCP server
├── image_generator_server/   # Standalone FLUX image generation service
├── ocr_server/               # Standalone OCR service
└── CLAUDE.md                 # Development guide for Claude Code

🎯 Use Cases

  • Automated Content Analysis - Object detection and OCR for document processing
  • Iterative Image Generation - Generate images with text validation loops
  • Multi-Modal Workflows - Combine vision and language models for complex tasks
  • Modular CV Pipeline - Mix and match components as needed
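The iterative image generation use case pairs an image generator with OCR: generate, read the text back, and retry if the required text is missing. The sketch below illustrates that loop under stated assumptions. The generate and read_text callables are hypothetical stand-ins for the image generation and OCR tools (their real interfaces are not described here), and the prompt-refinement step and normalized substring match are illustrative choices, not the repository's implementation.

```python
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR spacing differences are ignored."""
    return " ".join(text.lower().split())


def generate_with_text_check(
    prompt: str,
    expected_text: str,
    generate: Callable[[str], bytes],    # hypothetical: prompt -> image bytes
    read_text: Callable[[bytes], str],   # hypothetical: image bytes -> OCR text
    max_attempts: int = 3,
) -> tuple[Optional[bytes], int]:
    """Generate images until OCR finds `expected_text`, up to `max_attempts` tries.

    Returns (image, attempts_used); image is None if every attempt failed.
    """
    target = normalize(expected_text)
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        image = generate(current_prompt)
        if target in normalize(read_text(image)):
            return image, attempt
        # Illustrative refinement: restate the required text explicitly before retrying.
        current_prompt = f'{prompt}. The image must clearly show the text "{expected_text}".'
    return None, max_attempts
```

Substring matching on normalized text is deliberately simple; OCR output that garbles a single character will fail the check, so a fuzzy comparison (for example an edit-distance threshold) may be a better fit in practice.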

📖 Documentation

Each component directory (object_detection_mcp, ocr_imagen_mcp, image_generator_server, ocr_server) has its own README with detailed setup instructions.
