CV MCP Tools

A collection of Model Context Protocol (MCP) servers and services that integrate specialized computer vision capabilities with language models. This repository demonstrates how to build modular CV tools that can be easily composed and orchestrated through MCP.

🔧 Components

MCP Servers

Object Detection MCP - YOLO-based object detection with MinIO integration
OCR + Image Generation MCP - Combined OCR and image generation with iterative validation workflows

Standalone Services

Image Generator Server - FLUX.1-schnell diffusion model service
OCR Server - Multi-model OCR service (Qwen-VL, Janus)

🚀 Quick Start

Prerequisites

Python 3.11+
UV package manager
Docker with GPU support
MinIO server (for MCP servers)
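
The tool prerequisites above can be checked before starting anything. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper name, and it only looks for the `uv` and `docker` executables on `PATH` and checks the running Python version. It does not verify Docker GPU support or that a MinIO server is reachable.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed prerequisites appear to be available locally.

    Hypothetical helper: it checks only the interpreter version and whether
    the `uv` and `docker` executables can be found on PATH.
    """
    return {
        "python>=3.11": sys.version_info >= (3, 11),
        "uv": shutil.which("uv") is not None,
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```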

Running MCP Servers

# Object Detection
cd object_detection_mcp
uv run object_detector.py

# OCR + Image Generation  
cd ocr_imagen_mcp
uv run ocr_imagen.py

Running Standalone Services

# Image Generator
docker buildx build -t flux-schnell -f image_generator_server/Dockerfile .
docker run --gpus all -p 6070:6070 flux-schnell

# OCR Server
docker buildx build -t ocr-server -f ocr_server/Dockerfile .
docker run --gpus all -p 6080:6080 -p 6081:6081 ocr-server
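
Once the containers are running, it can help to confirm that the published ports accept connections. The sketch below uses only the Python standard library and the port numbers from the `docker run` commands above. The function names `port_is_open` and `check_services` are illustrative, and a successful TCP connection only shows that something is listening, not that the models have finished loading.

```python
import socket

# Ports taken from the docker run commands above.
SERVICE_PORTS = {
    "image generator": [6070],
    "ocr server": [6080, 6081],
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, dict[int, bool]]:
    """Map each service name to the reachability of each of its ports."""
    return {
        name: {port: port_is_open(host, port) for port in ports}
        for name, ports in SERVICE_PORTS.items()
    }


if __name__ == "__main__":
    for name, ports in check_services().items():
        for port, ok in ports.items():
            print(f"{name} port {port}: {'open' if ok else 'closed'}")
```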

🔗 Integration with Claude Desktop

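Add the following entry to your Claude Desktop configuration file (claude_desktop_config.json), replacing the directory path and MinIO credentials with your own values: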

{
    "mcpServers": {
        "object_detection": {
            "command": "uv",
            "args": ["--directory", "/path/to/object_detection_mcp", "run", "object_detector.py"],
            "env": {
                "YOLO_MODEL_NAME": "yolo11m.pt",
                "YOLO_CONF_THRESHOLD": "0.45",
                "MINIO_URL": "localhost:9000",
                "MINIO_ACCESS_KEY": "your-key",
                "MINIO_SECRET_KEY": "your-secret"
            }
        }
    }
}
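
An entry like the one above can also be generated and merged into an existing configuration programmatically, which avoids hand-editing JSON when several servers are registered. This is a minimal sketch under stated assumptions: `build_server_entry` and `merge_server` are hypothetical helper names, and the path and credentials are the same placeholders used above.

```python
import json


def build_server_entry(server_dir: str, script: str, env: dict[str, str]) -> dict:
    """Build one mcpServers entry that launches a script through uv."""
    return {
        "command": "uv",
        "args": ["--directory", server_dir, "run", script],
        "env": dict(env),
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the entry added under mcpServers[name].

    Servers already present in the configuration are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    entry = build_server_entry(
        "/path/to/object_detection_mcp",  # placeholder path, as in the example above
        "object_detector.py",
        {
            "YOLO_MODEL_NAME": "yolo11m.pt",
            "YOLO_CONF_THRESHOLD": "0.45",
            "MINIO_URL": "localhost:9000",
            "MINIO_ACCESS_KEY": "your-key",
            "MINIO_SECRET_KEY": "your-secret",
        },
    )
    print(json.dumps(merge_server({}, "object_detection", entry), indent=4))
```

Environment values are kept as strings, matching the JSON example, since process environment variables are always strings.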

📁 Repository Structure

cv-mcp-tools/
├── object_detection_mcp/     # YOLO object detection MCP server
├── ocr_imagen_mcp/           # Combined OCR + image generation MCP
├── image_generator_server/   # Standalone FLUX image generation service
├── ocr_server/               # Standalone OCR service
└── CLAUDE.md                 # Development guide for Claude Code

🎯 Use Cases

Automated Content Analysis - Object detection and OCR for document processing
Iterative Image Generation - Generate images with text validation loops
Multi-Modal Workflows - Combine vision and language models for complex tasks
Modular CV Pipeline - Mix and match components as needed
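
The "Iterative Image Generation" use case amounts to a retry loop: generate an image, read its text back with OCR, and stop once the expected text appears. The sketch below illustrates that loop only. The repository's actual tool names and service APIs are not shown in this README, so the generation and OCR steps are passed in as hypothetical callables (`generate_image`, `read_text`), and the substring match after whitespace and case normalization is an assumed validation rule.

```python
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace for lenient matching."""
    return " ".join(text.lower().split())


def generate_until_text_matches(
    prompt: str,
    expected_text: str,
    generate_image: Callable[[str], bytes],
    read_text: Callable[[bytes], str],
    max_attempts: int = 3,
) -> tuple[Optional[bytes], int]:
    """Generate images until OCR finds expected_text or attempts run out.

    Returns (image, attempts_used) on success and (None, max_attempts)
    if no generated image passed validation.
    """
    target = normalize(expected_text)
    for attempt in range(1, max_attempts + 1):
        image = generate_image(prompt)
        if target in normalize(read_text(image)):
            return image, attempt
    return None, max_attempts
```

In practice the two callables would wrap requests to the image generator and OCR services. Passing them in as arguments keeps the loop testable without a GPU.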

📖 Documentation

Each component has its own README with detailed setup instructions; see the README in each component directory listed under Repository Structure.
