Skip to content

captaindatatech/captaindata-mcp

Repository files navigation

Captain Data MCP API

A Model Context Protocol (MCP) server for Captain Data tools, designed to work with ChatGPT and other AI assistants. This API provides a clean interface for LinkedIn data extraction and search capabilities through Captain Data's powerful scraping engine.

Features

  • MCP-compliant API for seamless integration with AI assistants
  • Dual authentication system - API key headers or session tokens
  • Comprehensive tool suite for LinkedIn data extraction and search
  • Production-ready with proper error handling, logging, and monitoring
  • TypeScript-first with strict type checking and validation
  • Fastify-powered for high performance and extensibility
  • Redis integration for session management with in-memory fallback
  • Interactive API documentation with OpenAPI/Swagger
  • Comprehensive testing with Jest and integration tests

Authentication

The API supports two authentication methods:

  1. Direct API Key: Set your Captain Data API key in the X-API-Key header
  2. Session Tokens: Use the /auth endpoint to get a session token, then include it in the Authorization: Bearer <token> header

Local Development

Prerequisites

  • Node.js 16+
  • npm or yarn
  • Captain Data API key
  • Redis (optional, falls back to in-memory storage)

Setup

  1. Install dependencies:
npm install
  1. Create environment configuration:
cp env.example .env
# Edit .env with your Captain Data API key and other settings
  1. Start the development server:
npm run dev

The server will be available at http://localhost:3000.

Development Scripts

# Start development server with hot reload
npm run dev

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Type checking
npm run type-check

# Build for production
npm run build

# Generate OpenAPI documentation
npm run generate:openapi
npm run generate:openapi:gpt

Production Deployment

Environment Variables

Variable Required Default Description
CD_API_BASE - Captain Data API base URL
NODE_ENV development Environment (development/production/test)
PORT 3000 Server port
LOG_LEVEL info Logging level (error, warn, info, debug)
RATE_LIMIT_MAX 100 Maximum requests per time window
RATE_LIMIT_TIME_WINDOW 1 minute Rate limit time window
API_TIMEOUT 30000 API request timeout in milliseconds
MAX_RETRIES 2 Maximum retry attempts for failed requests
RETRY_DELAY 1000 Delay between retries in milliseconds
REDIS_URL - Redis connection URL for session storage
SENTRY_DSN - Sentry DSN for error tracking

Production Checklist

Before deploying to production, ensure:

  • Environment variables are properly configured
  • All tests pass: npm test
  • Type checking passes: npm run type-check
  • Rate limiting is configured appropriately
  • Logging level is set to info or warn
  • Health check endpoint is monitored
  • Error tracking is set up (Sentry recommended)
  • Redis is configured for session management (optional)

Monitoring & Observability

The API provides comprehensive monitoring capabilities:

  • Health Check: GET /health - Returns system status and uptime
  • Request Logging: All requests are logged with unique request IDs
  • Error Tracking: Structured error responses with detailed context
  • Performance Metrics: Response times and execution metrics
  • Sentry Integration: Automatic error reporting and performance monitoring

API Endpoints

Core Endpoints

  • GET /health - Health check and system status
  • GET /introspect - List available tools (basic mode)
  • GET /introspect?v=full - List all available tools (full mode)
  • POST /auth - Generate session token from API key
  • POST /tools/:alias - Execute a specific tool

Documentation

  • GET /docs - Interactive API documentation (Swagger UI)
  • GET /docs/json - OpenAPI specification
  • GET /openapi.json - Raw OpenAPI specification
  • GET /openapi.gpt.json - GPT-compatible OpenAPI specification

Available Tools

LinkedIn Profile Tools

  • enrich_people - Extract detailed profile information from LinkedIn URLs

    • Perfect for lead research and candidate evaluation
    • Returns comprehensive profile data including experience, skills, and contact info
  • enrich_company - Get comprehensive company data from LinkedIn company pages

    • Ideal for market research and competitive analysis
    • Provides company details, employee count, industry, and more

Sales Navigator Search Tools

  • search_people - Find prospects using Sales Navigator with advanced filtering

    • Search by location, industry, company size, and other criteria
    • Returns filtered lists of potential leads
  • search_companies - Discover target companies based on various criteria

    • Filter by industry, size, location, and other parameters
    • Perfect for B2B prospecting and market analysis
  • search_company_employees - Identify key contacts within specific organizations

    • Find employees by company URL or LinkedIn company ID
    • Great for account-based marketing and sales outreach

Using with AI Assistants

ChatGPT Integration

  1. Deploy to Vercel:
npm i -g vercel
vercel login
vercel
  1. Configure in ChatGPT:
    • Use the deployed URL: https://your-project.vercel.app
    • The API will be available for ChatGPT Actions

Custom GPT Authentication Workaround

Important: Custom GPTs have limitations with authorization headers. To work around this, the API supports authentication via query parameters:

POST /tools/enrich_people?session_token=your_session_token

Note: Query parameter authentication is provided as a workaround for Custom GPTs. For production applications, prefer header-based authentication for better security.

MCP Protocol Support

The API is fully compatible with the Model Context Protocol (MCP), providing:

  • Standardized tool introspection
  • Consistent error handling
  • Session-based authentication
  • Structured request/response formats

Project Architecture

src/
├── api/                    # API layer
│   ├── handlers/          # Request handlers
│   │   ├── auth.ts        # Authentication endpoint
│   │   ├── health.ts      # Health check
│   │   ├── introspect.ts  # Tool introspection
│   │   └── tools.ts       # Tool execution
│   ├── routes.ts          # Route registration
│   └── schemas/           # Request/response schemas
├── lib/                   # Core business logic
│   ├── alias.ts           # Tool alias mappings
│   ├── auth.ts            # Authentication logic
│   ├── config.ts          # Configuration management
│   ├── error.ts           # Error handling utilities
│   ├── redis.ts           # Redis service
│   ├── schemas/           # Tool parameter schemas
│   ├── translate.ts       # API translation layer
│   └── responseSchemas.ts # Response schemas
├── middleware/            # Request middleware
│   ├── logging.ts         # Request logging
│   └── security.ts        # Security middleware
├── server/                # Server configuration
│   ├── build.ts           # Server builder
│   └── start.ts           # Development server
└── scripts/               # Build scripts
    ├── generate-openapi.ts
    └── generate-openapi-gpt.ts

Testing

The project includes comprehensive testing:

# Run all tests
npm test

# Run specific test suites
npm run test:health
npm run test:auth
npm run test:tools
npm run test:integration

# Run tests with coverage
npm run test:coverage

Test Structure

  • Unit Tests: Individual function and module testing
  • Integration Tests: Full API workflow testing
  • Authentication Tests: Session token and API key validation
  • Error Handling Tests: Comprehensive error scenario coverage

Error Handling

The API provides structured error responses with:

  • Unique request IDs for tracking
  • Standardized error codes for programmatic handling
  • Detailed error messages with context
  • HTTP status codes following REST conventions
  • Sentry integration for error tracking in production

Performance & Scalability

  • Fastify framework for high-performance request handling
  • Redis integration for scalable session management
  • Rate limiting to prevent abuse
  • Request timeouts and retry logic
  • Connection pooling for external API calls
  • Graceful degradation with in-memory fallbacks

Security Features

  • Input validation with JSON schemas
  • Rate limiting to prevent abuse
  • Security headers (CORS, XSS protection, etc.)
  • Authentication validation for all protected endpoints
  • Request logging with sensitive data redaction
  • Error message sanitization to prevent information leakage

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

License

MIT License - see LICENSE file for details.

Releases

No releases published

Packages

No packages published