dezoito/markitdown-api

MarkItDown API Server

This project is a lightweight REST API server built using FastAPI that receives binary data from a file, converts it to Markdown format using the MarkItDown library, and returns the Markdown content.

Workflow chart

Important

This project started as a fork of elbruno/MarkItDownServer.

Note

This project uses uv for dependency management and a multi-stage Docker build, which together significantly reduce build times and final image size.

Setup Instructions

  1. Clone the repository:

    git clone <repository-url>
  2. Navigate to the project directory:

    cd <project-dir>
  3. Build the Docker image:

    docker build -t markitdown-api:latest .
  4. Run the Docker container:

    docker run -d --name markitdown-api -p 8490:8490 markitdown-api:latest

Development Workflow

For easier development, a convenience script is included to rebuild the image and restart the container:

  1. Make the script executable:

    chmod +x rebuild.sh
  2. Run the script whenever you make changes:

    ./rebuild.sh

The script will:

  • Stop the running container
  • Remove the container
  • Build a fresh image
  • Start a new container
  • Verify the container is running

This simplifies the development process when you're making frequent changes to the codebase.
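
The contents of rebuild.sh are not reproduced in this README, so the following is only a sketch of the five steps listed above, written as Python that assembles the equivalent Docker commands. The image name, container name, and port come from the setup instructions. The helper names `rebuild_commands` and `rebuild`, and the use of `docker ps --filter` for the final step, are assumptions rather than the script's actual contents.

```python
import subprocess


def rebuild_commands(image="markitdown-api:latest",
                     container="markitdown-api",
                     port=8490):
    """Return the Docker commands for one rebuild cycle, in order.

    Hypothetical helper: it mirrors the steps described for rebuild.sh,
    not the script's real contents.
    """
    return [
        ["docker", "stop", container],              # stop the running container
        ["docker", "rm", container],                # remove the container
        ["docker", "build", "-t", image, "."],      # build a fresh image
        ["docker", "run", "-d", "--name", container,
         "-p", f"{port}:{port}", image],            # start a new container
        ["docker", "ps", "--filter", f"name={container}"],  # list it to confirm
    ]


def rebuild(run=subprocess.run):
    """Run each command in order.

    The stop and rm steps are allowed to fail (for example, when no
    container exists yet); the build, run, and listing steps must succeed.
    """
    for index, command in enumerate(rebuild_commands()):
        run(command, check=index >= 2)
```

Calling `rebuild()` from the project directory would run the whole cycle. Passing a different `run` callable makes it possible to inspect the commands without invoking Docker.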

Endpoints

The API offers two main endpoints:

/docs

Provides an interactive documentation interface where you can:

  • Read and explore the existing API endpoints
  • View request/response schemas and examples

/process_file

Accepts a POST request containing a file to convert to Markdown.

  • Method: POST
  • Content-Type: multipart/form-data
  • Parameter: file (binary)
  • Accepted file types: doc, docx, ppt, pptx, pdf, xls, xlsx, txt, csv, json
  • Returns: JSON object with the converted Markdown content

For more information regarding valid file types, check the official MarkItDown project.
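
A client may want to check a file's extension against the list above before uploading it. The sketch below does that with the standard library only. The helper name `is_accepted` and the case-insensitive comparison are assumptions; the README does not say how the server itself treats other extensions.

```python
from pathlib import Path

# Extensions listed above for /process_file.
ACCEPTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "pdf", "xls", "xlsx", "txt", "csv", "json",
}


def is_accepted(file_path):
    """Return True if the file's extension is in the documented list."""
    # Path.suffix includes the leading dot (e.g. ".PDF"), so strip it
    # and lowercase the rest before comparing.
    extension = Path(file_path).suffix.lstrip(".").lower()
    return extension in ACCEPTED_EXTENSIONS
```

This is a client-side convenience only. MarkItDown may support formats beyond this list, so a `False` result does not necessarily mean the conversion would fail.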

Testing the application

You can quickly test that the application is running by uploading a file via curl, like so:

curl -X POST -F "file=@path/to/mypdf.pdf" http://localhost:8490/process_file

The response body should be a JSON object like:

{ "markdown": "Your content written in markdown..." }
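
A response of this shape can be unpacked with the standard `json` module. This is a minimal sketch: `extract_markdown` is a hypothetical helper, and raising `ValueError` when the `markdown` field is missing is an assumption, since the README does not document the shape of error responses.

```python
import json


def extract_markdown(response_text):
    """Parse a /process_file response body and return its Markdown string."""
    payload = json.loads(response_text)
    # Assumption: a successful response is a JSON object with a "markdown" key.
    if not isinstance(payload, dict) or "markdown" not in payload:
        raise ValueError("response has no 'markdown' field")
    return payload["markdown"]


# Example body in the documented shape.
body = '{ "markdown": "# Title\\n\\nSome text." }'
print(extract_markdown(body))
```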

Here's a very simple example in Python:

import requests

file_path = "/path/to/my.pdf"
with open(file_path, "rb") as file:
    # Prepare the file for the multipart/form-data request
    files = {"file": (file_path, file)}

    # Make the POST request to the API
    response = requests.post("http://localhost:8490/process_file", files=files)

# Fail early on HTTP errors (4xx/5xx) instead of parsing an error body
response.raise_for_status()

# Parse the JSON response
result = response.json()

# Extract the Markdown content
content = result.get("markdown")
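
Once the Markdown content has been retrieved, a common next step is to write it to disk next to other converted files. The sketch below shows one way to do that. The helper name `save_markdown` and the naming scheme (source file stem plus `.md`) are assumptions, not part of this project.

```python
from pathlib import Path


def save_markdown(source_path, markdown, output_dir="."):
    """Write Markdown to <output_dir>/<source stem>.md and return the path."""
    # "/path/to/my.pdf" becomes "my.md" inside output_dir.
    target = Path(output_dir) / (Path(source_path).stem + ".md")
    target.write_text(markdown, encoding="utf-8")
    return target
```

With the example above, `save_markdown(file_path, content)` would write `my.md` to the current directory.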

Acknowledgments

This project was originally based on elbruno/MarkItDownServer by Bruno Capuano.

License

This project is licensed under the MIT License.

About

Ultra lightweight API server to convert files (.pdf, .docx, .xlsx) into formatted markdown.
