This project is a lightweight REST API server built with FastAPI. It accepts an uploaded file, converts its contents to Markdown using the MarkItDown library, and returns the Markdown content.
Important
This project started as a fork of elbruno/MarkItDownServer.
Note
This project uses uv for dependency management and multistage Docker builds, significantly reducing build times and final image size.
- Clone the repository:
  git clone <repository-url>
- Navigate to the project directory:
  cd <project-dir>
- Build the Docker image:
  docker build -t markitdown-api:latest .
- Run the Docker container:
  docker run -d --name markitdown-api -p 8490:8490 markitdown-api:latest
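Once the container is started, it can be handy to confirm that something is listening on the published port before sending files. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical, and the default host and port simply mirror the `docker run` command above. It only checks that a TCP connection succeeds, not that the API itself is healthy.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8490, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments checks `localhost:8490`.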
For easier development, a convenience script is included to rebuild the image and restart the container:
- Make the script executable:
  chmod +x rebuild.sh
- Run the script whenever you make changes:
  ./rebuild.sh
The script will:
- Stop the running container
- Remove the container
- Build a fresh image
- Start a new container
- Verify the container is running
This simplifies the development process when you're making frequent changes to the codebase.
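For readers who want to see the cycle spelled out, the steps listed above can be sketched in Python as follows. This is not the contents of rebuild.sh; it is an illustrative equivalent, assuming the container name, image tag, and port used in the docker commands earlier in this README. The function names `rebuild_commands` and `rebuild` are hypothetical.

```python
import subprocess

# Values taken from the docker commands earlier in this README.
CONTAINER = "markitdown-api"
IMAGE = "markitdown-api:latest"
PORT = 8490


def rebuild_commands(container=CONTAINER, image=IMAGE, port=PORT):
    """Return the docker commands for one rebuild cycle, in order."""
    return [
        ["docker", "stop", container],                      # stop the running container
        ["docker", "rm", container],                        # remove the container
        ["docker", "build", "-t", image, "."],              # build a fresh image
        ["docker", "run", "-d", "--name", container,
         "-p", f"{port}:{port}", image],                    # start a new container
        ["docker", "ps", "--filter", f"name={container}"],  # list it to verify it is running
    ]


def rebuild():
    """Run the cycle; stop/rm may fail harmlessly if no container exists yet."""
    for index, command in enumerate(rebuild_commands()):
        # Only treat failures from build, run, and ps as fatal.
        subprocess.run(command, check=index >= 2)
```

Calling `rebuild()` from the project directory runs the five commands in sequence and requires the Docker CLI to be installed.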
The API offers two main endpoints.
The documentation endpoint provides an interactive interface where you can:
- Read and explore the existing API endpoints
- View request/response schemas and examples
The /process_file endpoint accepts a POST request containing a file to convert to Markdown.
- Method: POST
- Content-Type: multipart/form-data
- Parameter: file (binary)
- Accepted file types: doc, docx, ppt, pptx, pdf, xls, xlsx, txt, csv, json
- Returns: JSON object with the converted markdown content
For more information regarding valid file types, check the official MarkItDown project.
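A client may want to check a file's extension before uploading it. The sketch below is one way to do that, assuming the list of accepted types above is complete; the server itself may accept more or fewer formats, and the names `ACCEPTED_EXTENSIONS` and `is_accepted` are hypothetical.

```python
from pathlib import Path

# Extensions copied from the list in this README (an assumption, not read from the server).
ACCEPTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "pdf", "xls", "xlsx", "txt", "csv", "json",
}


def is_accepted(path: str) -> bool:
    """Return True if the file's extension is in the accepted list (case-insensitive)."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in ACCEPTED_EXTENSIONS
```

For example, `is_accepted("report.PDF")` is true, while `is_accepted("image.png")` is false.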
You can quickly test that the application is running by uploading a file via curl, like so:
curl -X POST -F "file=@path/to/mypdf.pdf" http://localhost:8490/process_file
The result should be a string encoding a JSON object like:
{ "markdown": "Your content written in markdown..." }
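To pull the Markdown out of a response body like the one above, a minimal sketch using only the standard library might look like this. It assumes the response is a JSON object with a `markdown` key, as shown; the helper name `extract_markdown` is hypothetical.

```python
import json


def extract_markdown(body: str) -> str:
    """Parse a JSON response body and return the value of its 'markdown' field."""
    payload = json.loads(body)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(payload, dict) or "markdown" not in payload:
        raise ValueError("response has no 'markdown' field")
    return payload["markdown"]
```

Raising `ValueError` for both malformed JSON and a missing field lets the caller handle either case with a single `except` clause.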
Here's a very simple example in Python:
import requests

file_path = "/path/to/my.pdf"

with open(file_path, 'rb') as file:
    # Prepare the file for the multipart/form-data request
    files = {'file': (file_path, file)}
    # Make the POST request to the API while the file is still open
    response = requests.post("http://localhost:8490/process_file", files=files)

# Fail early on HTTP error status codes
response.raise_for_status()
# Parse the JSON response
result = response.json()
# Extract the Markdown content
content = result.get('markdown')
This project was originally based on elbruno/MarkItDownServer by Bruno Capuano.
This project is licensed under the MIT License.