Skip to content

2.2.17 Backend Modular MAX

av edited this page Apr 26, 2025 · 1 revision

Handle: modularmax
URL: http://localhost:34471

MAX is a platform from Modular (creators of Mojo) for running LLMs.

Starting

[!WARN] Right now, MAX container only works with Nvidia GPUs or CPUs

# [Optional] pre-pull the image
harbor pull modularmax

# Start the service
# ⚠️ This will download configured model
harbor up modularmax

See troubleshooting guide if you encounter any issues.

  • Even when model is already downloaded it may take some time to compile it during the startup (~30s for 3B model)
  • GPU-enabled inference with MAX typically running BF16 weights (unquantized models) - will require a lot of VRAM (see estimates in the model registry)
  • Harbor connects global HuggingFace cache to the container
  • Harbor shares your (configured) HuggingFace token with the container to download gated/private models from HuggingFace
  • modularmax will be connected to webui when running together

Models

Best way to find a supported model is to look it up on the Modular model registry. Model IDs are usually handles from HuggingFace.

MAX will automatically download configured model to the HF cache when it is first used. While downloading - the start might take a while. Use harbor logs to monitor the progress.

# -n 1000 is to see the last 1000 lines
# of logs out of the box
harbor logs modularmax -n 1000

Configure the model:

# Set the model
harbor modularmax model <model_id>

# See currently set model
harbor modularmax model

Additionally, you can configure any arguments that are supported by the max-pipelines CLI.

# See possible arguments
harbor run modularmax --help

# Set the arguments
harbor modularmax args "--enable-structured-output --trust-remote-code"

# See currently set arguments
harbor modularmax args

Tip

Use harbor profiles to quickly switch between configuration presets

Configuration

See official Configuration options for reference.

Following options can be set via harbor config:

# The port on the host where Modular MAX endpoint will be available
MODULARMAX_HOST_PORT           34471

# The image to use for Modular MAX
# You can switch to max-nvidia-base if needed
MODULARMAX_IMAGE               docker.modular.com/modular/max-nvidia-full

# Specific Docker tag to target
MODULARMAX_VERSION             latest

# The model that'll be passed to "--model-path" argument
# Same as in `harbor modularmax model`
MODULARMAX_MODEL               cognitivecomputations/Dolphin3.0-Qwen2.5-3b

# The arguments to pass to the max-pipelines binary
# Same as in `harbor modularmax args`
MODULARMAX_EXTRA_ARGS

See environment configuration guide to set arbitrary environment variables for the service.

Clone this wiki locally