# 2.2.17 Backend: Modular MAX
- Handle: `modularmax`
- URL: http://localhost:34471
MAX is a platform from Modular (creators of Mojo) for running LLMs.
> [!WARNING]
> Right now, the MAX container only works with Nvidia GPUs or CPUs
```bash
# [Optional] pre-pull the image
harbor pull modularmax

# Start the service
# ⚠️ This will download the configured model
harbor up modularmax
```
See the troubleshooting guide if you encounter any issues.
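Once the service reports ready, you can verify it directly from the host. MAX serves an OpenAI-compatible API, so a minimal request might look like this (the port is the Harbor default from above; the model ID shown is the default from the configuration section below and must match whatever you have configured):

```bash
# Send a chat completion request to the OpenAI-compatible endpoint
curl http://localhost:34471/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cognitivecomputations/Dolphin3.0-Qwen2.5-3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```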
- Even when the model is already downloaded, it may take some time to compile during startup (~30s for a 3B model)
- GPU-enabled inference with MAX typically runs BF16 weights (unquantized models) and will require a lot of VRAM (see estimates in the model registry)
- Harbor connects the global HuggingFace cache to the container
- Harbor shares your (configured) HuggingFace token with the container to download gated/private models from HuggingFace (see the sketch after this list)
- `modularmax` will be connected to `webui` when running together
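If you need a gated or private model, set the HuggingFace token before starting the service. A minimal sketch, assuming the token lives under an `hf.token` config key (the key name is an assumption; verify the exact field in your Harbor config):

```bash
# Store the HF token so Harbor can share it with the container
# (hf.token is an assumed key name, not confirmed by this page)
harbor config set hf.token <your_hf_token>
```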
The best way to find a supported model is to look it up in the Modular model registry. Model IDs are usually HuggingFace handles.
MAX will automatically download the configured model to the HF cache when it is first used. While downloading, startup might take a while. Use `harbor logs` to monitor the progress.
```bash
# -n 1000 shows the last 1000 lines of logs
harbor logs modularmax -n 1000
```
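Flags after the service name are passed through to the underlying logs command, so following the output live should also work (an assumption based on the `-n` passthrough above):

```bash
# Stream the logs while the model downloads and compiles
harbor logs modularmax -f
```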
Configure the model:
```bash
# Set the model
harbor modularmax model <model_id>

# See the currently set model
harbor modularmax model
```
Additionally, you can configure any arguments supported by the `max-pipelines` CLI.
```bash
# See possible arguments
harbor run modularmax --help

# Set the arguments
harbor modularmax args "--enable-structured-output --trust-remote-code"

# See currently set arguments
harbor modularmax args
```
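Putting the two together, a typical reconfiguration cycle might look like this (the restart step assumes the container must be recreated for new arguments to apply; `harbor down` stops all running services, so adjust to your setup):

```bash
# Reconfigure, then restart so the new settings take effect
harbor modularmax model cognitivecomputations/Dolphin3.0-Qwen2.5-3b
harbor modularmax args "--enable-structured-output"
harbor down && harbor up modularmax
```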
> [!TIP]
> Use `harbor profiles` to quickly switch between configuration presets
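For example, you could keep one preset per model setup (the subcommand names below follow Harbor's profiles feature; treat them as an assumption if your version differs):

```bash
# Save the current configuration under a name
harbor profile save max-dolphin

# Switch back to it later
harbor profile use max-dolphin
```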
See the official Configuration options for reference.
The following options can be set via `harbor config`:
```bash
# The port on the host where the Modular MAX endpoint will be available
MODULARMAX_HOST_PORT    34471

# The image to use for Modular MAX
# You can switch to max-nvidia-base if needed
MODULARMAX_IMAGE        docker.modular.com/modular/max-nvidia-full

# Specific Docker tag to target
MODULARMAX_VERSION      latest

# The model that'll be passed to the "--model-path" argument
# Same as in `harbor modularmax model`
MODULARMAX_MODEL        cognitivecomputations/Dolphin3.0-Qwen2.5-3b

# The arguments to pass to the max-pipelines binary
# Same as in `harbor modularmax args`
MODULARMAX_EXTRA_ARGS
```
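To change any of these, use `harbor config set` with the dotted form of the option name (the `MODULARMAX_HOST_PORT` → `modularmax.host.port` mapping follows Harbor's usual naming convention; treat it as an assumption and verify against your Harbor version):

```bash
# Move the endpoint to the default port shown above
harbor config set modularmax.host.port 34471

# Pin a specific Docker tag instead of "latest"
harbor config set modularmax.version <tag>
```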
See the environment configuration guide to set arbitrary environment variables for the service.
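As a sketch, assuming the `harbor env` helper from that guide, you could pass an arbitrary variable to the container like this (`HF_HUB_OFFLINE` is a standard HuggingFace variable, shown purely for illustration):

```bash
# Force offline mode once the model is already cached
# (harbor env usage assumed per the environment configuration guide)
harbor env modularmax HF_HUB_OFFLINE 1
```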