

HanHan009527
Collaborator

Motivation

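/usr/lib/x86_64-linux-gnu/libmlx5.so already exists in the lmsysorg/sglang:v0.4.7-cu124 base image (it is absent in v0.4.6.post5-cu124), so the ln -s step in docker/Dockerfile.deepep fails with "File exists" and breaks the build: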

root@n199-204-219:/tmp/sglang# docker build --network host --build-arg BASE_IMAGE=lmsysorg/sglang:v0.4.7-cu124 -f docker/Dockerfile.deepep -t d --no-cache .
[+] Building 200.4s (21/35)  docker:default
 => [internal] load build definition from Dockerfile.deepep  0.0s
 => => transferring dockerfile: 2.49kB  0.0s
 => [internal] load metadata for docker.io/lmsysorg/sglang:v0.4.7-cu124  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => CACHED [ 1/32] FROM docker.io/lmsysorg/sglang:v0.4.7-cu124  0.0s
 => [ 2/32] RUN apt-get update && apt-get install -y --no-install-recommends build-essential wget libssl-dev && wget https://github.com/Kitware/CMake/releases/download/v3.27.4/cmake-3.27.4-linux-x86_64.sh && chmod +x cmake-3.27.4-linux-x86_64.sh  40.3s
 => [ 3/32] RUN apt-get update     && apt-get install -y --no-install-recommends         python3         python3-pip     && ln -s /usr/bin/python3 /usr/bin/python  3.5s
 => [ 4/32] WORKDIR /tmp  0.0s
 => [ 5/32] RUN git clone https://github.com/NVIDIA/gdrcopy.git  1.1s
 => [ 6/32] WORKDIR /tmp/gdrcopy  0.0s
 => [ 7/32] RUN git checkout v2.4.4  0.1s
 => [ 8/32] RUN apt update  2.3s
 => [ 9/32] RUN apt install -y nvidia-dkms-535  65.3s
 => [10/32] RUN apt install -y build-essential devscripts debhelper fakeroot pkg-config dkms  37.9s
 => [11/32] RUN apt install -y check libsubunit0 libsubunit-dev  4.2s
 => [12/32] WORKDIR /tmp/gdrcopy/packages  0.0s
 => [13/32] RUN CUDA=/usr/local/cuda ./build-deb-packages.sh  36.9s
 => [14/32] RUN dpkg -i gdrdrv-dkms_*.deb  8.2s
 => [15/32] RUN dpkg -i libgdrapi_*.deb  0.2s
 => [16/32] RUN dpkg -i gdrcopy-tests_*.deb  0.2s
 => [17/32] RUN dpkg -i gdrcopy_*.deb  0.1s
 => ERROR [18/32] RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so  0.1s
------
 > [18/32] RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so:
0.055 ln: failed to create symbolic link '/usr/lib/x86_64-linux-gnu/libmlx5.so': File exists
------
Dockerfile.deepep:43
--------------------
  41 |
  42 |     # IBGDA dependency
  43 | >>> RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
  44 |     RUN apt-get install -y libfabric-dev
  45 |
--------------------
ERROR: failed to solve: process "/bin/sh -c ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so" did not complete successfully: exit code: 1
root@n199-204-219:/tmp/sglang# docker run -it lmsysorg/sglang:v0.4.7-cu124 bash

root@40ea9ed9b1f8:/sgl-workspace# ls /usr/lib/x86_64-linux-gnu/libmlx5.so*
/usr/lib/x86_64-linux-gnu/libmlx5.so  /usr/lib/x86_64-linux-gnu/libmlx5.so.1  /usr/lib/x86_64-linux-gnu/libmlx5.so.1.24.50.0
root@n199-204-219:/tmp/sglang# docker run -it lmsysorg/sglang:v0.4.6.post5-cu124 bash

root@1ef3b1265109:/sgl-workspace# ls /usr/lib/x86_64-linux-gnu/libmlx5.so*
/usr/lib/x86_64-linux-gnu/libmlx5.so.1  /usr/lib/x86_64-linux-gnu/libmlx5.so.1.22.39.0
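The failure can be reproduced without Docker. The sketch below is illustrative only: it works in a throwaway directory created by mktemp instead of /usr/lib/x86_64-linux-gnu, the files are empty placeholders named after the v0.4.7-cu124 listing above, and it assumes the pre-existing libmlx5.so is a symlink (the ls output alone does not show whether it is a link or a regular file; ln fails either way).

```shell
#!/usr/bin/env bash
# Reproduce the "File exists" failure in a scratch directory instead of
# /usr/lib/x86_64-linux-gnu. Files are empty placeholders, not the real
# libmlx5 library; names mirror the v0.4.7-cu124 listing.
set -u

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Versioned library plus the .so.1 link, present in both base images.
touch "$workdir/libmlx5.so.1.24.50.0"
ln -s libmlx5.so.1.24.50.0 "$workdir/libmlx5.so.1"

# Assumption: in the newer image the unversioned libmlx5.so is a link.
ln -s libmlx5.so.1 "$workdir/libmlx5.so"

# Same command as the failing Dockerfile.deepep step, aimed at the scratch
# directory. Without -f, ln refuses to overwrite an existing destination
# and prints "failed to create symbolic link ...: File exists".
ln -s "$workdir/libmlx5.so.1" "$workdir/libmlx5.so"
plain_ln_status=$?
echo "plain ln -s exit status: $plain_ln_status"
```

A non-zero status from a RUN command is exactly what makes docker build abort at step 18/32.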

Modifications

Append || true to the ln -s step so it no longer fails the build when libmlx5.so already exists.
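The effect of the || true change can be sketched in plain shell. This is an illustration under assumptions: it uses a mktemp scratch directory and a placeholder libmlx5.so.1, and set -e stands in for Docker treating a non-zero RUN status as a build failure. The Dockerfile line quoted in the comment is the assumed shape of the merged step, based on the review's description.

```shell
#!/usr/bin/env bash
# Sketch of the "|| true" pattern in a scratch directory. The merged
# Dockerfile.deepep step is assumed to read:
#   RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so || true
set -eu  # fail fast on any unhandled error, like a failing RUN step

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/libmlx5.so.1"  # placeholder, not the real library

# Case 1: link absent (like v0.4.6.post5-cu124), so ln creates it.
ln -s libmlx5.so.1 "$workdir/libmlx5.so" || true
first_target="$(readlink "$workdir/libmlx5.so")"

# Case 2: link already present (like v0.4.7-cu124). ln still prints its
# error, but "|| true" makes the overall status 0, so set -e does not abort.
ln -s libmlx5.so.1 "$workdir/libmlx5.so" || true
second_status=$?
echo "second run exit status: $second_status"
```

Note that || true leaves any pre-existing file untouched and also hides unrelated ln errors (for example, a permission problem); ln -s never checks that its target exists, so a missing libmlx5.so.1 would not have produced an error in the first place.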

Checklist

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @HanHan009527, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a build error in Dockerfile.deepep that occurs when the base image already contains /usr/lib/x86_64-linux-gnu/libmlx5.so. The change makes the link-creation step non-fatal, so the Docker build completes whether or not the base image already ships this file.

Highlights

  • Fix Docker build failure: Appended || true to the ln -s command in docker/Dockerfile.deepep so that the RUN step succeeds even if /usr/lib/x86_64-linux-gnu/libmlx5.so already exists. This prevents the build from failing on base images where the link is pre-created.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a Docker build failure caused by an attempt to create a symbolic link (libmlx5.so) that already exists in the base image. The proposed solution is to append || true to the ln -s command, which suppresses any errors from this command and allows the build to continue.

While this fixes the immediate build error, I've suggested a more robust alternative using ln -sf (force). This approach ensures that the symbolic link not only exists but also points to the correct target (libmlx5.so.1), regardless of its previous state. This change would make the Dockerfile more idempotent and resilient to variations in the base image or unexpected issues with the link creation, preventing potential runtime problems that might be masked by simply suppressing the error.
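The reviewer's ln -sf alternative can be sketched the same way. Again this is only an illustration: it runs in a mktemp scratch directory with placeholder files, and libmlx5.so.stale is a hypothetical name used to simulate a pre-existing link that points at the wrong target. The Dockerfile line in the comment is the suggested form, not what was merged.

```shell
#!/usr/bin/env bash
# Sketch of the suggested "ln -sf" approach in a scratch directory.
# Suggested (not merged) Dockerfile form:
#   RUN ln -sf /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/libmlx5.so.1" "$workdir/libmlx5.so.stale"  # placeholders

# Hypothetical worst case: an existing link pointing at the wrong file.
ln -s libmlx5.so.stale "$workdir/libmlx5.so"

# -f removes the existing destination before linking, so the command
# succeeds and repoints the link. Running it twice gives the same result.
ln -sf libmlx5.so.1 "$workdir/libmlx5.so"
ln -sf libmlx5.so.1 "$workdir/libmlx5.so"
echo "libmlx5.so -> $(readlink "$workdir/libmlx5.so")"
```

Unlike || true, ln -sf still fails the step on genuine errors such as a missing or read-only directory, while guaranteeing where the link ends up pointing.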

@zhyncs zhyncs merged commit d7c3e8e into sgl-project:main Jun 10, 2025
@zhyncs
Member

zhyncs commented Jun 10, 2025

Thanks!

@HanHan009527 HanHan009527 deleted the fix_deepepdockerfile branch June 11, 2025 03:25
almaslof pushed a commit to mpashkovskii/sglang that referenced this pull request Jun 11, 2025
jianan-gu pushed a commit to jianan-gu/sglang that referenced this pull request Jun 12, 2025