Skip to content

Conversation

drisspg
Copy link
Contributor

@drisspg drisspg commented Apr 3, 2023

Summary

🤖 Generated by Copilot at 293ded1

This pull request adds support for using Visual Studio Code Remote - Containers extension with the pytorch project. It adds a .devcontainer folder with a devcontainer.json file, a Dockerfile, and a noop.txt file that configure and create a dev container with Anaconda and Python 3.

🤖 Generated by Copilot at d6b9cd7

devcontainer.json
Configures PyTorch containers
For CPU or GPU

Related to:

#92838

cc @ZainRizvi @kit1980 @huydhn @clee2000

@pytorch-bot
Copy link

pytorch-bot bot commented Apr 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98252

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 71a2c32:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg force-pushed the devcontainers branch 2 times, most recently from 720f5cc to 9f65121 Compare May 9, 2023 22:23
@drisspg
Copy link
Contributor Author

drisspg commented May 11, 2023

Currently trying to figure out how opinionated I should be in the container. I prefer clang and lldb, hence some scripts to do that. Ideally there would be some user defined mix-ins that can be added to customize to the users preferences, still trying to figure that. @ZainRizvi I would be curious for the devx take on this specifically around choice of base docker image, perhaps I should be using an existing CI reference?

I have been running this on my local devcontainer and can build Pytorch with these settings:

export USE_MPS=0
export USE_KINETO=0
export BUILD_TEST=0
export USE_DISTRIBUTED=0
export BUILD_CAFFE2=0
export BUILD_CAFFE2_OPS=0
export USE_FBGEMM=0
export USE_OPENCV=0
export USE_QNNPACK=0
export USE_XNNPACK=0
export DEBUG=1
export USE_CUDA=0

Also since the cluster has Docker installed, this does work pretty seamlessly there as well and can build with same settings. I am going to create a cuda build

Copy link
Contributor

@ZainRizvi ZainRizvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on, it's a great PR to see!

I see you're following the instructions from README.md to setup this image. I do like the idea of using our existing CI base images instead for an even more consistent debugging experience though. @huydhn, would you be able to point him to a good image to use?

cc: @malfet

@ZainRizvi ZainRizvi requested review from huydhn, malfet and a team May 15, 2023 19:33
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this.

I think base image shouldn't matter(as long as it reasonably recent), nor choice of compiler (though I guess VSCode is better at parsing clang errors than CPP ones

Please move update_laternatives to .devicontainer/scripts folder rather than keep it in top level directory.

Also, I'm curious, why not have single Dockerfile, with an extra step for CUDA (which is triggered via arg in devcontainer.json)

Comment on lines 9 to 10
local master=${3}
local slaves=${4}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so on..

Suggested change
local master=${3}
local slaves=${4}
local leader=${3}
local follower=${4}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I don't agree with the terminology, it is very present in the man pages for update-alternatives: https://man7.org/linux/man-pages/man1/update-alternatives.1.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. But please move this script to .devcontainer folder. Or perhaps delete it altogether, if you don't plan to configure clang as default compiler.

Comment on lines 17 to 27
# # If USE_CLANG is set we will install
# Can't build succesfully, The exports need to go in the .sh script
# But also clang-15 might be two new for cuda 11.8

# ARG USE_CLANG
# RUN if [ -n "$USE_CLANG" ]; then \
# sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"; \
# sudo .devcontainer/scripts/update_alternatives_clang.sh 15 10; \
# export CC=clang; \
# export CXX=clang++; \
# fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not merge commented out code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested this workflow I can build with clang succesfully on CPU but when trying to compile with CUDA:


  #$ "/usr/bin"/clang -D__CUDA_ARCH__=520 -D__CUDA_ARCH_LIST__=520 -E -x c++
  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__
  "-I/opt/conda/bin/../include" -D__CUDACC_VER_MAJOR__=11
  -D__CUDACC_VER_MINOR__=8 -D__CUDACC_VER_BUILD__=89
  -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=8
  -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64
  "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp1.ii"

  /usr/bin/clang: /opt/conda/bin/../lib/libtinfo.so.6: no version information
  available (required by /usr/lib/x86_64-linux-gnu/libLLVM-11.so.1)

  /usr/bin/clang: /opt/conda/bin/../lib/libtinfo.so.6: no version information
  available (required by /usr/lib/x86_64-linux-gnu/libedit.so.2)

  /usr/bin/clang: symbol lookup error:
  /usr/lib/x86_64-linux-gnu/libLLVM-11.so.1: undefined symbol:
  ffi_type_sint32, version LIBFFI_BASE_7.0

  # --error 0x7f --

  

  

Call Stack (most recent call first):
  /opt/conda/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  /opt/conda/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:48 (__determine_compiler_id_test)
  /opt/conda/share/cmake-3.22/Modules/CMakeDetermineCUDACompiler.cmake:298 (CMAKE_DETERMINE_COMPILER_ID)
  cmake/public/cuda.cmake:47 (enable_language)
  cmake/Dependencies.cmake:43 (include)
  CMakeLists.txt:710 (include)

I think this has to do with the mismatch between conda install cuda libs. I can remove this Clang stuff entirely but not sure

Comment on lines 36 to 38
# [Optional] Uncomment this section to install additional OS packages.
# RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
# && apt-get -y install --no-install-recommends <your-package-list-here>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about turning this into an ARG instead, like the other sections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea will add

@drisspg
Copy link
Contributor Author

drisspg commented May 30, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 30, 2023
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@drisspg drisspg added topic: new features topic category module: devx Related to PyTorch contribution experience (HUD, pytorchbot) labels May 30, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented May 30, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'mege' (choose from 'merge', 'revert', 'rebase', 'label', 'drci')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci} ...

Try @pytorchbot --help for more info.

@drisspg
Copy link
Contributor Author

drisspg commented May 30, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@drisspg
Copy link
Contributor Author

drisspg commented May 30, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase and merge by leaving the following comment on this PR:
@pytorchbot merge -r
Or just rebase by leaving @pytorchbot rebase comment

Details for Dev Infra team Raised by workflow job

@drisspg
Copy link
Contributor Author

drisspg commented May 30, 2023

@pytorchbot merge -r

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased devcontainers onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout devcontainers && git pull --rebase)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@GZ82
Copy link

GZ82 commented Aug 4, 2023

Thanks very much for add this! is it possible to set up one for MacOS plus AMD GPU?

@drisspg
Copy link
Contributor Author

drisspg commented Aug 7, 2023

@GZ82
Definitely, would you like to contribute this? I will create an issue tracking this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: devx Related to PyTorch contribution experience (HUD, pytorchbot) release notes: devx topic: new features topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants