Getting Started at ALCF

Contained herein is a collection of material (source code, scripts, examples, etc.) to be used as examples for setting up user environments, running jobs, debugging and tuning applications, and performing related tasks for successful use of computational resources at the Argonne Leadership Computing Facility (ALCF).

The material here is typically developed for and used during ALCF training workshops, such as the ALCF Computational Performance Workshops held in the spring and the ALCF Simulation, Data, and Learning Workshops held in the fall.

If you are not sure where to start, a good first step is to peruse ProgrammingModels for the resource of interest to gain experience compiling simple codes and submitting jobs; a minimal example is sketched below.
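
For instance, a minimal MPI program like the following can be compiled and submitted on Polaris or Aurora. This is a sketch only: the compiler wrapper, queue, and project names in the comments are assumptions that vary by system and allocation, so consult the ProgrammingModels examples for the exact steps on each machine.

```cpp
// hello_mpi.cpp -- minimal MPI "hello world" sketch.
//
// Hypothetical build/submit steps (wrapper, queue, and project vary by system):
//   CC -o hello_mpi hello_mpi.cpp      # Cray compiler wrapper
//   qsub submit.sh                     # PBS script running: mpiexec -n 4 ./hello_mpi
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank reports which node it landed on.
    char host[MPI_MAX_PROCESSOR_NAME];
    int len = 0;
    MPI_Get_processor_name(host, &len);
    std::printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```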

Contents

  1. Applications: Example Makefiles, build scripts, etc. for some applications
  2. Data Science: Data & Learning examples with pointers to additional Tutorials
  3. Debug Tools: Simple examples using available debugging tools
  4. Examples: Collection of examples demonstrating common tasks (e.g., compilation and setting affinity; see the affinity sketch after this list)
  5. Helper Scripts: Collection of scripts for common tasks
  6. Programming Models: Simple examples using supported programming models
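
For the affinity examples in particular, a common pattern is to derive a node-local rank and use it to pick a GPU. The sketch below shows only the rank-to-device calculation; the GPU count is an assumption (4 on Polaris, 6 on Aurora), and the actual device-selection call depends on the programming model in use.

```cpp
// gpu_affinity.cpp -- sketch of mapping MPI ranks to GPUs within a node.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Split out a communicator of ranks sharing this node to get a local rank.
    MPI_Comm local;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                        MPI_INFO_NULL, &local);
    int local_rank = 0;
    MPI_Comm_rank(local, &local_rank);

    const int gpus_per_node = 4;  // assumption: e.g., Polaris; Aurora has 6
    int device = local_rank % gpus_per_node;

    // A CUDA code would call cudaSetDevice(device) here; a SYCL code would
    // pick the corresponding entry from sycl::device::get_devices().
    std::printf("Global rank %d is local rank %d -> GPU %d\n",
                rank, local_rank, device);

    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}
```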

Resources

Aurora

Note: Aurora is the newest system at ALCF. See the Aurora Machine Overview for additional information.

Aurora is a 10,624-node HPE Cray EX-based system. It has 166 racks with 21,248 CPUs and 63,744 GPUs. Each node consists of 2 Intel Xeon CPU Max Series (codename Sapphire Rapids or SPR) with on-package HBM and 6 Intel Data Center GPU Max Series (codename Ponte Vecchio or PVC). Each Xeon CPU has 52 physical cores supporting 2 hardware threads per core and 64 GB of HBM. Each CPU socket has 512 GB of DDR5 memory. The GPUs are connected all-to-all with Intel Xe Link interfaces. Each node has 8 HPE Slingshot-11 NICs, and the system is connected in a Dragonfly topology. The GPUs may send messages directly to the NIC via PCIe, without the need to copy into CPU memory.
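
Because the NICs can read GPU memory directly, MPI on Aurora can typically operate on device buffers (GPU-aware MPI). The following is a minimal sketch under assumptions: a SYCL compiler and a GPU-aware MPI build are available, and whatever environment settings enable GPU support in the installed MPICH have been applied; without GPU-aware support, the buffers would first have to be staged through host memory.

```cpp
// gpu_aware_mpi.cpp -- sketch of passing GPU buffers straight to MPI.
// Run with at least two ranks, e.g.: mpiexec -n 2 ./gpu_aware_mpi
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sycl::queue q{sycl::gpu_selector_v};
    const int n = 1024;
    double *buf = sycl::malloc_device<double>(n, q);

    if (rank == 0) {
        // Fill the buffer on the GPU, then send it without copying to the host.
        q.fill(buf, 1.0, n).wait();
        MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("Rank 1 received %d doubles into GPU memory\n", n);
    }

    sycl::free(buf, q);
    MPI_Finalize();
    return 0;
}
```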

Polaris

Polaris is the second-newest ALCF system, with 560 nodes, each containing one 32-core AMD Milan CPU and four NVIDIA A100 GPUs.

Sophia

Sophia is an extension of Theta and comprises 24 NVIDIA DGX A100 nodes, each with eight NVIDIA A100 Tensor Core GPUs and two AMD Rome CPUs.

🪦 Previous Systems

Cooley

Cooley was a data analysis and visualization cluster consisting of 126 compute nodes, each with 12 Intel Haswell cores and an NVIDIA Tesla K80 dual-GPU card. Additional hardware information is available here.

Theta

Theta was an 11.7-petaflops Cray XC40 system based on the second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). It contained 4,392 user-available compute nodes, each with 64 cores, 192 GiB of DDR4 and 16 GiB of MCDRAM memory, and a 128 GiB SSD. Additional hardware information is available here.
