Summit

Summit, a 200-petaflop system, is available to users through the 2022 INCITE program. Summit has powerful, large-memory nodes that are most effective for applications that make effective use of the GPUs. The machine also has node-local, non-volatile memory that can be used to increase I/O bandwidth or provide expanded local storage for applications. Peak node performance is 42 teraflops, with 512 gigabytes of DDR4 memory, 96 gigabytes of HBM2 memory, and 1,600 gigabytes of non-volatile memory per node.

Summit System Configuration
Architecture: IBM
Node: 2 IBM Power9 processors + 6 NVIDIA Volta GPUs
Compute Nodes: 4,608 hybrid nodes
Node performance: 42 TF
Memory/node: 512 GB DDR4 + 96 GB High Bandwidth Memory (HBM2)
Available NV memory per node: 1,600 GB
GPU link: NVLink 2.0
Total system memory: >10 PB (DDR4 + HBM2 + Non-volatile)
Interconnect: Non-blocking Fat Tree
Interconnect bandwidth (node injection): Dual-rail EDR InfiniBand (25 GB/s)
File system: 250 PB, 2.5 TB/s, GPFS
Peak Speed: 200 PF
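
As a quick consistency check, the per-node figures above roughly reproduce the quoted system totals; the short C++ sketch below (illustrative only, using only the numbers from the list above) simply multiplies them out.

// Rough consistency check of the Summit totals from the per-node figures above.
#include <cstdio>

int main() {
    const int    nodes    = 4608;    // hybrid compute nodes
    const double node_tf  = 42.0;    // peak teraflops per node
    const double ddr4_gb  = 512.0;   // DDR4 per node
    const double hbm2_gb  = 96.0;    // HBM2 per node
    const double nvram_gb = 1600.0;  // non-volatile memory per node

    const double peak_pf      = nodes * node_tf / 1000.0;                        // ~193.5 PF (quoted as ~200 PF peak)
    const double total_mem_pb = nodes * (ddr4_gb + hbm2_gb + nvram_gb) / 1.0e6;  // ~10.2 PB (quoted as >10 PB)

    std::printf("Aggregate peak: %.1f PF, aggregate memory: %.2f PB\n", peak_pf, total_mem_pb);
    return 0;
}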

Summit uses IBM’s Spectrum Scale™ file system, with 250 PB of capacity and bandwidth of up to 2.5 TB/s. In addition, each node has 1.6 TB of non-volatile memory that provides high-speed local storage and serves as a burst buffer in front of the file system. All OLCF users will have access to the HPSS data archive, the Rhea and Eos pre- and post-processing clusters, and the EVEREST high-resolution visualization facility. All of these resources are available through high-performance networks, including ESnet’s recently upgraded 100 gigabit per second links.
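
Applications typically use this node-local non-volatile memory by writing checkpoints or intermediate files to the local device first and draining them to the parallel file system afterward. The C++ sketch below illustrates that pattern in outline only; the directory paths are hypothetical placeholders, not actual Summit mount points.

// Minimal sketch of staging a checkpoint through node-local storage before the
// parallel file system. The paths below are hypothetical placeholders.
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

void write_checkpoint(const std::string& name, const std::string& payload) {
    // Hypothetical locations: a node-local NVMe scratch area and a shared project directory.
    const fs::path local_dir("/nvme/scratch");      // placeholder for the node-local device
    const fs::path shared_dir("/gpfs/project/run"); // placeholder for the Spectrum Scale file system

    fs::create_directories(local_dir);
    fs::create_directories(shared_dir);

    // 1. Write the checkpoint to fast node-local storage (the "burst buffer" step).
    const fs::path local_file = local_dir / name;
    std::ofstream(local_file) << payload;

    // 2. Drain it to the shared file system once the compute-critical phase is done.
    fs::copy_file(local_file, shared_dir / name, fs::copy_options::overwrite_existing);
}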

For more information about any of the OLCF resources, please visit https://www.olcf.ornl.gov/olcf-resources/

Summit Website

Frontier (Available in 2023)

Two- and three-year proposals submitted in 2022 may request access to the Frontier system in 2023. The Frontier system will be based on Cray’s new Shasta architecture and Slingshot interconnect with high-performance AMD EPYC CPU and Radeon Instinct GPU technology. The new accelerator-centric compute blades will support a 4:1 GPU-to-CPU ratio, with the processors connected by high-speed links and coherent memory within the node. Peak performance is expected to exceed 1.5 exaflops (EF).
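
With several GPUs sharing each node, a common pattern is to bind each process to one GPU using its node-local rank. The C++ sketch below shows that mapping with HIP; the SLURM_LOCALID environment variable and the one-process-per-GPU layout are assumptions about the job launcher, not details taken from this document.

// Sketch: bind each process on a node to one of its GPUs, assuming a launcher
// that exports a node-local rank index (e.g. SLURM_LOCALID). Hypothetical setup.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

int main() {
    int device_count = 0;
    if (hipGetDeviceCount(&device_count) != hipSuccess || device_count == 0) {
        std::fprintf(stderr, "No HIP devices visible\n");
        return 1;
    }

    // Node-local rank from the launcher; 0 if the variable is not set.
    const char* local_id = std::getenv("SLURM_LOCALID");
    const int local_rank = local_id ? std::atoi(local_id) : 0;

    // Round-robin ranks onto the GPUs on this node.
    const int device = local_rank % device_count;
    if (hipSetDevice(device) != hipSuccess) {
        std::fprintf(stderr, "Could not select GPU %d\n", device);
        return 1;
    }

    std::printf("local rank %d -> GPU %d of %d\n", local_rank, device, device_count);
    return 0;
}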

For the 2022 INCITE proposal submission, Frontier allocations in 2023 will be requested in equivalent “Summit node-hours.” For planning purposes for this cycle, we conservatively expect nearly 133 million Summit-equivalent node-hours to be allocated on Frontier per year, and we expect the average INCITE project to be awarded approximately 3-4 million Summit-equivalent node-hours in 2023. Awarded three-year projects will be required to revisit their Summit-equivalent node-hour requests for Frontier, and computational readiness for Frontier will be re-evaluated in future renewals.
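
As a planning illustration only, the C++ sketch below shows the bookkeeping of expressing a planned Frontier campaign in Summit-equivalent node-hours; both the planned usage and the conversion factor are made-up placeholders, since this document does not define the conversion.

// Back-of-the-envelope conversion of a planned Frontier campaign into
// Summit-equivalent node-hours. All numbers here are hypothetical placeholders.
#include <cstdio>

int main() {
    const double frontier_node_hours   = 500000.0; // planned Frontier node-hours (placeholder)
    const double summit_equiv_per_hour = 8.0;      // Summit node-hours per Frontier node-hour (placeholder)

    const double summit_equiv = frontier_node_hours * summit_equiv_per_hour;
    std::printf("Request: %.0f Summit-equivalent node-hours\n", summit_equiv);
    return 0;
}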

Further details may be found at https://www.olcf.ornl.gov/frontier/

Theta

Theta, the Argonne Leadership Computing Facility’s Cray XC40 supercomputer, is equipped with 281,088 cores, 70 terabytes of high-bandwidth MCDRAM, 843 terabytes of DDR4 memory, 562 terabytes on SSDs, and has a peak performance of 11.69 petaflops.

Each of Theta’s 4,392 compute nodes has an Intel “Knights Landing” Xeon Phi processor with 64 cores (four hardware threads per core), 16 gigabytes of high-bandwidth MCDRAM, 192 gigabytes of DDR4 memory, and a 128-gigabyte SSD. Theta uses Cray’s high-performance Aries network in a Dragonfly configuration. The Xeon Phi supports improved vectorization through AVX-512 SIMD instructions.
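
To benefit from those wide AVX-512 units, inner loops must vectorize. The fragment below is a minimal, generic C++ sketch (not taken from this document) of the usual approach: marking a loop with an OpenMP simd directive and letting the compiler target AVX-512 (for example, with the Intel compiler's KNL-specific architecture flag).

// Minimal vectorization sketch for a KNL-style core: the inner loop is marked
// with "omp simd" so the compiler can emit AVX-512 instructions for it.
#include <vector>
#include <cstdio>

// y[i] = a * x[i] + y[i], the classic axpy kernel.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t n = x.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    std::vector<double> x(1024, 1.0), y(1024, 2.0);
    axpy(3.0, x, y);
    std::printf("y[0] = %f\n", y[0]); // expect 5.0
    return 0;
}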

Theta System Configuration
Architecture: Cray XC40
Processor: Intel “Knights Landing” Xeon Phi
Nodes: 4,392 compute nodes
Cores/node: 64
Total cores: 281,088
HW threads/core: 4
HW threads/node: 256
Memory/node: 16 GiB MCDRAM + 192 GiB DDR4 + 128 GiB SSD
Memory/core: 256 MiB MCDRAM + 3 GiB DDR4
Interconnect: Aries (Dragonfly)
Speed: 11.69 PF

Users of Theta will have access to Eagle and Grand, ALCF’s newly deployed file storage systems with 650 GB/s of bandwidth. These 100 PB Lustre file systems, supported by two ClusterStor E1000 storage systems, represent a substantial upgrade over their predecessors. The Grand file system will primarily hold output generated by computational runs on ALCF computing systems, and the Eagle file system will primarily be used for data sharing across high-performance computing facilities.

For more information about any of the ALCF resources, please visit http://www.alcf.anl.gov/computing-resources.

Theta Website

Polaris

ALCF plans to deploy the Polaris system in the summer 2021 time frame. Polaris is planned to be a hybrid CPU/GPU machine that will be available to INCITE projects.

Anticipated Polaris Configuration
System peak: 35-45 PF
Peak power: <2 MW
Total system memory: >250 TB
System memory type: DDR, HBM
Node performance: >70 TF
Node processors: 1 CPU + 4 GPUs
System size: >500 nodes
Node-to-node interconnect: 200 Gb/s
Programming models: OpenMP 4.5/5, SYCL, Kokkos, RAJA, HIP (see the offload sketch after this list)
Performance/debugging tools: GPU tools, PAPI, TAU, HPCToolkit, DDT
Frameworks: Python/Numba, TensorFlow, PyTorch
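
As a small, generic illustration of the OpenMP offload model listed above (standard OpenMP 4.5+ target directives, not Polaris-specific code), the C++ sketch below offloads a simple loop to a GPU when one is available.

// Generic OpenMP target-offload sketch: run a simple loop on a GPU if one is
// available, otherwise fall back to the host. Not Polaris-specific code.
#include <cstdio>

int main() {
    const int n = 1 << 20;
    double* x = new double[n];
    for (int i = 0; i < n; ++i) x[i] = 1.0;

    // Map x to the device, scale it there, and copy it back.
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= 2.0;
    }

    std::printf("x[0] = %f\n", x[0]); // expect 2.0
    delete[] x;
    return 0;
}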

Aurora (Available in 2023)

Aurora, the replacement for Theta, is expected to be delivered in 2022. The public information for Aurora is listed below.

Aurora System Configuration
System performance: ≥1 EF DP sustained
Node performance: >130 TF
Compute node: 2 Intel Xeon Scalable processors (Sapphire Rapids) + 6 Intel Xe architecture-based GPUs (Ponte Vecchio)
GPU architecture: Xe architecture-based GPU (Ponte Vecchio); tile-based, chiplets, HBM stack, Foveros 3D integration
CPU-GPU interconnect: PCIe
Node memory architecture: Unified memory architecture, RAMBO
Number of nodes: >9,000
Number of cabinets: >100
Aggregate system memory: >10 PB
Peak power: ≤60 MW
System interconnect: HPE Slingshot 11; Dragonfly topology with adaptive routing
Network switch: 25.6 Tb/s per switch, from 64 200-Gb/s ports (25 GB/s per direction)
High-performance storage: ≥230 PB, ≥25 TB/s (DAOS)
Platform: HPE Cray XE
Software stack: HPE Cray XE software stack + Intel enhancements + data and learning
Programming models: Intel oneAPI, OpenMP, DPC++/SYCL
Programming languages and models: Fortran, C, C++, OpenMP 5.x (Intel, Cray, and possibly LLVM compilers), UPC (Cray), Coarray Fortran (Intel), Data Parallel C++ (Intel and LLVM compilers), OpenSHMEM, Python, Numba, MPI, OpenCL
Compilers: Intel, LLVM, GCC
Programming tools: Open|SpeedShop, TAU, HPCToolkit, Score-P, Darshan, Intel Trace Analyzer and Collector, Intel VTune, Advisor, and Inspector, PAPI, GNU gprof
Debugging and correctness tools: Stack Trace Analysis Tool, gdb, Cray Abnormal Termination Processing
Math libraries: Intel MKL, Intel MKL-DNN, ScaLAPACK
GUI and viz APIs, I/O libraries: X11, Motif, Qt, NetCDF, Parallel NetCDF, HDF5
Frameworks: TensorFlow, PyTorch, scikit-learn, Spark MLlib, GraphX, Intel DAAL, Intel MKL-DNN

The expected software stack includes the Cray Shasta software stack, Intel software, and data and learning frameworks. Supported programming models include MPI, Intel oneAPI, OpenMP, SYCL/DPC++, Kokkos, RAJA, and others.
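
As a generic illustration of the SYCL/DPC++ model named above (standard SYCL 2020 C++ code, not Aurora-specific), the sketch below submits a simple kernel through a SYCL queue using unified shared memory.

// Generic SYCL 2020 / DPC++ sketch: fill an array on whatever device the default
// queue selects, using unified shared memory. Not Aurora-specific code.
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q;  // default device selection
    const std::size_t n = 1024;
    double* data = sycl::malloc_shared<double>(n, q);

    // Launch one work-item per element.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        data[i] = 2.0 * static_cast<double>(i);
    }).wait();

    std::printf("data[1] = %f on %s\n", data[1],
                q.get_device().get_info<sycl::info::device::name>().c_str());

    sycl::free(data, q);
    return 0;
}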

For the 2022 INCITE proposal submission, Aurora allocations in 2023 will be requested in equivalent “Theta node-hours.” For planning purposes for this cycle, we conservatively expect nearly 1.78 billion Theta-equivalent node-hours to be allocated on Aurora per year, and we expect the average INCITE project to be awarded approximately 40-50 million Theta-equivalent node-hours in 2023. Awarded three-year projects will be required to revisit their Theta-equivalent node-hour requests for Aurora, and computational readiness for Aurora will be re-evaluated in future renewals.

The most recent public information on Aurora can be found at https://aurora.alcf.anl.gov.