
How should PIs request time on exascale systems Frontier (OLCF) and Aurora (ALCF)?

We recognize that many PIs will not have had the opportunity to test their code on exascale or pre-exascale hardware before submitting their INCITE proposal. The exceptions are PIs who have been involved with Early Science Projects (ESP) or Exascale Computing Projects (ECP). To accommodate all PIs, we offer the guidance below.

How should PIs estimate the required compute time on Frontier?

Frontier is now available. The best way to establish compute time requirements on Frontier is to run on Frontier by requesting a Director’s Discretionary allocation. If this is not possible, the next best alternative is to combine scaling data from a large GPU machine with limited performance data from an AMD GPU (ideally the MI250X). Failing that, scaling data from a large GPU machine (e.g., Summit) may be used, with projections to the MI250X based on the GPU specifications. If performance data from an MI250X is not used, the proposal should include a discussion of the extrapolation used to estimate time on Frontier. All hours requested must be in Frontier-native node hours.
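As an illustration of the kind of extrapolation a proposal might document, the sketch below projects node hours from a reference GPU machine to a target machine. It is a deliberately naive linear model and not an official OLCF method: the function name, its parameters, and the example numbers are all hypothetical placeholders, and the per-device speedup must come from your own measurements or from a justified comparison of GPU specifications. It ignores changes in scaling efficiency, communication, and I/O, which a real estimate should discuss.

```python
def project_node_hours(ref_node_hours: float,
                       ref_gpus_per_node: int,
                       target_gpus_per_node: int,
                       per_gpu_speedup: float) -> float:
    """Naively project node hours from a reference machine to a target machine.

    per_gpu_speedup is the assumed ratio (target device throughput) /
    (reference device throughput) for this application. Count devices per
    node the same way the speedup was measured.
    """
    if ref_node_hours < 0:
        raise ValueError("ref_node_hours must be non-negative")
    if ref_gpus_per_node <= 0 or target_gpus_per_node <= 0:
        raise ValueError("GPU counts per node must be positive")
    if per_gpu_speedup <= 0:
        raise ValueError("per_gpu_speedup must be positive")

    # Convert the reference cost to device hours, rescale by the assumed
    # per-device speedup, then convert back to node hours on the target.
    ref_gpu_hours = ref_node_hours * ref_gpus_per_node
    target_gpu_hours = ref_gpu_hours / per_gpu_speedup
    return target_gpu_hours / target_gpus_per_node


# Placeholder inputs only; substitute your own measured values.
example = project_node_hours(ref_node_hours=120_000,
                             ref_gpus_per_node=6,
                             target_gpus_per_node=4,
                             per_gpu_speedup=1.5)
print(f"Projected target node hours: {example:,.0f}")
```

Stating each input and its source (measured or derived from specifications) alongside the projected figure makes the extrapolation easier for reviewers to assess.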

How should PIs estimate the required compute time on Polaris?

Polaris is available today. The best way to establish compute time requirements on Polaris is to run on Polaris, most likely by requesting a Director’s Discretionary allocation. Alternatively, scaling results from a comparable GPU-based system can be used to estimate Polaris node hours.

How should PIs estimate the required compute time on Aurora?

Estimates for Aurora should be based on comparable GPU-based machines and expressed in Aurora node hours; reasonable estimates can be derived from systems that include multiple GPUs per node. State-of-the-art GPUs such as the NVIDIA A100 or AMD MI200 provide a good foundation from which to estimate performance, and therefore compute time, on the Intel PVC GPUs that will be deployed for Aurora. Polaris, with four A100 GPUs per node, is therefore a good basis for estimating performance on Aurora. When estimating Aurora time, we suggest using a 2:1 factor to convert Polaris node hours to Aurora node hours (for example, a task that would require 2M Polaris node hours could be estimated to require 1M Aurora node hours).
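The suggested 2:1 conversion can be written out as a short calculation. This is only a sketch: the function name and the `factor` parameter are hypothetical, and the default of 2.0 simply encodes the ratio suggested above.

```python
def polaris_to_aurora_node_hours(polaris_node_hours: float,
                                 factor: float = 2.0) -> float:
    """Convert Polaris node hours to estimated Aurora node hours.

    factor is the number of Polaris node hours assumed equivalent to one
    Aurora node hour (2.0 for the suggested 2:1 ratio).
    """
    if polaris_node_hours < 0:
        raise ValueError("polaris_node_hours must be non-negative")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return polaris_node_hours / factor


# The example from the text: 2M Polaris node hours -> 1M Aurora node hours.
aurora_estimate = polaris_to_aurora_node_hours(2_000_000)
print(f"Estimated Aurora node hours: {aurora_estimate:,.0f}")
```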

PIs from ESP or ECP projects who have been running their application on pre-Aurora hardware should indicate that in their proposal and specify the programming model used to leverage the GPUs. Performance details from these systems should not be included in the INCITE proposal because they are potentially sensitive; instead, please coordinate with your collaborator within ALCF.

Do I need to use the GPUs to apply for Aurora time?

Yes.