# HPC Usage Guide
MDT is designed to scale from local workstations to high-performance computing (HPC) clusters. It integrates with dask-jobqueue to automatically manage job submission and worker lifecycle on various NOAA platforms.
## Execution Configuration
The `execution` section of your YAML file controls where and how your tasks run.
```yaml
execution:
  default_cluster: "hera_batch"
  clusters:
    hera_batch:
      mode: "hera"
      account: "my_account"
      walltime: "02:00:00"
```
## Supported NOAA Platforms
MDT provides built-in profiles for major NOAA RDHPCS and production platforms. Specifying one of these in the `mode` field automatically applies optimized defaults for that system.
| Platform | `mode` | Scheduler | Default Cores | Default Memory |
|---|---|---|---|---|
| Hera | `hera` | SLURM | 40 | 120GB |
| Jet | `jet` | SLURM | 24 | 60GB |
| Orion | `orion` | SLURM | 40 | 180GB |
| Hercules | `hercules` | SLURM | 80 | 250GB |
| Gaea | `gaea` | SLURM | 36 | 120GB |
| Ursa | `ursa` | SLURM | 36 | 120GB |
| WCOSS2 | `wcoss2` | PBS | 128 | 256GB |
## Generic Schedulers
If you are on a platform not listed above, you can use generic scheduler modes:
* `slurm`
* `pbs`
* `lsf`
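A generic mode is configured the same way as a named platform. The sketch below shows a minimal SLURM setup; the cluster name, account, and queue are hypothetical placeholders you would replace with your site's values:

```yaml
execution:
  default_cluster: "campus_slurm"
  clusters:
    campus_slurm:
      mode: "slurm"          # generic SLURM scheduler
      account: "my_account"  # your site's allocation/account
      queue: "batch"         # partition/queue to submit to
      cores: 16
      memory: "64GB"
      walltime: "01:00:00"
```

Because generic modes carry no platform profile, you should set `cores`, `memory`, and `walltime` explicitly rather than relying on defaults.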
## Customizing Cluster Parameters
You can override any platform default by providing additional keys in the cluster configuration. These are passed directly to the underlying dask-jobqueue cluster class (e.g., `SLURMCluster` or `PBSCluster`).
```yaml
execution:
  clusters:
    custom_jet:
      mode: "jet"
      cores: 12
      memory: "30GB"
      job_extra_directives: ["--qos=windfall"]
```
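The same pass-through mechanism applies on PBS platforms, where extra keys map to `PBSCluster` arguments. A hedged sketch for WCOSS2 follows; the cluster name, queue, and resource values are illustrative assumptions, not documented MDT defaults:

```yaml
execution:
  clusters:
    custom_wcoss2:
      mode: "wcoss2"
      cores: 64
      memory: "128GB"
      # PBSCluster-specific: the select statement for the job
      resource_spec: "select=1:ncpus=64:mem=128GB"
      # Extra scheduler directives, e.g. a queue choice
      job_extra_directives: ["-q dev"]
```

Consult the dask-jobqueue documentation for the full set of keyword arguments each cluster class accepts.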
## Assigning Tasks to Specific Clusters
MDT allows you to run different tasks on different clusters within the same workflow. For example, you might want to load data on a local cluster but perform heavy pairing and statistics on an HPC cluster.
```yaml
data:
  obs_data:
    type: "airnow"
    cluster: "local"  # Run locally
    kwargs:
      fname: "obs.nc"

pairing:
  heavy_pairing:
    source: "model_data"
    target: "obs_data"
    cluster: "hera_batch"  # Run on HPC
    method: "regrid"
```
If no `cluster` is specified for a task, it uses the `default_cluster` defined in the `execution` section.
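The fallback behavior can be illustrated with a minimal Python sketch. Note that `resolve_cluster` is a hypothetical helper written for this example, not part of MDT's API:

```python
def resolve_cluster(task_cfg: dict, execution_cfg: dict) -> str:
    """Return the cluster a task runs on: the task's own `cluster` key,
    or the execution section's `default_cluster` when none is set."""
    return task_cfg.get("cluster", execution_cfg["default_cluster"])

execution = {"default_cluster": "hera_batch"}
obs_data = {"type": "airnow", "cluster": "local"}  # explicitly local
heavy_pairing = {"method": "regrid"}               # no cluster key

print(resolve_cluster(obs_data, execution))       # local
print(resolve_cluster(heavy_pairing, execution))  # hera_batch
```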