HPC Usage Guide

MDT is designed to scale from local workstations to high-performance computing (HPC) clusters. It integrates with dask-jobqueue to automatically manage job submission and worker lifecycle on various NOAA platforms.

Execution Configuration

The execution section of your YAML file controls where and how your tasks run.

execution:
  default_cluster: "hera_batch"
  clusters:
    hera_batch:
      mode: "hera"
      account: "my_account"
      walltime: "02:00:00"
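
Before submitting a workflow, it can help to sanity-check the execution section. The sketch below is illustrative only and is not part of MDT: `check_execution` is a hypothetical helper, the dict mirrors what a YAML loader would produce from the example above, and it assumes that every cluster entry carries a mode and that default_cluster must name an entry under clusters (MDT may treat some names, such as a local cluster, as built in).

```python
# Hypothetical helper, not part of MDT. The dict mirrors the YAML example above.
execution = {
    "default_cluster": "hera_batch",
    "clusters": {
        "hera_batch": {
            "mode": "hera",
            "account": "my_account",
            "walltime": "02:00:00",
        },
    },
}


def check_execution(execution):
    """Return a list of problems found in an execution section (empty if none)."""
    problems = []
    clusters = execution.get("clusters", {})
    default = execution.get("default_cluster")
    # Assumption: the default must refer to a cluster defined in this section.
    if default is not None and default not in clusters:
        problems.append(f"default_cluster {default!r} is not defined under clusters")
    # Assumption: every cluster entry needs a mode.
    for name, cfg in clusters.items():
        if "mode" not in cfg:
            problems.append(f"cluster {name!r} has no mode")
    return problems


print(check_execution(execution))
```

An empty list means none of the assumed rules was violated; it says nothing about whether the account or walltime is accepted by the scheduler.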

Supported NOAA Platforms

MDT provides built-in profiles for major NOAA RDHPCS and production platforms. Setting the mode field to one of the values below automatically applies that system's defaults for scheduler, cores, and memory.

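| Platform | mode | Scheduler | Default Cores | Default Memory |
| --- | --- | --- | --- | --- |
| Hera | hera | SLURM | 40 | 120GB |
| Jet | jet | SLURM | 24 | 60GB |
| Orion | orion | SLURM | 40 | 180GB |
| Hercules | hercules | SLURM | 80 | 250GB |
| Gaea | gaea | SLURM | 36 | 120GB |
| Ursa | ursa | SLURM | 36 | 120GB |
| WCOSS2 | wcoss2 | PBS | 128 | 256GB |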

Generic Schedulers

If you are on a platform not listed above, you can use one of the generic scheduler modes:

* slurm
* pbs
* lsf

Customizing Cluster Parameters

You can override any platform default by providing additional keys in the cluster configuration. These are passed directly to the underlying dask-jobqueue cluster class (e.g., SLURMCluster or PBSCluster).

execution:
  clusters:
    custom_jet:
      mode: "jet"
      cores: 12
      memory: "30GB"
      job_extra_directives: ["--qos=windfall"]
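
One way to picture the override behaviour is as a dictionary merge in which user-supplied keys win over the platform profile. The sketch below is a simplified illustration under that assumption, not MDT's actual implementation: `PLATFORM_DEFAULTS` and `build_cluster_kwargs` are hypothetical names, and the Jet defaults are taken from the table above.

```python
# Hypothetical illustration of the override logic; not MDT's actual code.
PLATFORM_DEFAULTS = {
    "jet": {"cores": 24, "memory": "60GB"},  # values from the platform table
}


def build_cluster_kwargs(cluster_cfg):
    """Merge user-supplied keys over the defaults for the chosen mode."""
    cfg = dict(cluster_cfg)  # copy so the caller's dict is left untouched
    mode = cfg.pop("mode")
    kwargs = dict(PLATFORM_DEFAULTS.get(mode, {}))
    kwargs.update(cfg)  # user keys take precedence over platform defaults
    return kwargs


custom_jet = {
    "mode": "jet",
    "cores": 12,
    "memory": "30GB",
    "job_extra_directives": ["--qos=windfall"],
}

kwargs = build_cluster_kwargs(custom_jet)
print(kwargs)
```

Under this reading, the resulting keyword arguments (cores, memory, job_extra_directives) are what would reach the dask-jobqueue cluster class, for example SLURMCluster on Jet.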

Assigning Tasks to Specific Clusters

MDT allows you to run different tasks on different clusters within the same workflow. For example, you might want to load data on a local cluster but perform heavy pairing and statistics on an HPC cluster.

data:
  obs_data:
    type: "airnow"
    cluster: "local"  # Run locally
    kwargs:
      fname: "obs.nc"

pairing:
  heavy_pairing:
    source: "model_data"
    target: "obs_data"
    cluster: "hera_batch"  # Run on HPC
    method: "regrid"

If no cluster is specified for a task, it uses the default_cluster defined in the execution section.
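
The fallback rule can be sketched in a few lines. This is an illustration of the rule as described here, not MDT's code; `resolve_cluster` is a hypothetical helper, and the dicts stand in for the parsed YAML sections.

```python
# Hypothetical helper illustrating the fallback rule; not part of MDT.
def resolve_cluster(task_cfg, execution_cfg):
    """Return the task's own cluster if set, else the configured default."""
    return task_cfg.get("cluster") or execution_cfg.get("default_cluster")


execution_cfg = {"default_cluster": "hera_batch"}

obs_data = {"type": "airnow", "cluster": "local"}
unassigned_task = {"type": "airnow"}  # no cluster key

print(resolve_cluster(obs_data, execution_cfg))
print(resolve_cluster(unassigned_task, execution_cfg))
```

The first call returns the task's explicit choice ("local"), while the second falls back to "hera_batch" because the task names no cluster.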