MDT YAML Configuration Format

MDT uses a structured YAML format to define verification workflows. Each top-level section of the configuration defines a set of named tasks, and MDT executes those tasks in dependency order.

Top-Level Sections

Section	Description
`data`	Defines model and observation datasets to load.
`pairing`	Specifies how datasets are matched in time and space.
`combine`	Merges multiple paired datasets for comparison.
`statistics`	Lists the verification metrics to compute.
`plots`	Configures visualizations.
`execution`	Sets up the compute environment (e.g., local or HPC).
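
The statement that tasks run in dependency order can be illustrated with a small sketch. It assumes (this page does not confirm it) that dependencies are inferred from the `source`, `target`, `input`, and `sources` fields described in the sections that follow, and that task names are unique across sections. The helper names `task_dependencies` and `execution_order` are hypothetical and are not part of MDT. The configuration is written as a plain Python dictionary so the block runs with the standard library alone.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory form of a parsed configuration. Task names are
# borrowed from the examples on this page; this is not MDT's own loader.
config = {
    "data": {
        "cmaq_output": {"type": "cmaq"},
        "airnow_obs": {"type": "airnow"},
    },
    "pairing": {
        "cmaq_airnow": {
            "source": "cmaq_output",
            "target": "airnow_obs",
            "method": "interpolate",
        },
    },
    "statistics": {
        "airnow_stats": {"input": "cmaq_airnow", "metrics": ["rmse", "bias"]},
    },
    "plots": {
        "spatial_cmaq": {"input": "cmaq_airnow", "type": "spatial"},
    },
}

# Assumption: these are the fields whose values name other tasks.
REFERENCE_KEYS = ("source", "target", "input", "sources")
TASK_SECTIONS = ("data", "pairing", "combine", "statistics", "plots")


def task_dependencies(cfg):
    """Map each task name to the set of task names it refers to."""
    deps = {}
    for section in TASK_SECTIONS:
        for name, spec in cfg.get(section, {}).items():
            refs = set()
            for key in REFERENCE_KEYS:
                value = spec.get(key)
                if isinstance(value, str):
                    refs.add(value)
                elif isinstance(value, list):
                    refs.update(value)
            deps[name] = refs
    return deps


def execution_order(cfg):
    """Return task names so that every task follows its dependencies."""
    return list(TopologicalSorter(task_dependencies(cfg)).static_order())


order = execution_order(config)
print(order)
```

Independent tasks may come out in any relative order. The only guarantee is that each task appears after the tasks it refers to.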

1. `data` Section

The `data` section defines the datasets used in your verification. Each entry is a named task that later sections refer to by name.

`type`: The monetio dataset name (e.g., `cmaq`, `wrfchem`, `aeronet`, `airnow`).
`kwargs`: A dictionary of arguments passed to the monetio reader's open function.

data:
  cmaq_output:
    type: "cmaq"
    kwargs:
      fname: "/path/to/cmaq_file.nc"

2. `pairing` Section

Pairing tasks align two datasets in time and space so that their values can be compared.

`source`: The name of the source dataset (usually a model).
`target`: The name of the target dataset (usually observations).
`method`: The pairing algorithm (`interpolate` or `regrid`).
`kwargs`: Additional arguments for the pairing function.

For model-to-model comparisons that use `regrid`, set `merge_target: true` under `kwargs` to include the target model's data in the output.

pairing:
  # Model vs Obs (Interpolation)
  cmaq_airnow:
    source: "cmaq_output"
    target: "airnow_obs"
    method: "interpolate"

  # Model vs Model (Regridding)
  cmaq_vs_wrf:
    source: "cmaq_output"
    target: "wrfchem_model"  # Target model defines the grid
    method: "regrid"
    kwargs:
      merge_target: true
      suffix_source: "_cmaq"
      suffix_target: "_wrf"

3. `combine` Section

The optional `combine` section merges multiple paired datasets into a single dataset along a new dimension. This is useful for comparing multiple model runs against the same observations.

`sources`: A list of pairing task names to combine.
`dim`: The name of the new dimension to create (defaults to `model`).

combine:
  all_runs:
    sources:
      - pair_run1
      - pair_run2
    dim: model

4. `statistics` Section

The `statistics` section computes verification metrics on paired data.

`input`: The name of a pairing or data task.
`metrics`: A list of metric names to compute (e.g., `rmse`, `bias`, `corr`).
`kwargs`: Arguments passed to the metric functions (e.g., `obs_var`, `mod_var`).

statistics:
  airnow_stats:
    input: "cmaq_airnow"
    metrics: ["rmse", "bias", "corr"]
    kwargs:
      obs_var: "OZONE"
      mod_var: "OZONE"

5. `plots` Section

The `plots` section generates visualizations from task outputs.

`input`: The name of a pairing, statistics, or data task.
`type`: The type of plot (e.g., `spatial`, `scatter`, `timeseries`).
`kwargs`: Arguments passed to the plotting function.

plots:
  spatial_cmaq:
    input: "cmaq_airnow"
    type: "spatial"
    kwargs:
      savename: "cmaq_map.png"

6. `execution` Section

The `execution` section configures how and where tasks run.

`default_cluster`: The name of the cluster to use for tasks (defaults to `compute`).
`clusters`: A dictionary of cluster definitions.

execution:
  default_cluster: "local_cpu"
  clusters:
    local_cpu:
      mode: "local"
      n_workers: 4

Advanced Example: Multiple Models vs. Observations

This example shows how to compare two different models against a single set of observations.

data:
  # Load two different model outputs
  cmaq_model:
    type: "cmaq"
    kwargs:
      fname: "cmaq_data.nc"
  wrfchem_model:
    type: "wrfchem"
    kwargs:
      fname: "wrfchem_data.nc"

  # Load observational data
  airnow_obs:
    type: "airnow"
    kwargs:
      fname: "airnow_data.nc"

pairing:
  # Pair each model to the same observations
  pair_cmaq:
    source: "cmaq_model"
    target: "airnow_obs"
    method: "interpolate"
  pair_wrfchem:
    source: "wrfchem_model"
    target: "airnow_obs"
    method: "interpolate"

statistics:
  # Compute stats for each model
  cmaq_stats:
    input: "pair_cmaq"
    metrics: ["rmse", "bias"]
    kwargs:
      obs_var: "OZONE"
      mod_var: "OZONE"
  wrfchem_stats:
    input: "pair_wrfchem"
    metrics: ["rmse", "bias"]
    kwargs:
      obs_var: "OZONE"
      mod_var: "OZONE"

plots:
  # Create one spatial plot per model for side-by-side comparison
  plot_cmaq:
    input: "pair_cmaq"
    type: "spatial"
    kwargs:
      savename: "cmaq_spatial.png"
  plot_wrfchem:
    input: "pair_wrfchem"
    type: "spatial"
    kwargs:
      savename: "wrfchem_spatial.png"
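
Because every section refers to other tasks by name, a misspelled name is an easy mistake in a configuration like this one. The sketch below checks those references before a run. It is a hypothetical helper, not an MDT feature: the function name `find_dangling_references` is invented, the allowed targets follow the field descriptions on this page, and the assumption that `source` and `target` name `data` tasks is inferred from the examples. The configuration is a trimmed copy of the example above, written as a plain dictionary.

```python
import copy

# Which sections each reference field may point to, based on the field
# descriptions on this page (pairing -> data is an assumption).
ALLOWED = {
    "pairing": {"source": ("data",), "target": ("data",)},
    "combine": {"sources": ("pairing",)},
    "statistics": {"input": ("pairing", "data")},
    "plots": {"input": ("pairing", "statistics", "data")},
}


def find_dangling_references(cfg):
    """Return a message for every reference to a task that is not defined."""
    problems = []
    for section, keys in ALLOWED.items():
        for task, spec in cfg.get(section, {}).items():
            for key, target_sections in keys.items():
                value = spec.get(key)
                if value is None:
                    continue
                names = value if isinstance(value, list) else [value]
                # Collect every task name defined in the allowed sections.
                defined = set()
                for target in target_sections:
                    defined.update(cfg.get(target, {}))
                for name in names:
                    if name not in defined:
                        problems.append(
                            f"{section}.{task}.{key}: unknown task '{name}'"
                        )
    return problems


config = {
    "data": {
        "cmaq_model": {"type": "cmaq"},
        "wrfchem_model": {"type": "wrfchem"},
        "airnow_obs": {"type": "airnow"},
    },
    "pairing": {
        "pair_cmaq": {"source": "cmaq_model", "target": "airnow_obs"},
        "pair_wrfchem": {"source": "wrfchem_model", "target": "airnow_obs"},
    },
    "statistics": {
        "cmaq_stats": {"input": "pair_cmaq"},
        "wrfchem_stats": {"input": "pair_wrfchem"},
    },
    "plots": {
        "plot_cmaq": {"input": "pair_cmaq"},
        "plot_wrfchem": {"input": "pair_wrfchem"},
    },
}

# A copy with a deliberate typo to show what the check reports.
broken = copy.deepcopy(config)
broken["plots"]["plot_wrfchem"]["input"] = "pair_wrfchm"

print(find_dangling_references(config))
print(find_dangling_references(broken))
```

The first call reports no problems, and the second reports the misspelled `input` of `plot_wrfchem`.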
