Benchmarking Examples and Estimated Costs on the Roman Research Nexus

This page provides representative benchmarking examples for common tasks performed on the Roman Research Nexus, a cloud-based environment for Roman science. It summarizes typical CPU usage measured for well-defined cases and is intended to support planning and resource estimation for Nexus users. The benchmark cases and results shown here are illustrative and should be interpreted alongside the detailed case definitions that follow. 

Benchmark Case Definitions

This section defines the benchmark cases used to estimate typical CPU usage for common operations on the Roman Research Nexus. Each case describes the processing configuration, data volume, and key assumptions (e.g., inclusion of source detection or file I/O). Short illustrative code sketches follow several of the case lists below.

RomanCal Exposure-Level Pipeline: L1 → L2

  • Case 1: Single-detector calibration, full processing

    • Executes the full RomanCal Exposure-Level Pipeline (L1 → L2) on one WFI detector
    • Includes source detection and source catalog generation
    • Executed entirely in memory (no file I/O)
    • Scene-dependent test executed with ~1,500 sources (stars and galaxies combined, uniformly distributed)
  • Case 2: Single-detector calibration, no catalog products

    • Executes the full RomanCal Exposure-Level Pipeline (L1 → L2) on one WFI detector
    • Source detection and source catalog generation are excluded
    • Executed entirely in memory (no file I/O)
    • Independent of source density, since the detection and catalog steps are excluded
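
For reference, a minimal Python sketch of how such runs might be invoked with romancal is shown below. The input filename is a placeholder, and the step name used to skip catalog generation is an assumption that should be checked against the installed romancal version.

```python
from romancal.pipeline import ExposurePipeline

# Case 1: full L1 -> L2 processing of a single detector
# (the input filename is a placeholder)
result = ExposurePipeline.call("r0000101001001001001_01101_0001_wfi01_uncal.asdf")

# Case 2: skip source detection and catalog generation
# (the step name "source_catalog" is an assumption; check your romancal version)
result = ExposurePipeline.call(
    "r0000101001001001001_01101_0001_wfi01_uncal.asdf",
    steps={"source_catalog": {"skip": True}},
)
```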

RomanCal Mosaic-Level Pipeline: L2 → L3

  • Case 1: Single-detector 4-point gap-filling dither, full processing
  • Case 2: Single-detector 4-point gap-filling dither, no catalog products
  • Case 3: Full WFI field-of-view mosaic, no source products
    • Executes the RomanCal Mosaic-Level Pipeline (L2 → L3) on all 18 WFI detectors observed in a single exposure
    • Combines the 18 exposure-level images into a single mosaic
    • Source detection and source catalog generation are excluded
    • Includes file I/O
    • Independent of source density, since the detection and catalog steps are excluded
    • Due to its memory footprint, requires ≥ 64 GB of RAM (medium server or larger)
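
A minimal sketch of invoking the mosaic pipeline, assuming an association file that lists the exposure-level inputs (the filename is illustrative):

```python
from romancal.pipeline import MosaicPipeline

# L2 -> L3: combine the exposure-level images listed in an association
# file into a single mosaic (the association filename is a placeholder)
result = MosaicPipeline.call("mosaic_asn.json", save_results=True)
```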

Roman I-Sim Simulations

All cases include PSF generation with STPSF. A sample command-line invocation is sketched after the case list below.

  • Case 1: Simulation of uncalibrated data (L1), galaxies
    • Uses Roman I-Sim to simulate WFI uncalibrated imaging data (L1) for one detector
    • Input catalog contains ~1,000 galaxies, uniformly distributed
  • Case 2: Simulation of the exposure-level calibrated product (L2), galaxies
    • Uses Roman I-Sim to simulate the WFI exposure-level calibrated product (L2) for one detector
    • Input catalog contains ~1,000 galaxies, uniformly distributed
  • Case 3: Simulation of uncalibrated data (L1), stars
    • Uses Roman I-Sim to simulate WFI uncalibrated imaging data (L1) for one detector
    • Input catalog contains ~1,000 stars, uniformly distributed
  • Case 4: Simulation of the exposure-level calibrated product (L2), stars
    • Uses Roman I-Sim to simulate the WFI exposure-level calibrated product (L2) for one detector
    • Input catalog contains ~1,000 stars, uniformly distributed
  • Case 5: Simulation of uncalibrated data (L1), mixed scene
    • Uses Roman I-Sim to simulate WFI uncalibrated imaging data (L1) for one detector
    • Input catalog contains ~500 galaxies and ~500 stars, uniformly distributed
  • Case 6: Simulation of the exposure-level calibrated product (L2), mixed scene
    • Uses Roman I-Sim to simulate the WFI exposure-level calibrated product (L2) for one detector
    • Input catalog contains ~500 galaxies and ~500 stars, uniformly distributed
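
As an illustration, a run corresponding to Case 1 might be launched from Python via the romanisim command-line tool as sketched below. The catalog path, filter, and detector number are placeholders, and flag names should be verified against the installed romanisim version.

```python
import subprocess

# Simulate L1 (uncalibrated) data for one WFI detector from a galaxy catalog.
# Catalog path, filter, and detector number are illustrative.
subprocess.run(
    [
        "romanisim-make-image",
        "--catalog", "galaxies.ecsv",  # ~1,000 uniformly distributed galaxies
        "--bandpass", "F158",
        "--sca", "1",                  # detector WFI01
        "--level", "1",                # 1 = L1 (uncalibrated); 2 = L2 (calibrated)
        "--usecrds",
        "wfi01_l1_sim.asdf",
    ],
    check=True,
)
```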

Aperture Photometry

  • Case 1: Large stellar catalog
    • Performs source detection and aperture photometry on ~10,000 stars
    • Uses photutils.DAOStarFinder
    • Approximately 92% of injected sources recovered
  • Case 2: Moderate stellar catalog
    • Performs source detection and aperture photometry on ~1,000 stars
    • Uses photutils.DAOStarFinder
    • All injected sources recovered, with some spurious detections
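
A minimal sketch of this workflow with photutils; the detection threshold, FWHM, and aperture radius are illustrative choices, not the benchmark's exact parameters.

```python
import numpy as np
from astropy.stats import sigma_clipped_stats
from photutils.detection import DAOStarFinder
from photutils.aperture import CircularAperture, aperture_photometry

# image: a 2D stellar image (assumed already loaded)
mean, median, std = sigma_clipped_stats(image, sigma=3.0)

# Detect point sources above 5 sigma (fwhm and threshold are illustrative)
finder = DAOStarFinder(fwhm=3.0, threshold=5.0 * std)
sources = finder(image - median)

# Measure fluxes in circular apertures at the detected positions
positions = np.transpose((sources["xcentroid"], sources["ycentroid"]))
apertures = CircularAperture(positions, r=4.0)
phot_table = aperture_photometry(image - median, apertures)
```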

Galaxy Shape Measurements

  • Case 1: Simple moment-based shape measurements
    • Lightweight per-object shape estimator that computes basic galaxy-shape quantities:
      • position angle
      • major-to-minor axis ratio
  • Case 2: Sérsic profile fitting
    • Fits a Sérsic surface-brightness model to each individual galaxy
    • Computes the full covariance matrix of the fitted parameters
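
Both cases can be sketched with standard astropy and photutils tools, assuming a background-subtracted cutout of a single galaxy; the initial-guess values for the Sérsic fit are illustrative.

```python
import numpy as np
from astropy.modeling import models, fitting
from photutils.morphology import data_properties

# cutout: 2D background-subtracted image of a single galaxy (assumed loaded)

# Case 1: moment-based quantities
props = data_properties(cutout)
position_angle = props.orientation                     # PA of the major axis
axis_ratio = (props.semimajor_sigma / props.semiminor_sigma).value

# Case 2: Sersic profile fit with parameter covariance
y, x = np.mgrid[0 : cutout.shape[0], 0 : cutout.shape[1]]
init = models.Sersic2D(
    amplitude=cutout.max(), r_eff=5.0, n=2.0,          # illustrative guesses
    x_0=cutout.shape[1] / 2, y_0=cutout.shape[0] / 2,
    ellip=0.3, theta=0.0,
)
fitter = fitting.LevMarLSQFitter(calc_uncertainties=True)
fit = fitter(init, x, y, cutout)
covariance = fitter.fit_info["param_cov"]              # full covariance matrix
```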

Astrocut

  • Case 1: Cutout generation, 100×100 pixels
    • Uses Astrocut to generate 100 cutouts of size 100×100 pixels
  • Case 2: Cutout generation, 10×10 pixels
    • Uses Astrocut to generate 100 cutouts of size 10×10 pixels
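
A sketch using astrocut's fits_cut function; the input filename and cutout centers are placeholders, and memory_only keeps the cutouts in memory rather than writing files.

```python
from astropy.coordinates import SkyCoord
from astrocut import fits_cut

# Cutout centers (placeholder coordinates); the benchmark used 100 cutouts
centers = SkyCoord(ra=[150.10, 150.12], dec=[2.20, 2.22], unit="deg")

# Generate 100x100-pixel cutouts from a FITS image (filename is illustrative)
cutouts = [
    fits_cut(["wfi_image.fits"], center, cutout_size=100, memory_only=True)
    for center in centers
]
```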

Exposure Time Calculations (Pandeia)

  • Case 1: Signal-to-noise ratio estimates
    • Uses Pandeia to compute the SNR for a source at a given magnitude and a given exposure configuration
    • Benchmark based on 100 estimates
  • Case 2: Limiting magnitude estimates
    • Uses Pandeia to compute the 5σ point-source limiting magnitude for a given exposure configuration
    • Benchmark based on 100 estimates
  • Case 3: Exposure time estimates
    • Uses Pandeia to compute the exposure time needed to reach a target SNR
    • Benchmark based on 100 estimates
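
A minimal sketch of a single SNR estimate (Case 1) with the Pandeia engine; the filter choice is illustrative, and the benchmark repeats such a calculation 100 times.

```python
from pandeia.engine.calc_utils import build_default_calc
from pandeia.engine.perform_calculation import perform_calculation

# Start from a default Roman WFI imaging calculation and adjust it
calc = build_default_calc("roman", "wfi", "imaging")
calc["configuration"]["instrument"]["filter"] = "f129"  # illustrative filter

# Run the engine and pull out the scalar signal-to-noise ratio
report = perform_calculation(calc)
snr = report["scalar"]["sn"]
```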

Catalog Cross-matching

  • Case 1: ZTF × Pan-STARRS cross-match
    • Cross-matches ~10,000 ZTF sources against the Pan-STARRS catalog
    • Input catalogs sourced from:
      • ZTF, pulled from IRSA's public S3 bucket
      • Pan-STARRS, pulled from the STScI Open Data S3 bucket
    • Benchmark assumes the ZTF catalog was pre-selected via a cone search to limit the sample to 10,000 sources
    • Reported CPU usage reflects only the cross-matching computation, not the initial catalog query
    • Catalogs are streamed directly into memory
    • No intermediate disk I/O is performed during the matching step
    • Executed using a parallelized cross-matching implementation (distributed across multiple cores within the server)
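
The benchmark used a parallelized implementation; for orientation, here is a simpler serial sketch of the same nearest-neighbor match with astropy. The column names and match tolerance are assumptions.

```python
import astropy.units as u
from astropy.coordinates import SkyCoord

# ztf, ps1: catalogs with RA/Dec columns in degrees, assumed already
# streamed into memory (e.g., from the S3 buckets noted above)
ztf_coords = SkyCoord(ra=ztf["ra"] * u.deg, dec=ztf["dec"] * u.deg)
ps1_coords = SkyCoord(ra=ps1["ra"] * u.deg, dec=ps1["dec"] * u.deg)

# For each ZTF source, find the nearest Pan-STARRS source on the sky
idx, sep2d, _ = ztf_coords.match_to_catalog_sky(ps1_coords)

# Keep matches within a 1-arcsecond tolerance (illustrative cutoff)
good = sep2d < 1.0 * u.arcsec
matched_idx = idx[good]
```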




Benchmark Results Summary 

The Summary of Benchmark Results table below summarizes measured Roman Research Nexus CPU usage for the benchmark cases defined in the previous section. CPU hours are normalized per detector, mosaic, object, or batch, as indicated.

Summary of Benchmark Results

| Operation | Case | Server Size* | CPU Hours (normalized) |
|---|---|---|---|
| RomanCal Exposure-Level Pipeline (L1 → L2) | Case 1: Full processing | Small | 0.0333 / detector |
| | Case 2: No source products | Small | 0.0117 / detector |
| RomanCal Mosaic-Level Pipeline (L2 → L3) | Case 1: 4-point dither, full processing | Small | 0.232 / mosaic |
| | Case 2: 4-point dither, no catalog | Small | 0.143 / mosaic |
| | Case 3: Full WFI FOV mosaic (18 detectors) | Medium | 2.634 / mosaic |
| Roman I-Sim | Case 1: L1 simulation (~1,000 galaxies) | Small | 11.385 / detector |
| | Case 2: L2-only simulation (~1,000 galaxies) | Small | 11.907 / detector |
| | Case 3: L1 simulation (~1,000 stars) | Small | 3.521 / detector |
| | Case 4: L2-only simulation (~1,000 stars) | Small | 3.815 / detector |
| | Case 5: L1 simulation (~500 galaxies + ~500 stars) | Small | 7.479 / detector |
| | Case 6: L2-only simulation (~500 galaxies + ~500 stars) | Small | 7.695 / detector |
| Aperture Photometry | Case 1: ~10,000 stars | Small | 1.671×10⁻³ / 10,000 stars |
| | Case 2: ~1,000 stars | Small | 1.592×10⁻³ / 1,000 stars |
| Galaxy Shape Measurements | Case 1: Simple moments | Small | 2.791×10⁻⁷ / galaxy |
| | Case 2: Sérsic profile fitting | Small | 6.833×10⁻⁴ / galaxy |
| Astrocut | Case 1: Cutout generation (100×100 pixels) | Small | 3.130×10⁻³ / cutout |
| | Case 2: Cutout generation (10×10 pixels) | Small | 3.130×10⁻³ / cutout |
| Pandeia Exposure Time Calculations | Case 1: 100 SNR estimates | Small | 0.038 / 100 estimates |
| | Case 2: 100 limiting magnitude estimates | Small | 0.414 / 100 estimates |
| | Case 3: 100 exposure time estimates | Small | 0.229 / 100 estimates |
| Catalog Cross-Matching | Case 1: ZTF × Pan-STARRS (~10,000 sources) | Small | 0.021 / 10,000 sources matched |

*Note: Server sizes listed reflect the smallest configuration on which each benchmark case could be executed. Larger servers may reduce wall-clock time, but total CPU hours typically decrease only for inherently multi-threaded tasks (e.g., RomanCal and Roman I-Sim), not for primarily single-threaded or user-parallelized analyses.



How To Interpret the Benchmarks

The CPU usage values in the Summary of Benchmark Results table should always be interpreted in the context of the corresponding case definitions described above, including assumptions about input data volume, source density, and whether file I/O or catalog generation is included.
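
For example, at 0.0333 CPU hours per detector for full exposure-level processing (Case 1), calibrating all 18 WFI detectors of a single exposure would consume roughly 18 × 0.0333 ≈ 0.6 CPU hours.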

  • Benchmarks were executed on the smallest server configuration capable of running each case.

In most cases, server size was chosen to satisfy memory requirements rather than to optimize runtime.

  • CPU hours and wall-clock time are not the same quantity.

The values reported in this table are CPU hours, which measure total compute usage. Using a larger server may reduce the elapsed runtime (wall-clock time), but it does not necessarily reduce the total CPU hours consumed.

  • For primarily single-threaded tasks, larger servers do not reduce CPU usage.

Many operations execute largely in a single process. For these tasks, selecting a larger server mainly provides additional memory and typically does not reduce CPU usage.

  • Parallelization can reduce wall-clock time but usually does not reduce CPU hours.

Tasks parallelized with frameworks such as Dask or Ray can complete faster by using more CPU cores simultaneously, but the total CPU hours are often similar (and may increase slightly due to parallel overhead). Some benchmarks on this page (e.g., catalog cross-matching) already reflect parallel execution; a minimal sketch of this pattern appears after this list.

  • Some tasks are inherently multi-threaded and may show reduced CPU usage on larger servers.

RomanCal and Roman I-Sim support multi-threaded execution and can take advantage of additional CPU cores. For these tasks, both wall-clock time and total CPU usage may decrease on larger server configurations. The reported values should therefore be interpreted as upper-limit estimates.

  • These values are intended as guidelines, not plug-in estimates.

Actual CPU usage may vary depending on algorithm choices, source density, I/O patterns, runtime parameters, and degree of parallelism.

  • Reported values represent upper-limit estimates.

Ongoing software optimization and infrastructure improvements are expected to reduce resource usage over time.

  • Expanded parallel capabilities are planned.

Beginning in FY27, the Roman Research Nexus is expected to offer larger server configurations for highly parallel workloads, along with additional support for parallel execution using AWS-native services.
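
As a minimal illustration of the user-parallelized pattern mentioned above (a sketch, not the implementation used for any benchmark), a per-object task can be distributed across local cores with Dask:

```python
import dask.bag as db

def measure(x):
    # Stand-in for a per-object analysis task (e.g., one shape measurement)
    return x ** 2

# Spreading the work across cores reduces wall-clock time, but the total
# CPU hours stay roughly the same (plus some scheduling overhead)
results = db.from_sequence(range(10_000), npartitions=8).map(measure).compute()
```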




For additional questions not answered in this article, please contact the Roman Help Desk.




Latest Update

  • Updates to examples
  • Initial publication of this article