Benchmarking Examples and Estimated Costs on the Roman Research Nexus
This page provides representative benchmarking examples for common tasks performed on the Roman Research Nexus, a cloud-based environment for Roman science. It summarizes typical CPU usage measured for well-defined cases and is intended to support planning and resource estimation for Nexus users. The benchmark cases and results shown here are illustrative and should be interpreted alongside the detailed case definitions that follow.
Benchmark Case Definition
This section defines the benchmark cases used to estimate typical CPU usage for common operations on the Roman Research Nexus. Each case describes the processing configuration, data volume, and key assumptions (e.g., inclusion of source detection or file I/O).
RomanCal Exposure-Level Pipeline: L1 → L2
- Case 1: Single-detector calibration, full processing
- Executes the full RomanCal Exposure-Level Pipeline (L1 → L2) on one WFI detector
- Includes source detection and source catalog generation
- Executed entirely in memory (no file I/O)
- Scene-dependent test executed with ~1500 sources (combined stars and galaxies, uniformly distributed)
- Case 2: Single-detector calibration, no catalog products
- Executes the RomanCal Exposure-Level Pipeline (L1 → L2) on one WFI detector
- Source detection and source catalog generation are excluded
- Executed entirely in memory (no file I/O)
- Independent of source density since the detection and catalog steps are excluded
RomanCal Mosaic-Level Pipeline: L2 → L3
- Case 1: Single-detector 4-point gap-filling dither, full processing
- Executes the full RomanCal Mosaic-Level Pipeline (L2 → L3) on one WFI detector
- Combines four exposure-level images from a 4-point gap-filling dither pattern into a single mosaic
- Includes source detection and source catalog generation
- Includes file I/O
- Scene-dependent test executed with ~4500 sources (galaxies uniformly distributed; stars clustered to simulate a star cluster)
- Case 2: Single-detector 4-point gap-filling dither, no catalog products
- Executes the RomanCal Mosaic-Level Pipeline (L2 → L3) on one WFI detector
- Combines four exposure-level images from a 4-point gap-filling dither pattern into a single mosaic
- Source detection and source catalog generation are excluded
- Includes file I/O
- Independent of source density since the detection and catalog steps are excluded
- Case 3: Full WFI field-of-view mosaic, no source products
- Executes the RomanCal Mosaic-Level Pipeline (L2 → L3) on eighteen WFI detectors observed in a single exposure
- Combines all 18 exposure-level images into a single mosaic
- Source detection and source catalog generation are excluded
- Includes file I/O
- Independent of source density since the detection and catalog steps are excluded
- Due to its memory footprint, this case requires ≥ 64 GB of RAM (a medium server or larger)
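As a rough illustration of why the full field-of-view case needs a larger server, the input pixel data alone can be sized as follows. The array dimensions and plane count here are illustrative assumptions for a back-of-the-envelope estimate, not official pipeline requirements:

```python
# Rough memory sizing for the full WFI field-of-view mosaic case.
# Assumptions (illustrative only): each detector contributes a
# 4088 x 4088 science array, held as several float32 planes
# (e.g., data, error, variance, context) during processing.
N_DETECTORS = 18
PIXELS_PER_DETECTOR = 4088 * 4088
BYTES_PER_PIXEL = 4      # float32
PLANES_PER_IMAGE = 4     # assumed working planes per input image

input_bytes = (N_DETECTORS * PIXELS_PER_DETECTOR
               * BYTES_PER_PIXEL * PLANES_PER_IMAGE)
input_gib = input_bytes / 1024**3
print(f"Input images alone: ~{input_gib:.1f} GiB")
```

Peak usage during resampling and intermediate-product handling can be several times the raw input size, which is why the memory requirement is far above the input data volume itself.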
Roman I-Sim Simulations
All cases include PSF generation with STPSF.
- Case 1: Simulation of uncalibrated data (L1), galaxies
- Uses Roman I-Sim to simulate WFI uncalibrated imaging data (L1) for one detector
- Input catalog contains ~1000 galaxies, uniformly distributed
- Case 2: Simulation of exposure-level calibrated product (L2), galaxies
- Uses Roman I-Sim to simulate a WFI exposure-level calibrated product (L2) for one detector
- Input catalog contains ~1000 galaxies, uniformly distributed
- Case 3: Simulation of uncalibrated data (L1), stars
- Uses Roman I-Sim to simulate WFI uncalibrated imaging data (L1) for one detector
- Input catalog contains ~1000 stars, uniformly distributed
- Case 4: Simulation of exposure-level calibrated product (L2), stars
- Uses Roman I-Sim to simulate a WFI exposure-level calibrated product (L2) for one detector
- Input catalog contains ~1000 stars, uniformly distributed
- Case 5: Simulation of uncalibrated data (L1), mixed scene
- Uses Roman I-Sim to simulate WFI uncalibrated imaging data (L1) for one detector
- Input catalog contains ~500 galaxies and ~500 stars, uniformly distributed
- Case 6: Simulation of exposure-level calibrated product (L2), mixed scene
- Uses Roman I-Sim to simulate a WFI exposure-level calibrated product (L2) for one detector
- Input catalog contains ~500 galaxies and ~500 stars, uniformly distributed
Aperture Photometry
- Case 1: Large stellar catalog
- Performs source detection and aperture photometry on ~10,000 stars
- Approximately 92% of injected sources recovered
- Case 2: Moderate stellar catalog
- Performs source detection and aperture photometry on ~1,000 stars
- All injected sources recovered, with some spurious detections
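The core measurement in these cases can be illustrated with a minimal circular-aperture sum. This is a deliberately simplified sketch of the idea; the benchmarked workflow also performs source detection, background handling, and partial-pixel weighting:

```python
import math

def aperture_sum(image, xc, yc, radius):
    """Sum pixel values whose centres fall within `radius` of (xc, yc).

    A sketch only: production aperture photometry also weights the
    pixels that straddle the aperture boundary and subtracts a local
    background estimate.
    """
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if math.hypot(x - xc, y - yc) <= radius:
                total += value
    return total

# A tiny synthetic stamp with one central source:
img = [[0, 0, 0, 0, 0],
       [0, 1, 2, 1, 0],
       [0, 2, 5, 2, 0],
       [0, 1, 2, 1, 0],
       [0, 0, 0, 0, 0]]
print(aperture_sum(img, 2, 2, 1.5))  # -> 17.0
```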
Galaxy Shape Measurements
- Case 1: Simple moment-based shape measurements
- Lightweight per-object shape estimator computing basic galaxy-shape quantities:
  - position angle
  - major-to-minor axis ratio
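A minimal version of such an estimator, using unweighted second moments on a background-subtracted postage stamp, is sketched below. This is an illustration of the technique, not the benchmark's actual implementation:

```python
import math

def moments_shape(image):
    """Position angle (radians, from the x-axis) and major-to-minor
    axis ratio from unweighted flux-weighted second moments."""
    sx = sy = s = 0.0
    for y, row in enumerate(image):
        for x, f in enumerate(row):
            s += f; sx += f * x; sy += f * y
    xc, yc = sx / s, sy / s
    mxx = mxy = myy = 0.0
    for y, row in enumerate(image):
        for x, f in enumerate(row):
            dx, dy = x - xc, y - yc
            mxx += f * dx * dx
            mxy += f * dx * dy
            myy += f * dy * dy
    mxx, mxy, myy = mxx / s, mxy / s, myy / s
    theta = 0.5 * math.atan2(2 * mxy, mxx - myy)       # position angle
    t = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    a2 = (mxx + myy) / 2 + t                           # major-axis eigenvalue
    b2 = (mxx + myy) / 2 - t                           # minor-axis eigenvalue
    ratio = math.sqrt(a2 / b2) if b2 > 0 else float("inf")
    return theta, ratio

# A stamp elongated along x: angle 0, axis ratio sqrt(2)
theta, q = moments_shape([[0, 1, 0], [2, 4, 2], [0, 1, 0]])
print(theta, q)
```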
- Case 2: Sérsic profile fitting
- Fits a Sérsic surface-brightness model to each individual galaxy
- Computes full covariance matrix of fitted parameters
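For reference, the model being fit in Case 2 is the Sérsic surface-brightness profile. A minimal evaluation of it, using the common b_n ≈ 2n - 1/3 approximation (Ciotti & Bertin), looks like this; the function name and signature are illustrative:

```python
import math

def sersic_intensity(r, i_e, r_e, n):
    """Sersic profile I(r) = I_e * exp(-b_n * ((r/r_e)**(1/n) - 1)).

    b_n is approximated by 2n - 1/3, which makes r_e the half-light
    (effective) radius to good accuracy for n >~ 0.5.
    """
    b_n = 2.0 * n - 1.0 / 3.0
    return i_e * math.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

# By construction, the profile equals I_e at the effective radius:
print(sersic_intensity(2.0, 10.0, 2.0, 4.0))  # -> 10.0
```

Fitting this model per galaxy, with a full covariance matrix for (I_e, r_e, n) and the shape parameters, is far more expensive than the moment-based estimate, which is reflected in the benchmark results.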
Astrocut
- Case 1: Cutout generation, 100×100 pixels
- Uses Astrocut to generate 100 cutouts of size 100×100 pixels
- Case 2: Cutout generation, 10×10 pixels
- Uses Astrocut to generate 100 cutouts of size 10×10 pixels
Exposure Time Calculations (Pandeia)
- Case 1: Signal-to-noise ratio estimates
- Uses Pandeia to compute the SNR for a source at a given magnitude and for a given exposure configuration
- Benchmark based on 100 estimates
- Case 2: Limiting magnitude estimates
- Uses Pandeia to compute the 5σ point-source limiting magnitude for a given exposure configuration
- Benchmark based on 100 estimates
- Case 3: Exposure time estimates
- Uses Pandeia to compute the exposure time needed to reach a target SNR
- Benchmark based on 100 estimates
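Conceptually, each SNR estimate evaluates a noise model along the lines of the idealized CCD equation below. Pandeia's actual calculation is far more detailed (full instrument throughput, 2-D scenes, correlated noise), so this is only a sketch of the underlying idea, and all parameter names are invented here:

```python
import math

def point_source_snr(src_rate, t, sky_rate, npix, read_noise):
    """Idealized CCD-equation SNR for a point source (a sketch, not
    Pandeia's model).

    src_rate   : source count rate in the aperture (e-/s)   [assumed]
    t          : exposure time (s)
    sky_rate   : background rate per pixel (e-/s/pix)       [assumed]
    npix       : number of pixels in the aperture
    read_noise : read noise per pixel (e-)
    """
    signal = src_rate * t
    variance = signal + sky_rate * t * npix + npix * read_noise ** 2
    return signal / math.sqrt(variance)

# With no background or read noise, SNR reduces to sqrt(counts):
print(point_source_snr(100.0, 100.0, 0.0, 1, 0.0))  # -> 100.0
```

Limiting-magnitude and exposure-time estimates (Cases 2 and 3) invert this kind of relation numerically, which is why they cost more per estimate than a single forward SNR evaluation.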
Catalog Cross-matching
- Case 1: ZTF × Pan-STARRS cross-match
- Cross matches ~10,000 ZTF sources against the Pan-STARRS catalog
- Input catalogs sourced from:
  - ZTF, pulled from IRSA’s public S3 bucket
  - Pan-STARRS, pulled from the STScI Open Data S3 bucket
- Benchmark assumes the ZTF catalog was pre-selected via a cone search to limit the sample to 10,000 sources
- Reported CPU usage reflects only the cross-matching computation, not the initial catalog query
- Catalogs are streamed directly into memory
- No intermediate disk I/O operations performed during the matching step
- This benchmark was executed using a parallelized cross-matching implementation (distributed across multiple cores within the server).
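The matching step itself can be sketched as a nearest-neighbour search within a radius. The pure-Python brute-force version below is for clarity only; the benchmarked implementation is parallelized, and real cross-matching tools typically use spatial indexing to avoid the O(N×M) scan shown here:

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def crossmatch(cat_a, cat_b, radius_arcsec=1.0):
    """Match each (ra, dec) in cat_a to its nearest cat_b source
    within `radius_arcsec`; returns (index_a, index_b) pairs."""
    radius_deg = radius_arcsec / 3600.0
    pairs = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best_j, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs

# Tiny synthetic example: catalogue B is catalogue A shifted by ~0.3",
# plus one unrelated distant source that should stay unmatched.
cat_a = [(150.000, 2.200), (150.010, 2.205), (150.020, 2.210)]
cat_b = [(ra + 0.3 / 3600.0, dec) for ra, dec in cat_a] + [(151.0, 3.0)]
print(crossmatch(cat_a, cat_b))  # each A source pairs with its shifted twin
```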
Benchmark Results Summary
The Summary of Benchmark Results table below summarizes measured CPU usage on the Roman Research Nexus for the benchmark cases defined in the previous section. CPU hours are normalized per detector, mosaic, object, or batch, as indicated.
Summary of Benchmark Results
| Operation | Case | Server Size* | CPU Hours (normalized) |
|---|---|---|---|
| RomanCal Exposure-Level Pipeline (L1 → L2) | Case 1: Full processing | Small | 0.0333 / detector |
| RomanCal Exposure-Level Pipeline (L1 → L2) | Case 2: No source products | Small | 0.0117 / detector |
| RomanCal Mosaic-Level Pipeline (L2 → L3) | Case 1: 4-point dither, full processing | Small | 0.232 / mosaic |
| RomanCal Mosaic-Level Pipeline (L2 → L3) | Case 2: 4-point dither, no catalog | Small | 0.143 / mosaic |
| RomanCal Mosaic-Level Pipeline (L2 → L3) | Case 3: Full WFI FOV mosaic (18 detectors) | Medium | 2.634 / mosaic |
| Roman I-Sim | Case 1: L1 simulation (~1000 galaxies) | Small | 11.385 / detector |
| Roman I-Sim | Case 2: L2-only simulation (~1000 galaxies) | Small | 11.907 / detector |
| Roman I-Sim | Case 3: L1 simulation (~1000 stars) | Small | 3.521 / detector |
| Roman I-Sim | Case 4: L2-only simulation (~1000 stars) | Small | 3.815 / detector |
| Roman I-Sim | Case 5: L1 simulation (~500 galaxies + 500 stars) | Small | 7.479 / detector |
| Roman I-Sim | Case 6: L2-only simulation (~500 galaxies + 500 stars) | Small | 7.695 / detector |
| Aperture Photometry | Case 1: ~10,000 stars | Small | 1.671×10⁻³ / 10,000 stars |
| Aperture Photometry | Case 2: ~1,000 stars | Small | 1.592×10⁻³ / 1,000 stars |
| Galaxy Shape Measurements | Case 1: Simple moments | Small | 2.791×10⁻⁷ / galaxy |
| Galaxy Shape Measurements | Case 2: Sérsic profile fitting | Small | 6.833×10⁻⁴ / galaxy |
| Astrocut | Case 1: Cutout generation (100×100 pixels) | Small | 3.130×10⁻³ / cutout |
| Astrocut | Case 2: Cutout generation (10×10 pixels) | Small | 3.130×10⁻³ / cutout |
| Exposure Time Calculations (Pandeia) | Case 1: 100 SNR estimates | Small | 0.038 / 100 estimates |
| Exposure Time Calculations (Pandeia) | Case 2: 100 limiting magnitude estimates | Small | 0.414 / 100 estimates |
| Exposure Time Calculations (Pandeia) | Case 3: 100 exposure time estimates | Small | 0.229 / 100 estimates |
| Catalog Cross-matching | Case 1: ZTF × Pan-STARRS (~10,000 sources) | Small | 0.021 / 10,000 sources matched |
*Note: Server sizes listed reflect the smallest configuration on which each benchmark case could be executed. Larger servers may reduce wall-clock time, but total CPU hours typically decrease only for inherently multi-threaded tasks (e.g., RomanCal and Roman I-Sim), not for primarily single-threaded or user-parallelized analyses.
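The per-unit rates in the table can be combined into quick planning estimates. The sketch below hardcodes a few rates copied from the table (the dictionary keys are invented labels, not Nexus identifiers); treat the results as upper-limit guidelines rather than quotes:

```python
# Per-unit CPU-hour rates copied from the Summary of Benchmark Results.
RATES = {
    "elp_full_per_detector": 0.0333,   # RomanCal L1 -> L2, Case 1
    "mosaic_full_fov": 2.634,          # RomanCal L2 -> L3, Case 3
    "isim_l1_galaxies_per_detector": 11.385,  # Roman I-Sim, Case 1
}

def estimate_cpu_hours(rate_key, n_units):
    """Scale a per-unit benchmark rate to n_units of work."""
    return RATES[rate_key] * n_units

# Example: calibrate all 18 detectors of one exposure, then build
# the full field-of-view mosaic.
total = (estimate_cpu_hours("elp_full_per_detector", 18)
         + estimate_cpu_hours("mosaic_full_fov", 1))
print(f"~{total:.2f} CPU hours")  # -> ~3.23 CPU hours
```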
How To Interpret the Benchmarks
The CPU usage values in the Summary of Benchmark Results table should always be interpreted in the context of the corresponding case definitions above, including assumptions about input data volume, source density, and whether file I/O or catalog generation is included.
- Benchmarks were executed on the smallest server configuration capable of running each case.
In most cases, server size was chosen to satisfy memory requirements rather than to optimize runtime.
- CPU hours and wall-clock time are not the same quantity.
The values reported in this table are CPU hours, which measure total compute usage. Using a larger server may reduce the elapsed runtime (wall-clock time), but it does not necessarily reduce the total CPU hours consumed.
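The distinction can be made concrete with a toy scaling model. Here `efficiency` (a hypothetical parameter) models parallel overhead, and the CPU-hour budget is the quantity that usage accounting sees:

```python
def wall_clock_hours(cpu_hours, cores, efficiency=1.0):
    """Elapsed time if a fixed CPU-hour budget of work spreads across
    `cores` with the given parallel efficiency (1.0 = perfect scaling).
    A toy model, not a Nexus accounting formula."""
    return cpu_hours / (cores * efficiency)

cpu = 2.634  # full-FOV mosaic case from the table above
print(wall_clock_hours(cpu, 1))  # 2.634 h elapsed on one core
print(wall_clock_hours(cpu, 8))  # ~0.33 h elapsed on eight cores
# In both runs the CPU hours consumed remain ~2.634 (slightly more
# when efficiency < 1); only the elapsed time changes.
```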
- For primarily single-threaded tasks, larger servers do not reduce CPU usage.
Many operations execute largely in a single process. For these tasks, selecting a larger server mainly provides additional memory and typically does not reduce the total CPU hours consumed.
- Parallelization can reduce wall-clock time but usually does not reduce CPU hours.
Tasks parallelized with frameworks such as Dask or Ray can complete faster by using more CPU cores simultaneously, but the total CPU hours are often similar (and may increase slightly due to parallel overhead).
Some benchmarks on this page (e.g., catalog cross-matching) already reflect parallel execution.
- Some tasks are inherently multi-threaded and may show reduced CPU usage on larger servers.
RomanCal and Roman I-Sim support multi-threaded execution and can take advantage of additional CPU cores. For these tasks, both wall-clock time and total CPU usage may decrease when using larger server configurations. The reported values should therefore be interpreted as upper-limit estimates.
- These values are intended as guidelines, not plug-in estimates.
Actual CPU usage may vary depending on algorithm choices, source density, I/O patterns, runtime parameters, and degree of parallelism.
- Reported values represent upper-limit estimates.
Ongoing software optimization and infrastructure improvements are expected to reduce resource usage over time.
- Expanded parallel capabilities are planned.
Beginning in FY27, the Roman Research Nexus is expected to offer larger server configurations for highly parallel workloads, along with additional support for parallel execution using AWS-native services.
For additional questions not answered in this article, please contact the Roman Help Desk.
