Roman STScI Data Pipelines

This article gives a high-level overview of the science data pipelines that process Roman Wide Field Instrument (WFI) data at STScI, including design philosophy and installation instructions.


Overview of WFI Pipelines at STScI

WFI imaging observations are processed through several pipelines to create different data products. At STScI, there exist three pipelines for:

  • Detector-level calibration of Level 1 to Level 2 WFI data products called exposure_pipeline
  • Repixelation and mosaicking of Level 2 WFI imaging data into Level 3 products named resample
  • Generation of Level 4 catalogs from Level 2 and Level 3 products (under development)

Note that additional data processing specific to the WFI spectroscopic mode and microlensing exoplanet science is carried out by the Science Support Center at IPAC.

The romancal repository undergoes continuous integration testing using both unit tests and larger regression test suites to ensure that changes to the code do not result in unexpected changes to the data products. Furthermore, before WFI pipeline steps are released, they are rigorously tested and validated by the instrument science team at STScI.

We expect that most users will be able to use data products directly from the Roman archive (see Accessing WFI Data article for more information); however, there may be instances when users wish to re-run elements of the WFI science data pipelines or customize the pipeline for particular science use cases.

Installation Instructions

All of the STScI pipelines for Roman are contained in a single Python package called romancal, which is publicly developed on GitHub with released versions available via the Python Package Index (PyPI).

Additional information on how to install specific versions, including the latest development version, can be found on the pipeline installation page of the romancal readthedocs documentation. Basic installation on a Unix-based operating system using a Conda environment manager can be accomplished in a bash terminal by typing the following:

$ conda create -n <environment_name> python
$ conda activate <environment_name>
$ pip install romancal

Note that the $ symbol indicates the bash prompt. The environment name (environment_name) is at the discretion of the user. Passing the argument "python" during environment creation installs the latest available version of Python in the environment along with other necessary tools such as pip.

Installing romancal will also install several dependency packages, including but not limited to:

  • roman_datamodels
  • asdf
  • crds
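After installation, it can be useful to confirm which versions of romancal and its dependencies ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name installed_versions is hypothetical and not part of romancal.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the list above; romancal pulls the others in as dependencies.
PACKAGES = ("romancal", "roman_datamodels", "asdf", "crds")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:18s} {ver if ver is not None else 'not installed'}")
```

Recording these versions alongside any reprocessed data makes it easier to compare results against archive products later.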

Users will also need access to calibration reference files for some pipeline steps, and should see CRDS for Reference Files for additional information including how to set up necessary environment variables.
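As an illustration of the environment setup mentioned above, the sketch below fills in CRDS variables from Python when they are not already set. CRDS_PATH and CRDS_SERVER_URL are the variable names used by the CRDS client; the cache location and server URL shown here are assumptions for illustration and should be checked against the CRDS for Reference Files article. The helper name configure_crds is hypothetical.

```python
import os

# Variable names used by the CRDS client. The values below are assumptions for
# illustration; take the real ones from the CRDS for Reference Files article.
ASSUMED_DEFAULTS = {
    "CRDS_PATH": os.path.join(os.path.expanduser("~"), "crds_cache"),
    "CRDS_SERVER_URL": "https://roman-crds.stsci.edu",
}


def configure_crds(environ=os.environ, defaults=ASSUMED_DEFAULTS):
    """Fill in any unset CRDS variables without overriding existing values."""
    for name, value in defaults.items():
        environ.setdefault(name, value)
    # Return the effective settings so the caller can log or inspect them.
    return {name: environ[name] for name in defaults}
```

If this approach is used, the call would most likely need to happen before romancal or crds is imported, so that the settings are in place when the CRDS client first reads them.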

Pipeline Descriptions

Here we describe at a high level the individual pipelines used to produce Roman WFI data products. Detailed information about each pipeline will be provided in separate articles. Users are also advised to consult the Data Levels and Products page for context on the formats and contents of the different WFI data products.

The WFI detectors are an updated iteration of the detectors used in JWST instruments; therefore, the philosophical starting point for development of the WFI data pipelines is the JWST science data pipeline. Deviations from the JWST pipelines occur when either the JWST pipeline steps are not appropriate or are insufficient for WFI data, or when Roman mission science accuracy requirements necessitate changes to the underlying algorithms.

Level 2 Exposure Pipeline

The Level 2 exposure pipeline contains the algorithms necessary to correct raw WFI ramps for instrumental effects, and collapse the ramps along the time axis into rate images suitable for scientific analysis. The exposure pipeline corrects for the following instrumental effects:

  • Signal induced by the readout electronics (e.g., 1/f noise) using reference pixels
  • Dark current
  • Classic non-linearity
  • Flat-field (variations in quantum efficiency)

In addition, static bad pixels and pixels with poor calibration from the calibration reference files are flagged in the data quality arrays. Rows that intersect the guide window on each detector are flagged in the data quality arrays due to changes in the noise properties of the intersecting rows. Finally, the following are also performed:

  • A slope per pixel is fit up the ramp to produce a count rate per pixel, which is transformed to photoelectrons per pixel
  • A WCS model, including the geometric distortion, for transformation from pixels to sky coordinates (and the inverse) is added to the metadata
  • Photometric calibration information, including zeropoints and nominal pixel area, is added to the metadata
  • Alignment to Gaia astrometric sources is performed to update the WCS model
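The slope-fitting step in the list above can be illustrated with a small sketch. This is an ordinary least-squares fit for a single pixel, not the algorithm implemented in romancal, which weights the reads and accounts for effects that are ignored here. The function names, the sample ramp, and the use of a single gain factor to convert to electrons are all assumptions made for illustration.

```python
def fit_ramp_slope(times, counts):
    """Ordinary least-squares slope of counts versus time for one pixel."""
    n = len(times)
    if n != len(counts) or n < 2:
        raise ValueError("need at least two matching (time, count) samples")
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    # slope = covariance(t, c) / variance(t)
    cov = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        raise ValueError("times must not all be equal")
    return cov / var


def rate_in_electrons(times, counts_dn, gain):
    """Fitted count rate (DN/s) scaled by a gain in electrons per DN."""
    return fit_ramp_slope(times, counts_dn) * gain


# Hypothetical ramp: one read every 3 s, accumulating 10 DN/s on a 100 DN pedestal
times = [0.0, 3.0, 6.0, 9.0, 12.0]
counts = [100.0 + 10.0 * t for t in times]
rate = rate_in_electrons(times, counts, gain=2.0)  # hypothetical gain of 2 e-/DN
```

In a real ramp cube the same fit would be applied to every pixel at once, typically with array operations rather than a per-pixel Python loop.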

Note that the input Level 1 files to the exposure pipeline are separated per WFI detector (i.e., there are 18 files for a full WFI exposure), and likewise the output Level 2 files from the exposure pipeline are also separated per detector.
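Because the inputs are separated per detector, re-running the exposure pipeline on a full exposure amounts to calling it once per file. The sketch below assumes that romancal exposes an ExposurePipeline class in romancal.pipeline with the call() class method it inherits from the stpipe Step class; the wrapper function names are hypothetical, and the romancal readthedocs documentation should be consulted for the supported interface and parameters.

```python
def run_exposure_pipeline(level1_path):
    """Run the exposure pipeline on one per-detector Level 1 file."""
    # Imported lazily so this module loads even where romancal is not installed.
    from romancal.pipeline import ExposurePipeline

    # call() creates a pipeline instance and runs it on the input file.
    return ExposurePipeline.call(level1_path)


def run_full_exposure(level1_paths):
    """Process each detector file of an exposure independently."""
    return {path: run_exposure_pipeline(path) for path in level1_paths}
```

Since each detector is processed independently, the per-file calls could also be distributed across processes or machines when many exposures need to be reprocessed.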

Level 3 Mosaic Pipeline

Information regarding the WFI mosaic pipeline will be added in future RDox releases.

Level 4 Catalog Pipeline

Information regarding the generation of WFI catalog products will be added in future RDox releases.

Automatic Data Processing

As WFI data are downlinked from the Roman spacecraft, they are automatically processed through several data pipelines with only a few variations depending on the observation type. These variations, if present, are described in detail in the articles for each pipeline. For example, STScI does not apply a flat-field correction to WFI spectroscopic observations; instead, the Science Support Center applies a wavelength-dependent flat-field correction as part of the spectroscopic pipeline. After data products are generated, they are ingested into MAST and immediately made available to the community with no exclusive access period. See Accessing WFI Data for more information on how to retrieve WFI data products.

Data Processing On-Premises and in the Cloud

Of the automatic pipelines, only the calibration of Level 1 ramps into Level 2 rate images is performed on-premises at STScI. High-level processing, which comprises both the Level 3 mosaic and Level 4 catalog pipelines, is performed in an Amazon Web Services (AWS) cloud environment using Elastic Compute Cloud (EC2) instances. These EC2 instances provide the cost-efficient, scalable compute needed for the large volume of WFI data.




For additional questions not answered in this article, please contact the Roman Help Desk at STScI.



References

  1. The Roman Space Telescope Calibration Pipeline, Readthedocs maintained by STScI 2023, Latest version
  2. JWST Science Calibration Pipeline Overview, Last Update 29 Nov 2022, JWST User Documentation (JDox)




Latest Update

Publication: Initial publication of the article.