Accessing WFI Data

The large data volume produced by Nancy Grace Roman Space Telescope observations requires new ways to store, host, and access data. This article summarizes the plan for providing access to Roman Wide Field Instrument (WFI) data at STScI. It also describes how working with those data differs from working with HST and JWST data, including the introduction of cloud-based data access and analysis tools.



The Roman WFI Era of Big Data 

The Roman Wide Field Instrument (WFI) will produce data at significantly higher rates than JWST or Hubble instruments. The table below compares the Roman WFI with the JWST NIRCam instrument and the Hubble Wide Field Camera 3 (WFC3) IR channel. The WFI Data Format page contains additional details about the data formats at various stages of the calibration process. On average, the observatory will downlink up to approximately 11 Tb (1375 GB) of compressed WFI observation data per day, so the calibration, science analysis, and archive software pipelines must perform at scale.
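As a quick check of the quoted figure, the sketch below converts the average daily downlink from terabits to gigabytes. It assumes decimal (SI) prefixes and 8 bits per byte. It then extrapolates to a year under the simplifying assumption that the daily average holds on every day. This is illustrative arithmetic only, not a mission data-volume model.

```python
BITS_PER_BYTE = 8
GIGABITS_PER_TERABIT = 1000  # decimal (SI) prefixes assumed
DAYS_PER_YEAR = 365


def terabits_to_gigabytes(terabits: float) -> float:
    """Convert a volume in terabits (Tb) to gigabytes (GB)."""
    return terabits * GIGABITS_PER_TERABIT / BITS_PER_BYTE


daily_downlink_tb = 11  # average daily compressed WFI downlink quoted in the text
daily_downlink_gb = terabits_to_gigabytes(daily_downlink_tb)

# Simplifying assumption: the daily average holds on every day of the year.
yearly_downlink_tb = daily_downlink_gb * DAYS_PER_YEAR / 1000  # GB -> TB (terabytes)

print(f"{daily_downlink_tb} Tb/day = {daily_downlink_gb:.0f} GB/day")  # 11 Tb/day = 1375 GB/day
print(f"About {yearly_downlink_tb:.0f} TB per year")  # About 502 TB per year
```

Under those assumptions, a single year of WFI operations corresponds to roughly half a petabyte of compressed data.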

Table to Compare Hubble WFC3, JWST NIRCam, and Roman WFI


Quantity | Hubble WFC3 | JWST NIRCam | Roman WFI
Number of pixels | 1 megapixel | 34 megapixels | 302 megapixels
Field of view | 4.7 arcmin² | 9.7 arcmin² | 1035 arcmin²
Field-of-view image data size | 0.013 GB (13 MB) | 2 GB | 8 GB
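The sketch below expresses the comparison table as ratios, using only the values listed in the table. The dictionary and function names are illustrative choices, not part of any Roman software.

```python
# Values transcribed from the comparison table above.
INSTRUMENTS = {
    "Hubble WFC3": {"megapixels": 1, "fov_arcmin2": 4.7, "image_gb": 0.013},
    "JWST NIRCam": {"megapixels": 34, "fov_arcmin2": 9.7, "image_gb": 2},
    "Roman WFI": {"megapixels": 302, "fov_arcmin2": 1035, "image_gb": 8},
}


def ratio_to_roman(instrument: str, quantity: str) -> float:
    """How many times larger the Roman WFI value is than the given instrument's."""
    return INSTRUMENTS["Roman WFI"][quantity] / INSTRUMENTS[instrument][quantity]


for instrument in ("Hubble WFC3", "JWST NIRCam"):
    for quantity in ("megapixels", "fov_arcmin2", "image_gb"):
        ratio = ratio_to_roman(instrument, quantity)
        print(f"Roman WFI vs {instrument}, {quantity}: {ratio:.1f}x")
```

Running the sketch gives ratios of about 302, 220, and 615 relative to WFC3 (pixels, field of view, image size). Relative to NIRCam, the corresponding ratios are about 8.9, 107, and 4.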

The Roman data volume is so much larger than that of prior space telescope missions that the paradigm of data access must shift. The Roman Science Platform (RSP) will enable users to run calibration and analysis scripts in the cloud, i.e., "bringing the compute to the data." This model provides rapid and cost-effective data access. It also reduces file transfer times and the need for powerful, user-maintained computing hardware. Users who need modest volumes of data locally (e.g., for recalibration) will still be able to access those data through the archive and download them to their own machines.
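To show why moving compute to the data helps, the sketch below estimates how long it would take to download one average day of WFI data over a network link. The daily volume of 1375 GB comes from the downlink figure quoted earlier. The link speeds are hypothetical examples. The estimate assumes decimal units and a perfectly sustained transfer rate with no protocol overhead, so real downloads would take longer.

```python
def transfer_time_hours(volume_gb: float, link_mbps: float) -> float:
    """Hours needed to move volume_gb gigabytes over a link_mbps megabit/s link.

    Assumes decimal units (1 GB = 8000 Mb) and a perfectly sustained rate,
    ignoring protocol overhead, so this is a lower bound on real transfer time.
    """
    megabits = volume_gb * 8000
    seconds = megabits / link_mbps
    return seconds / 3600


daily_volume_gb = 1375  # one average day of compressed WFI data
for link_mbps in (100, 1000):  # hypothetical link speeds
    hours = transfer_time_hours(daily_volume_gb, link_mbps)
    print(f"{link_mbps:>5} Mb/s: {hours:.1f} h")
```

At 100 Mb/s the transfer takes about 30.6 hours, longer than the day of observing it represents. Even at 1 Gb/s it takes about 3.1 hours.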

How to Access Roman WFI Science Data

All Roman science data will be publicly available immediately; there will be no exclusive access period. For context, the Roman STScI pipelines and their associated data products are briefly summarized here.

The Level 1 (L1) uncalibrated raw images will contain a 3-dimensional array of resultant images and will be archived at STScI. After calibration and processing to Level 2 (L2), the calibrated rate images, uncertainty arrays, and a data quality array are stored in one file per WFI detector (i.e., 18 files per WFI exposure). These L2 data will be accessible from STScI as well as through the cloud-based Roman Science Platform (RSP) hosted on Amazon Web Services (AWS).

All high-level data products will also reside in the cloud. These include the mosaicked and resampled Level 3 (L3) images and the Level 4 (L4) high-level products (e.g., scientific catalogs, light curves, and 1-D spectra). Data at each level will be accompanied by metadata describing the observing and processing history. The storage locations of WFI data products are summarized in the Table of WFI Data Locations below.
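Going from L1 to L2 collapses each pixel's stack of resultants into a single rate. The sketch below illustrates that idea with an ordinary least-squares slope per pixel, written in pure Python on a tiny synthetic cube. It is a simplified illustration, not the ramp-fitting algorithm used by the STScI pipeline, which must also handle noise weighting, data quality flags, and other effects. The function names, resultant times, and cube values are hypothetical.

```python
from typing import List, Sequence

Cube = List[List[List[float]]]  # indexed as [resultant][y][x]


def fit_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values versus times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def rate_image(resultants: Cube, times: Sequence[float]) -> List[List[float]]:
    """Collapse a (resultant, y, x) cube into a 2-D image of per-pixel slopes."""
    ny, nx = len(resultants[0]), len(resultants[0][0])
    return [
        [fit_slope(times, [frame[y][x] for frame in resultants]) for x in range(nx)]
        for y in range(ny)
    ]


# Hypothetical 2x2-pixel example: 4 resultants sampled at times 0, 3, 6, 9,
# each pixel accumulating signal at a constant rate on top of an offset of 10.
times = [0.0, 3.0, 6.0, 9.0]
true_rates = [[1.0, 2.0], [3.0, 4.0]]
cube = [[[10.0 + true_rates[y][x] * t for x in range(2)] for y in range(2)] for t in times]
print(rate_image(cube, times))  # [[1.0, 2.0], [3.0, 4.0]]
```

The constant offset drops out of the slope, which is why the recovered image matches the input rates exactly for this noiseless ramp.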

More detailed information regarding pipelines and data products is provided in the Roman STScI Data Pipelines and Data Levels and Products articles, respectively.

The Roman science data archive is currently in development, and changes to the final design may still occur. The location of and access to Level 5 (user-contributed) data products will be described in a future RDox release.

Table of WFI Data Locations



Data Level | Brief Description | Mikulski Archive for Space Telescopes (MAST), on premise at STScI | Roman Science Platform (RSP), at AWS US East-1 (Northern Virginia)
Level 1 | uncalibrated ramps | Yes | No
Level 2 | calibrated rate images | Yes | Yes
Level 3 | mosaics | No | Yes
Level 4 | high-level data products (e.g., catalogs, light curves, 1-D spectra) | No | Yes
Level 5 | user-contributed products | TBD | TBD
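For scripting, the availability table can be encoded as a simple lookup. The sketch below transcribes the Table of WFI Data Locations, with None standing in for TBD entries. The dictionary and function names are hypothetical. Because the archive design may still change, such a lookup should be kept in sync with RDox rather than treated as authoritative.

```python
from typing import Dict, List, Optional

# Transcribed from the Table of WFI Data Locations; None marks a TBD entry.
WFI_DATA_LOCATIONS: Dict[int, Dict[str, Optional[bool]]] = {
    1: {"MAST": True, "RSP": False},
    2: {"MAST": True, "RSP": True},
    3: {"MAST": False, "RSP": True},
    4: {"MAST": False, "RSP": True},
    5: {"MAST": None, "RSP": None},
}


def locations_for(level: int) -> List[str]:
    """Return where data of the given level are hosted; raise if still TBD."""
    availability = WFI_DATA_LOCATIONS[level]
    if any(hosted is None for hosted in availability.values()):
        raise LookupError(f"Hosting of Level {level} data is still TBD")
    return [location for location, hosted in availability.items() if hosted]


print(locations_for(1))  # ['MAST']
print(locations_for(3))  # ['RSP']
```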

A significant portion of the Roman high-level science data products will be stored in an AWS open data bucket. An open data bucket allows for the egress (download) of high-value, public data at no cost to the facility hosting those data. More information on data in the open data bucket will be added in a future RDox release.

Users are encouraged to use the Roman Science Platform (RSP) to analyze WFI data. The RSP environment provides easy access to WFI data, along with analysis tools and computing resources. More information on the RSP will be added in a future RDox release.


Users will be able to access Roman data through the Roman Science Platform (RSP) and the Barbara A. Mikulski Archive for Space Telescopes (MAST). The MAST website features a graphical user interface for querying data, such as searching or filtering files by keywords and positions. MAST will also enable WFI cutout services (similar to TESScut), a cross-mission source catalog-based search, and Virtual Observatory services. These services will also be enabled on the RSP, and MAST will offer a way for users to smoothly transition from requesting a download to starting up a compute instance on the RSP.

On the back-end, the Data Access Application Programming Interface (DAAPI) will provide unified public access to Roman data stored on-premises and in cloud endpoints. The DAAPI is not a single software stack, but rather a collection of services with consistent access patterns that will allow users to query Roman data. In principle, Roman data can also be directly downloaded using AWS URIs associated with each AWS Simple Storage Service (S3) object.
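To illustrate the last point, an S3 URI of the form s3://bucket/key maps onto a standard AWS virtual-hosted-style HTTPS URL. The sketch below performs that conversion with the Python standard library, defaulting to the US East-1 region listed in the Table of WFI Data Locations. The bucket and key in the usage line are hypothetical placeholders, since this article does not name the Roman buckets. Downloading this way is assumed to work only for publicly readable objects.

```python
from urllib.parse import quote, urlparse


def s3_uri_to_https(uri: str, region: str = "us-east-1") -> str:
    """Convert an s3://bucket/key URI into a virtual-hosted-style HTTPS URL."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    key = parsed.path.lstrip("/")
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"https://{parsed.netloc}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical bucket and key, for illustration only.
print(s3_uri_to_https("s3://example-roman-bucket/path/to/example_object"))
# https://example-roman-bucket.s3.us-east-1.amazonaws.com/path/to/example_object
```

Keys containing "?" or "#" would be split off by urlparse as a query or fragment, so they would need extra handling. For routine use, the DAAPI services described above remain the intended access route.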

How to Access Reference Files and Other Data

The Calibrated Engineering Database (EDB) hosts engineering mnemonics that describe the Roman observatory and instruments, such as telemetry and spacecraft environment data. The EDB will be accessible via both the MAST website and a Python-based query engine.

Further information about the Calibrated Engineering Database, including examples of how to access the information, will be added in future RDox releases.

The Roman Calibration Reference Data System (CRDS) hosts files necessary for calibration and data processing (e.g., reference files for dark subtraction). CRDS can retrieve a particular reference file once the software pipeline, instrument, and reference file parameters (known as the CRDS "context") are specified. Access to CRDS is provided via Python code that runs on macOS and Linux, both inside and outside AWS. More information on CRDS may be found in the article CRDS for Reference Files.
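The CRDS Python client is configured through environment variables. The two most relevant are CRDS_PATH, a local cache directory for downloaded rules and reference files, and CRDS_SERVER_URL, the server to query. The sketch below sets both, and it should be run before importing or running any CRDS-based code. The server URL is an assumption to check against the CRDS for Reference Files article, and the function name is hypothetical. Once the variables are set, the crds package's getreferences function can fetch the reference files that the context selects for a given set of observation parameters.

```python
import os
from pathlib import Path

# Assumption: verify this URL against the CRDS for Reference Files article.
ROMAN_CRDS_SERVER_URL = "https://roman-crds.stsci.edu"


def configure_crds(cache_dir: str, server_url: str = ROMAN_CRDS_SERVER_URL) -> dict:
    """Create a local CRDS cache directory and export the CRDS environment variables."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    settings = {"CRDS_PATH": str(cache), "CRDS_SERVER_URL": server_url}
    os.environ.update(settings)  # visible to this process and any child processes
    return settings
```

For example, configure_crds("~/crds_cache") would create that directory if needed and point the CRDS client at it. The cache location is an arbitrary choice; any writable directory works.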

MAST also provides an interface for searching the Roman Calibration and Supplemental Data (CaSSI) archive, which includes copies of the calibration reference files from CRDS as well as other engineering files.





For additional questions not answered in this article, please contact the Roman Help Desk at STScI.




Latest Update

Publication: Initial publication of the article.