Introduction to ASDF

This article provides an introduction to the Advanced Scientific Data Format (ASDF), a growing successor to FITS for archiving astronomical data, and the file format of choice for Roman data products. 





What is ASDF? 

The Advanced Scientific Data Format (ASDF; pronounced "AZ-diff") is a file format designed as a successor to the existing FITS format for archiving astronomical data. ASDF files are composed of human-readable, text-based metadata alongside binary data.

ASDF aims to evolve from FITS by building upon other widely used formats. It is a hierarchically structured data format that stores binary data alongside metadata written in YAML, a widely used, human-readable data serialization language whose recursive acronym stands for "YAML Ain't Markup Language". ASDF provides native support for file streaming, internal array compression, and self-validation against schemas, and it enforces explicit, per-file versioning of the format itself and of the features derived from it.

The ASDF standard defines the goals for the ASDF format. ASDF serves as the default file type for data delivery by Roman, and the mission's calibration pipelines are exclusively designed for use with ASDF. For more information on the motivations behind the standard, see "ASDF: A new data format for astronomy" (Greenfield, Droettboom, & Bray 2015). The Space Telescope Science Institute (STScI) actively develops and maintains ASDF, and both the JWST mission and the Daniel K. Inouye Solar Telescope (DKIST) currently use this file format in addition to the Roman mission. 


Why ASDF?

To date, FITS is the most commonly used format for archiving and analyzing observational astronomical data. Developed in the 1970s, FITS has a long history of widespread use in astronomy. However, as the field progresses, FITS cannot natively accommodate the specialized and complex data models and metadata that have since been developed. More information about the limitations of FITS can be found in the astronomical literature, e.g., "Learning from FITS: Limitations in use in modern astronomical research" (Thomas et al. 2015).

ASDF data is serialized and stored as binary data blocks, and specific portions of a file can be read without loading the entire file into memory, a strategy known as lazy loading. In practical terms, a user can open an ASDF file, inspect selected metadata values, and load only the relevant portion of the science data into memory (e.g., only the image data quality array). Additionally, ASDF supports memory mapping of binary data, so that science data can be accessed in segments through slicing, and it can leverage the Zarr file storage format for chunked, compressed, N-dimensional arrays.

Similar to other file formats, ASDF allows for the storage of multiple data types, including strings, integers, and floats. Additionally, ASDF supports internal compression of binary arrays using lossless compression algorithms such as bzip2, zlib, and LZ4. It can also store advanced data types, such as programming objects (e.g., Astropy time objects), and the format is highly extensible. For instance, ASDF can manage complex distortion models using the Generalized World Coordinate System (GWCS) Python object. ASDF is also self-validating, ensuring that data adhere to a specified model for increased reliability. Another advantage of ASDF lies in its use of human-readable metadata and a hierarchical structure, which contribute to a clear and organized representation of information.

Given all the advantages of ASDF, the romancal pipeline has been designed to natively handle and serve ASDF data products.


Handling ASDF

The Python implementation of ASDF includes a basic guide on how to use the asdf package to create and read ASDF files. For a more comprehensive overview of working with ASDF files in Python, refer to the documentation on ASDF core features. Both articles include walkthroughs of sample ASDF files and code snippets demonstrating how to handle them. 

Note that while ASDF is native to Python, it is expanding to other languages. Prototype implementations for C++, such as asdf-cpp and asdf-cxx, are available. An ASDF implementation for Julia, named asdf.jl, is also available. Additionally, IDL supports ASDF as of version 8.9.

Important: for software neither hosted nor maintained by STScI, users must contact the respective software developers directly for support. 


ASDF in Roman


A data schema is a formalized structure written in YAML that defines the organization and content of data stored in the ASDF file. The schemas written to describe and validate Roman ASDF data products are hosted in the Roman Attribute Dictionary (RAD) GitHub repository. When an ASDF file is read, it is automatically validated against the schemas if they are available in the computing environment.

The roman_datamodels package provides the capability of reading and writing Roman WFI ASDF files, offering some additional conveniences such as dot notation to access file contents (see ASDF Examples below). roman_datamodels uses its knowledge of the file schemas to map the contents of an ASDF file into a Python object, and it is recommended for most users when working with Roman WFI ASDF files. However, because roman_datamodels is based on the WFI schemas, it may be more advantageous to use the asdf package when accessing additional products (e.g., products simulated with Roman I-Sim - The Roman Image Simulator).


ASDF Example Python Code 

Opening Files with roman_datamodels

The  roman_datamodels package can be used to read and interact with Roman ASDF files as shown below. For more information on Roman file contents shown in this example, see Data Levels and Products.

Loading a Roman File with roman_datamodels
import roman_datamodels as rdd

# open the file
file = rdd.open('r0000101001001001001_01101_0001_WFI01_cal.asdf')

# load the data quality array
dq = file.dq

# get metadata
meta = file.meta

# get date and time of the exposure from the metadata
start_time = meta.exposure.start_time
end_time = meta.exposure.end_time
print(start_time, end_time)

Running the above code block should yield a result similar to the following:

Result
2021-01-01T00:00:00.000 2021-01-01T00:02:28.960

To select all of the nested information from a particular section of metadata, the following code block may be run:

Observation Metadata
# select all of the observation info
meta.observation

yielding a result similar to the following:

Result
{'obs_id': '0000101001001001001011010001',
'visit_id': '0000101001001001001',
'program': 1, 
'execution_plan': 1, 
'pass': 1, 
'observation': 1, 
'segment': 1, 
'visit': 1, 
'visit_file_group': 1, 
'visit_file_sequence': 1, 
'visit_file_activity': '01', 
'exposure': 1, 
'template': 'NONE',
'observation_label': 'TEST', 
'survey': 'N/A'}

Opening Files with asdf

Alternatively, the  asdf package can be used to open an ASDF file and browse the file tree as shown below.

Loading a Roman File with the asdf Package
# import the asdf library
import asdf

# open the file; its contents are held in the "tree"
file = asdf.open('r0000101001001001001_01101_0001_WFI01_cal.asdf')
roman_tree = file.tree['roman']

# get exposure start time from the metadata
start_time = roman_tree['meta']['exposure']['start_time']

# load the data array
data = roman_tree['data']
print(data.shape)

Running the above block should yield a result of (4088, 4088).




For additional questions not answered in this article, please contact the Roman Help Desk at STScI.




References

  1. "ASDF: A new data format for astronomy", Greenfield, Droettboom, & Bray 2015
  2. "Learning from FITS: Limitations in use in modern astronomical research", Thomas et al. 2015
  3. ASDF Standard (version 1.0.3), readthedocs maintained by the ASDF Developers. Revision e19a307b.
  4. ASDF Python API, readthedocs maintained by the ASDF Developers. Revision 79356ab2.
  5. YAML Ain't Markup Language, YAML resources homepage. YAML 1.2.2.
  6. Roman Datamodels, readthedocs maintained by STScI. Revision c9d63231.



