The Metadata Mission: How the ARC Project's Legacy Enriches Earth Science Data


SOURCE: EARTHDATA.NASA.GOV
DEC 22, 2025

Rachel Wyatt, Research Associate, University of Alabama in Huntsville


For more than eight years, a team of informatics experts and data scientists working on the Analysis and Review of Common Metadata Repository (ARC) project has made it their mission to ensure that Earthdata Search users can easily find, access, and use the most relevant data products for their research goals. Search results and information in Earthdata Search, a web-based tool that allows users to discover and access NASA's Earth science data, are powered by NASA's Common Metadata Repository (CMR) database.

CMR provides the organizational framework for all metadata records that describe NASA's Earth science data products. Without reliable, centralized metadata from CMR, researchers would have much more difficulty discovering, retrieving, and applying NASA’s Earth observation data effectively.

Under management by the Office of Data Science and Informatics (ODSI) at NASA's Marshall Space Flight Center in Huntsville, Alabama, the ARC project officially concluded on Sept. 30, 2025. However, metadata curation workflows and tools developed by the ARC team will remain in use by NASA's Earth Science Data and Information System (ESDIS) Project.

Origins of the ARC Project

In 2017, the ARC project was initiated in response to a growing need from the Earth science research community: more complete and consistent metadata to enhance data discoverability, usability, and accessibility. Comprehensive metadata is critical in powering productive searches and maximizing the use of NASA's vast array of Earth observation data products. In total, more than 11,000 NASA data products (collections) are described by metadata stored in the CMR and can be accessed via a CMR application programming interface (CMR API).
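As a rough illustration of the CMR API mentioned above, the sketch below builds a collection search request in Python. It assumes the public search endpoint at https://cmr.earthdata.nasa.gov/search/collections.json and its `keyword` and `page_size` query parameters. The helper names and the assumed `feed`/`entry` response layout are illustrative and do not come from the ARC team.

```python
import json
import urllib.parse
import urllib.request

# Assumed public CMR collection search endpoint (JSON response format).
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_collection_search_url(keyword, page_size=10):
    """Return a CMR collection search URL for a free-text keyword."""
    query = urllib.parse.urlencode({"keyword": keyword, "page_size": page_size})
    return f"{CMR_COLLECTIONS_URL}?{query}"


def search_collections(keyword, page_size=10):
    """Fetch matching collection titles; needs network access, so it is not called here."""
    url = build_collection_search_url(keyword, page_size)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumed response layout: {"feed": {"entry": [{"title": ...}, ...]}}
    return [entry.get("title") for entry in payload.get("feed", {}).get("entry", [])]


print(build_collection_search_url("sea ice", page_size=5))
```

Calling `search_collections("sea ice")` would then return a list of collection titles, provided the endpoint and response layout match the assumptions above.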

Image: Three people stand around a touch-screen table, which displays information about NASA's CDDIS data archive.

Caption: Metadata shared by DAACs such as the Crustal Dynamics Data Information System (CDDIS) helps make NASA’s Earth observation data accessible to users. Credit: NASA ODSI

While NASA Earth observation records within CMR typically meet minimum metadata requirements, researchers found that some lacked useful contextual information such as user guides, dataset landing pages, or algorithm theoretical basis documents (ATBDs). Metadata supplied by NASA’s Distributed Active Archive Centers (DAACs) also varied significantly, indicating a need for greater consistency. To address these issues, the ARC project was tasked with evaluating CMR records based on three key qualities: correctness, completeness, and consistency.

Setting High-Quality Metadata Standards

When the ARC project was created, the team began by assessing the current state of CMR metadata records. Their analysis showed that, at the time, most metadata records included content for only about 40% of the available metadata fields in a given standard. Many records did not include all available and applicable metadata tags that would provide comprehensive context for a user.
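One way to picture a completeness figure like the 40% above is as the share of available fields that actually carry content. The Python sketch below is a minimal illustration under that assumption. The field list, the sample record, and the `completeness` helper are hypothetical and are not the ARC team's actual method.

```python
def completeness(record, available_fields):
    """Return the fraction of available fields that carry non-empty content."""
    if not available_fields:
        raise ValueError("available_fields must not be empty")
    populated = [
        field for field in available_fields
        if record.get(field) not in (None, "", [], {})
    ]
    return len(populated) / len(available_fields)


# Hypothetical field list and record, for illustration only.
AVAILABLE_FIELDS = ["ShortName", "Abstract", "DOI", "UserGuideURL", "LandingPageURL"]
record = {"ShortName": "EXAMPLE_L2", "Abstract": "An example abstract.", "DOI": ""}

# 2 of the 5 fields are populated; the empty DOI string does not count.
print(f"{completeness(record, AVAILABLE_FIELDS):.0%}")
```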

To address these information gaps, the ARC team began developing a detailed framework to systematically evaluate NASA’s Earth observation metadata records. The resulting metadata quality assessment framework created a collaborative method for data providers and data stewards at the DAACs to consistently ensure that high-quality NASA Earth observation metadata standards are applied. The framework also established a baseline from which to measure future metadata improvement.

Image: Full ARC project workflow involving the ARC team, the CMR team, and DAACs. Credit: ARC team

Automating ARChitecture

Well-maintained metadata lowers barriers for researchers, helping them focus their time and effort on their research questions rather than tracking down information and resources. However, carefully checking metadata for thousands of data products is a daunting task, especially if assessments are handled manually. ARC helped streamline this process by incorporating both automated checks and targeted manual assessments.

The automated system, called pyQuARC, quickly flags possible areas of improvement in the metadata records. Findings are categorized according to ARC’s priority matrix and ranked as red (high priority), yellow (medium priority), or blue (low priority). Results are then shared with the DAACs, along with detailed recommendations for addressing each finding.
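The color-coded ranking can be sketched in a few lines of Python. The example below only illustrates grouping findings by priority so that the most urgent are reported first. The `Priority` enum, the `group_by_priority` helper, and the sample findings (including which priority each one receives) are hypothetical and do not reproduce ARC's actual priority matrix.

```python
from collections import defaultdict
from enum import IntEnum


class Priority(IntEnum):
    """Color-coded priorities from the article; a lower value is more urgent."""
    RED = 1     # high priority
    YELLOW = 2  # medium priority
    BLUE = 3    # low priority


def group_by_priority(findings):
    """Group (field, priority, recommendation) findings, most urgent first."""
    grouped = defaultdict(list)
    for field, priority, recommendation in findings:
        grouped[priority].append((field, recommendation))
    return {priority: grouped[priority] for priority in sorted(grouped)}


# Hypothetical findings for a single metadata record.
findings = [
    ("Abstract", Priority.BLUE, "Expand acronyms on first use."),
    ("DOI", Priority.RED, "Provide a DOI for the data product."),
    ("RelatedUrls", Priority.YELLOW, "Add a link to the user guide."),
]

for priority, items in group_by_priority(findings).items():
    for field, recommendation in items:
        print(f"{priority.name:<6} {field}: {recommendation}")
```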

Image: The ARC metadata assessment process. Credit: ARC team

pyQuARC is a Python code package designed specifically to assess metadata quality. Through automated checks, pyQuARC applies the metadata quality criteria developed by the ARC team, performing basic validation checks such as ensuring that a metadata record adheres to the required metadata schema and uses approved controlled vocabularies. Beyond these checks, pyQuARC identifies opportunities to refine contextual metadata, helping users better understand and access relevant data products. The code also checks that information common to both the main data product and its corresponding file-level metadata is compatible and consistent.
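The three kinds of checks described above (schema adherence, controlled vocabularies, and consistency between product-level and file-level metadata) can be illustrated with a simplified sketch. This is plain Python written for this article, not pyQuARC's own code or API. The required fields, the vocabulary, and the sample records are assumptions, and real schema validation would be considerably more thorough.

```python
# Hypothetical required fields and controlled vocabulary, for illustration only.
REQUIRED_FIELDS = ("ShortName", "Version", "ProcessingLevel")
PROCESSING_LEVELS = {"0", "1A", "1B", "2", "3", "4"}


def check_required_fields(record):
    """Schema-style check: report required fields that are missing or empty."""
    return [f"missing required field: {name}"
            for name in REQUIRED_FIELDS if not record.get(name)]


def check_controlled_vocabulary(record):
    """Vocabulary check: the value must come from the approved list."""
    level = record.get("ProcessingLevel")
    if level is not None and level not in PROCESSING_LEVELS:
        return [f"unapproved ProcessingLevel: {level!r}"]
    return []


def check_collection_file_consistency(collection, granule):
    """Consistency check: fields present at both levels must agree."""
    return [f"{name} differs: {collection[name]!r} vs {granule[name]!r}"
            for name in ("ShortName", "Version")
            if name in collection and name in granule
            and collection[name] != granule[name]]


# Hypothetical product-level (collection) and file-level (granule) records.
collection = {"ShortName": "EXAMPLE_L2", "Version": "2", "ProcessingLevel": "L2"}
granule = {"ShortName": "EXAMPLE_L2", "Version": "1"}

findings = (check_required_fields(collection)
            + check_controlled_vocabulary(collection)
            + check_collection_file_consistency(collection, granule))
for finding in findings:
    print(finding)
```

In this sample, the record passes the required-field check but is flagged for an unapproved processing level and for a version mismatch between the two metadata levels.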

The ARC team openly released the pyQuARC code package in 2021 on GitHub and has maintained and transparently updated the software since then, demonstrating their commitment to open science best practices. The code enables DAACs to evaluate their metadata automatically and on demand, reducing the effort involved in manually assessing metadata quality.

ARC developers also planned for pyQuARC to be customizable. While pyQuARC was built specifically for Earth observation data, the concepts of high-quality metadata (correct, complete, consistent) are universal, allowing pyQuARC's framework to be adopted and adapted by other science disciplines or any industry that needs to catalog data efficiently. Users can even add their own custom rules without having to change the core code.
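A common way to let users add rules without touching core code is a rule registry. The sketch below shows that general pattern in Python. It is an assumption about how such extensibility can work and does not describe pyQuARC's actual customization mechanism. All names (`RULES`, `rule`, `run_rules`, and the two sample rules) are hypothetical.

```python
RULES = {}


def rule(name):
    """Decorator that registers a check function under a rule name."""
    def register(func):
        RULES[name] = func
        return func
    return register


def run_rules(record):
    """Run every registered rule and collect findings keyed by rule name."""
    results = {}
    for name, check in RULES.items():
        message = check(record)
        if message:
            results[name] = message
    return results


# A built-in rule shipped with the (hypothetical) core.
@rule("abstract_present")
def abstract_present(record):
    return None if record.get("Abstract") else "Abstract is missing."


# A user-defined rule added later, without editing run_rules.
@rule("title_length")
def title_length(record):
    title = record.get("EntryTitle", "")
    return None if len(title) <= 80 else "EntryTitle exceeds 80 characters."


print(run_rules({"EntryTitle": "Example title", "Abstract": ""}))
```

Because `run_rules` only iterates over the registry, a new discipline could register its own checks alongside the defaults while the core loop stays unchanged.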

Measuring ARC’s Impact

Throughout ARC’s project lifecycle, the team evaluated tens of thousands of metadata records. In the initial phase, ARC reviewed a subset of records from each DAAC. Approximately 79,000 metadata elements were assessed during several rounds of curation. Consequent adjustments to the metadata produced a 68% improvement in high-priority findings across DAACs, a 53% improvement in medium-priority findings, and a 37% improvement in low-priority findings. After reviewing the initial subset of DAAC metadata records, ARC continued to evaluate all other records within each DAAC.
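The article does not state how these percentages were calculated. One plausible reading is a relative reduction in the number of open findings between assessment rounds, which the short Python sketch below computes. The function name and the counts are hypothetical; the counts were chosen only so that the result matches the reported 68% figure.

```python
def percent_improvement(findings_before, findings_after):
    """Percentage reduction in findings relative to the starting count."""
    if findings_before <= 0:
        raise ValueError("findings_before must be positive")
    return 100 * (findings_before - findings_after) / findings_before


# Hypothetical counts: 1,000 high-priority findings reduced to 320.
print(percent_improvement(1000, 320))
```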

Image: Bar graph showing metadata quality improvements for the initial set of DAAC records reviewed by the ARC team. Credit: ARC team

In September 2019, the ARC team earned the Group Achievement Honor Award from NASA Marshall for their initial work toward improving Earth observation metadata. The award recognized the team's commitment to innovation, collaboration, and teamwork in enhancing the quality of data systems for NASA Marshall’s Earth Science mission.

Image: The ARC team receiving the NASA Marshall Group Achievement Honor Award in 2019. Credit: NASA ODSI

In addition to reviewing existing NASA Earth observation metadata records in CMR, ARC also assisted with evaluating draft metadata records for upcoming NASA missions. For example, ARC conducted metadata quality assessments of 13 NASA-ISRO Synthetic Aperture Radar (NISAR) records in early 2025, and the results were reported to NASA's Alaska Satellite Facility DAAC (ASF DAAC). Ensuring the accuracy and completeness of the metadata before launch will make NISAR's data more readily findable and usable when published.

Ending an Era

The ARC project was one of the original initiatives managed by NASA's fledgling Interagency Implementation and Advanced Concepts Team (IMPACT), now ODSI. Since 2017, ARC team leadership has changed hands several times, but remarkably, almost all former project leads remain with ODSI. Much of the project's success can be attributed to the continuity and unwavering support provided by this group of informatics experts.

Image: Former ARC team members and leads working in NASA’s Office of Data Science and Informatics. Credit: NASA ODSI

While the ARC project has concluded, the assets produced by the team over the past eight years will remain accessible and in use. ESDIS will continue CMR metadata curation and pyQuARC maintenance, and ARC's metadata framework will still be available in the CMR wiki space. Also, as a final step in making the pyQuARC software package as accessible as possible, the software will be submitted to NASA's public Open Source Code Repository.

Behind the scenes of science, there are always many people contributing to the meticulous, detailed work required to prepare, share, and maintain the data produced by NASA's missions and researchers. Such was the case with ARC, championed by a team of data stewards who were intent on enhancing the return on investment for one of NASA's most valuable assets – our data – collaboratively working with the DAACs to make those assets more accessible, discoverable, and usable for everyone.

Published: Dec. 22, 2025

Last Updated: Dec. 22, 2025