NASA’s Science Mission Directorate Creates AI Language Model for Better Data Stewardship

admin

News

UPDATED Dec 16, 2024

PUBLISHED Feb 12, 2024

NASA’s Science Mission Directorate (SMD) Artificial Intelligence and Machine Learning (AIML) Working Group recently launched a large language model (LLM) created specifically to help the SMD manage data more efficiently. The model, developed in collaboration with IBM Research, will improve tasks such as assigning metadata, managing documentation, and performing intelligent search.

To train the model, the researchers used material from a variety of scientific sources related to the SMD’s subject matter areas. The largest share of the training data came from the NASA Astrophysics Data System (ADS), with additional material from the American Geophysical Union (AGU), the American Meteorological Society (AMS), and PubMed.

The new LLM was tested using multiple benchmarks that assess a model’s ability to reason and identify relevant information. In particular, the researchers used BLURB (Biomedical Language Understanding and Reasoning Benchmark) to measure the model’s ability to answer questions and classify text related to the biomedical field. The model was also tested on a variety of NASA-relevant scientific questions with SQuAD 2.0 (Stanford Question Answering Dataset), which grades a model’s ability to answer reading comprehension questions or to abstain when a question cannot be answered from the given passage.
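To illustrate how a SQuAD 2.0-style benchmark rewards both correct answers and correct abstentions, here is a minimal sketch of exact-match scoring. It is not the evaluation code the researchers used; the question IDs, answers, and function names are hypothetical, and an empty string is assumed to mean "the model abstained."

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> int:
    """Return 1 if the prediction matches any gold answer, else 0.

    An empty gold list marks an unanswerable question; the model only
    earns credit there by abstaining (predicting an empty string).
    """
    if not gold_answers:
        return int(normalize(prediction) == "")
    return int(any(normalize(prediction) == normalize(g) for g in gold_answers))


def score(predictions: dict[str, str], references: dict[str, list[str]]) -> float:
    """Percentage of questions answered (or abstained on) correctly."""
    total = sum(
        exact_match(predictions.get(qid, ""), golds)
        for qid, golds in references.items()
    )
    return 100.0 * total / len(references)


# Hypothetical toy data: q2 has no answer in its passage.
references = {"q1": ["solar wind"], "q2": []}
abstaining_model = {"q1": "The solar wind.", "q2": ""}
guessing_model = {"q1": "solar wind", "q2": "Jupiter"}

print(score(abstaining_model, references))
print(score(guessing_model, references))
```

In this toy setup, the model that guesses on the unanswerable question loses credit for it, while the model that abstains does not.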

The team also tested the model with a NASA SMD-specific benchmark. On every test, the new model showed a marked improvement in a variety of information-related tasks over the pre-trained model it was based on. The SMD encoder-only transformer model and the subsequently refined SMD bi-encoder sentence transformer model are available on GitHub.

The SMD is currently using the new model to create a more robust search feature for NASA’s Science Discovery Engine. In the future, the SMD hopes to use LLMs to assist with a variety of data management tasks.
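As a rough illustration of how a bi-encoder can support search, the sketch below encodes the query and each document independently and ranks documents by cosine similarity. The `encode` function is a toy bag-of-words stand-in, not the SMD model, and the documents are hypothetical; a real system would replace `encode` with a trained sentence encoder and would typically precompute the document vectors.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query: str, documents: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query, highest first."""
    # Documents are encoded without seeing the query (the bi-encoder idea),
    # so these vectors could be computed once and reused for every search.
    doc_vectors = [encode(d) for d in documents]
    q = encode(query)
    ranked = sorted(
        ((cosine(q, dv), d) for dv, d in zip(doc_vectors, documents)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical document descriptions.
documents = [
    "solar wind plasma measurements from heliophysics missions",
    "sea surface temperature measurements from earth satellites",
    "spectra of distant galaxies",
]

for similarity, doc in search("solar wind measurements", documents):
    print(f"{similarity:.3f}  {doc}")
```

Because each document is encoded on its own, adding a new query costs one encoding plus a similarity comparison per document, which is what makes this design practical for large collections.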

