
How Artificial Intelligence Might Shape the Future of Eclipse Soundscapes Data

By MaryKay Severino and Henry "Trae" Winter

[Image: a glowing head profile with “AI” inside, connected by lines to icons for documents, cloud storage, databases, gears, magnifying glasses, binoculars, and a target, over a laptop keyboard.]

It’s hard to avoid hearing about how Artificial Intelligence (AI) is changing the way we live and work. Today, the term AI describes many different kinds of computer programs that can learn and help machines solve difficult problems, sometimes with human guidance and sometimes entirely on their own. Organizations are working to keep up with rapid changes in AI tools, best practices, and ethical questions. Researchers and managers alike are taking these changes seriously and figuring out how best to use AI in NASA’s mission to share the exploration of the universe around us.

[Image: audio data (a soundwave with bird, butterfly, and insect icons) flowing into a cloud-and-folder “Data Repository,” with a glowing AI brain wrapped around the repository to help with searches.]

Members of the Eclipse Soundscapes (ES) team recently attended a NASA workshop on open data repositories that prompted us to consider how AI might affect the Eclipse Soundscapes Project, even as the project comes to an end. AI is starting to influence many areas of research and data sharing. For large collections like the 500+ ES audio datasets, AI could help future researchers find, process, and analyze data more efficiently and effectively.

This raised an important question for us: How might these near-future AI possibilities affect the way we share the audio data collected by ES Data Collectors during the 2023 Annular Solar Eclipse and the 2024 Total Solar Eclipse? Here is what we learned and what we decided to do:

Preparing Data For AI Searches

[Image: a human and an AI each read “forest” and “wooded area.” The human pictures the same trees, while the AI shows confusion, with warning signs and question marks.]

One topic of discussion was how projects can prepare their data and metadata so they are searchable by AI, since AI-assisted search may become the way of the future.

Right now, much metadata (information about the data) is language-based. That means additional information about the audio data, like site notes, habitat descriptions, or weather descriptions, might be recorded as words or phrases rather than numbers or standardized codes. While this works well for people reading the data, it makes the data harder for AI to process consistently, because the same idea can be written in many different ways.

Language-based metadata examples from ES

  • Site Location notes might say “near cattle pasture.”
  • Habitat notes could say “forest,” “woods,” or “woodland.” These all mean the same thing to a person, but an AI search tool might treat them as three unrelated terms.
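To show why these wording differences matter, here is a minimal Python sketch comparing an exact keyword search with a search over standardized codes, the alternative to language-based metadata mentioned above. The site IDs, notes, and the `HABITAT_VOCABULARY` mapping are hypothetical, made up for illustration; they are not ES’s actual metadata fields or any repository’s standard.

```python
# Hypothetical habitat notes, written the way different volunteers might type them.
habitat_notes = {
    "site_01": "forest",
    "site_02": "woods",
    "site_03": "Woodland",
    "site_04": "cattle pasture",
}

# Hypothetical controlled vocabulary: each free-text variant maps to one standard code.
HABITAT_VOCABULARY = {
    "forest": "FOREST",
    "woods": "FOREST",
    "woodland": "FOREST",
    "wooded area": "FOREST",
    "pasture": "PASTURE",
    "cattle pasture": "PASTURE",
}


def exact_match(notes: dict[str, str], term: str) -> list[str]:
    """Keyword search: return site IDs whose note is exactly the search term."""
    return [site for site, note in notes.items() if note == term]


def normalize(note: str) -> str:
    """Map a free-text note to its standard code, or "UNKNOWN" if it is not listed."""
    return HABITAT_VOCABULARY.get(note.strip().lower(), "UNKNOWN")


def coded_match(notes: dict[str, str], term: str) -> list[str]:
    """Code search: return site IDs whose note maps to the same code as the term."""
    target = normalize(term)
    if target == "UNKNOWN":
        # An unlisted search term cannot be matched to any code.
        return []
    return [site for site, note in notes.items() if normalize(note) == target]


print(exact_match(habitat_notes, "forest"))
print(coded_match(habitat_notes, "forest"))
```

The exact search finds only the note spelled “forest,” while the code search also finds “woods” and “Woodland.” A lookup table like this only covers the variants someone thought to list in advance, so a term like “timberland” would still be missed.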

Data Repositories and Preparing for AI

[Image: a cloud containing two file folders above stacked server drawers, with an upward arrow and a downward arrow representing data upload and download.]

A data repository is a platform where projects store their data so that others can preserve and reuse it. If data repositories want better AI search in the future, they may eventually require that data be submitted in new, AI search-ready formats.

  • Zenodo, the platform ES uses to store and share its audio and observation data, is one example of a data repository. 
  • GitHub, the platform where ES shares its software and code, is another example of a data repository. 

[Image: arrows branching in many directions among icons of code, gears, checklists, people, networks, and warning signs, above a box that reads “No clear path for AI search in Data Repositories.”]

Not all data repositories have settled on standards for AI search. GitHub has introduced AI tools for writing code, built on models such as Anthropic’s Claude Sonnet, OpenAI’s ChatGPT, and Google’s Gemini 2.5 Pro, but has not yet added AI agents for finding existing code. Zenodo has not yet incorporated AI tools into its repository, and adding them is not on its current development roadmap. With AI search changing so quickly, it is hard to predict how data repositories will implement AI search tools or how data providers should format their data for them.

Vector Databases: An AI Search-Friendly Format

[Image: flowchart in which source data (text, audio, and images) passes through an “Embedding” funnel that turns it into numbers, which are then stored in a “Vector Database.”]

One AI search-friendly option that was discussed is putting each project’s data into a vector database that could be shared through the project’s chosen data repository. A vector database stores data and its metadata together with numerical descriptions, called embeddings, that an embedding model generates from the text, audio, or images. Because similar items receive similar numbers, the database can be searched by meaning rather than only by exact words and keywords.

These numerical metadata descriptions make it easier for AI to:

  • Recognize similarities rather than just exact search term matches
  • Store previous inputs and connect them to new ones
  • Understand data in a broader context
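To make “recognize similarities rather than just exact search term matches” more concrete, here is a minimal plain-Python sketch of the kind of similarity search a vector database performs. It assumes tiny hand-made three-number vectors in place of a real embedding model (real embeddings typically have hundreds of numbers), and the records are hypothetical, not actual ES data. It compares vectors with cosine similarity, one common measure; platforms like ChromaDB and Pinecone perform this kind of comparison at scale with specialized indexes.

```python
import math

# Hypothetical, hand-made "embeddings". A real system would get these from an
# embedding model. Here the three positions loosely stand for: trees,
# open grassland, and livestock.
records = [
    {"id": "site_01", "note": "forest", "vector": [0.95, 0.05, 0.00]},
    {"id": "site_02", "note": "woods", "vector": [0.90, 0.10, 0.05]},
    {"id": "site_03", "note": "near cattle pasture", "vector": [0.10, 0.80, 0.60]},
]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how closely two vectors point the same way (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector: list[float], top_k: int = 2) -> list[tuple[str, str, float]]:
    """Return the top_k records most similar to the query, best match first."""
    scored = [
        (r["id"], r["note"], cosine_similarity(query_vector, r["vector"]))
        for r in records
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


# Hypothetical embedding for the search phrase "wooded area".
query = [0.92, 0.08, 0.02]
for site_id, note, score in search(query):
    print(f"{site_id} ({note}): {score:.3f}")
```

The query phrase never contains the words “forest” or “woods,” yet both notes rank highest because their numbers point in nearly the same direction as the query’s, while the pasture note scores far lower.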

Examples of vector database platforms include:

  • ChromaDB (Open Source, Python-based) https://github.com/chroma-core/chroma
  • Pinecone (Commercial) https://www.pinecone.io/

Zenodo, the repository where ES data is being archived, does not currently plan to support vector databases. It is not yet possible to predict whether Zenodo or other online data repositories will incorporate vector databases, or what standards they may require.

ES’s Decision

Creating a vector database is more than Eclipse Soundscapes can take on right now. It would take more time and resources than the project has, and it would mean either finding a new data repository or doing extensive work to fit the database into Zenodo’s framework.

Still, we’re glad we explored this possibility. Thinking about what AI might mean for scientific data is worthwhile, even if we can’t take it on ourselves. As projects wind down, it helps to keep looking ahead. Our team will carry this knowledge into future efforts, and by sharing it here, the ES community can carry it forward too.

If you want to learn more about vector databases, check out these articles:

  • Microsoft’s article: Understanding Vector Databases
  • Cloudflare’s article: What is a Vector Database?

Eclipse Soundscapes is an enterprise of ARISA Lab, LLC and is supported by NASA award No. 80NSSC21M0008. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
