Month: September 2025
How Artificial Intelligence Might Shape the Future of Eclipse Soundscapes Data
By MaryKay Severino and Henry "Trae" Winter
This raised an important question for us: How might these very near-future AI possibilities impact the way we share the audio data collected by ES Data Collectors during the 2023 Annular Eclipse and the 2024 Total Solar Eclipse? Here is what we learned and what we decided to do:
Preparing Data For AI Searches
Right now, much of the metadata (information about the data) is language-based. That means additional information about the audio data, like site notes, habitat descriptions, or weather descriptions, might be recorded as words or phrases rather than numbers or standardized codes. While this works well for people reading the data, it makes the data harder for AI to process consistently.
Language-based metadata examples from ES
- Site Location notes might say “near cattle pasture.”
- Habitat notes could say “forest,” “woods,” or “woodland.” These all mean the same thing to a person, but could be interpreted differently by AI.
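To make the habitat example concrete, here is a minimal sketch of how free-text notes could be mapped to standardized codes. The code names (`FOREST`, `GRASSLAND`, `UNKNOWN`), the synonym table, and the `normalize_habitat` function are hypothetical illustrations, not an Eclipse Soundscapes or Zenodo standard.

```python
import re

# Hypothetical lookup table: these synonyms and codes are illustrative only,
# not an official Eclipse Soundscapes or Zenodo vocabulary.
HABITAT_CODES = {
    "forest": "FOREST",
    "woods": "FOREST",
    "woodland": "FOREST",
    "pasture": "GRASSLAND",
    "field": "GRASSLAND",
}


def normalize_habitat(note: str) -> str:
    """Return the code for the first known habitat word in a free-text note."""
    # Lowercase the note and split it into plain words, ignoring punctuation.
    for word in re.findall(r"[a-z]+", note.lower()):
        if word in HABITAT_CODES:
            return HABITAT_CODES[word]
    # No known habitat word was found in the note.
    return "UNKNOWN"


if __name__ == "__main__":
    for note in ["forest", "Woods", "woodland", "near cattle pasture"]:
        print(note, "->", normalize_habitat(note))
```

A lookup table like this only catches the words someone thought to list in advance, which is one reason similarity-based approaches are appealing for language-based metadata.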
Data Repositories and Preparing for AI
- Zenodo, the platform ES uses to store and share its audio and observation data, is one example of a data repository.
- GitHub, the platform where ES shares its software and code, is another example of a data repository.
Vector Databases: An AI Search-Friendly Format
A vector database stores information, including metadata, as lists of numbers called vectors. These numerical metadata descriptions make it easier for AI to:
- Recognize similarities rather than just exact search term matches
- Remember and connect previous inputs
- Understand data in a broader context
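The idea of recognizing similarity rather than exact matches can be sketched with a small example. The three-number vectors below are made up by hand for illustration; a real system would typically generate much longer vectors with an embedding model, and the names `TOY_VECTORS` and `most_similar` are hypothetical. Cosine similarity is one common way to compare vectors, used here as an assumption rather than a description of any particular platform.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (assumption): real embeddings come from a model
# and usually have hundreds of dimensions.
TOY_VECTORS = {
    "forest": [0.9, 0.1, 0.0],
    "woodland": [0.8, 0.2, 0.1],
    "cattle pasture": [0.1, 0.9, 0.2],
}


def most_similar(query, vectors):
    """Return the stored phrase whose vector is closest to the query phrase."""
    query_vec = vectors[query]
    others = {name: vec for name, vec in vectors.items() if name != query}
    return max(others, key=lambda name: cosine_similarity(query_vec, others[name]))


if __name__ == "__main__":
    print(most_similar("forest", TOY_VECTORS))
```

With these toy values, "forest" lands closer to "woodland" than to "cattle pasture" even though the words share no letters in common, which is the kind of match an exact search term would miss.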
Examples of vector database platforms include:
- ChromaDB (Open Source, Python-based) https://github.com/chroma-core/chroma
- Pinecone (Commercial) https://www.pinecone.io/
Zenodo, the repository where ES data is being archived, does not currently plan to support vector databases. It is impossible to predict how Zenodo or other online data repositories might incorporate vector databases and what future standards they may require.
ES’s Decision
Creating a vector database is more than what Eclipse Soundscapes can take on right now. It would take more time and resources than the project has and would mean looking for new data repositories or doing extensive work to fit it into Zenodo’s framework.
Still, we’re glad we explored this possibility. Thinking about what AI might mean for scientific data is worthwhile, even if we can’t take it on ourselves. As projects wind down, it helps to keep looking ahead. Our team will carry this knowledge into future efforts, and by sharing it here, the ES community can carry it forward too.
If you want to learn more about vector databases, check out these articles:
- Microsoft’s article: Understanding Vector Databases
- Cloudflare’s article: What is a Vector Database?