AI model looks for missing pieces of the puzzle

Nathan Jacobs’ lab builds multimodal embedding model

Beth Miller 
Nathan Jacobs’ lab developed ProM3E, a model that accepts any combination of inputs (a photograph, an audio recording, a satellite image, geographic location and more) and uses whatever it's given to help identify species observed in nature. The model was used with data from iNaturalist and eBird, which host millions of user-uploaded pictures of plants, birds and other animals, including this Northern Cardinal photographed in the St. Louis area. (Credit: Alexander Viduetsky, iNaturalist)

Artificial intelligence models are designed to quickly solve problems and answer questions. A team of computer scientists at Washington University in St. Louis has developed a model to help identify plant and animal species in nature.  

Srikumar Sastry, a doctoral student in the lab of Nathan Jacobs, professor of computer science & engineering, and collaborators developed ProM3E, a model that accepts any combination of inputs — a photograph, an audio recording, a satellite image, geographic location and more — and uses whatever it's given to help identify the species observed. This “any-to-any” model learns to infer missing information from context, combining available inputs into a shared embedding space.

Sastry will present the research at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 in June. 

The ProM3E model builds on the team’s Taxabind model, published in 2025, which addressed a diverse range of ecological tasks by combining six modalities into one cohesive framework: ground-level images of species, satellite images, geographic location, species audio, taxonomic text and environmental covariates.

Sastry said the ProM3E model was used with citizen science observations from iNaturalist and eBird, which host millions of user-uploaded pictures of plants, birds and other animals along with metadata such as time stamps and geographic location. The team also used imagery from satellite providers that make their data free and openly available.

“Previous works don’t consider arbitrary combinations of modalities — you might have a photo and a location, or audio and a satellite image — but ProM3E works with whatever you give it,” Sastry said. “Our model is trained in a self-supervised manner to extract representations and learn the embedding space.”
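To make the idea of "arbitrary combinations" concrete, here is a minimal sketch of one simple way to handle a variable set of inputs in a shared embedding space. It is not ProM3E's architecture, which the article does not describe in detail. The sketch assumes each modality already has an encoder that outputs vectors in the same space, and it fuses whatever is present by averaging unit vectors. The function names, the averaging rule and the toy two-dimensional vectors are all hypothetical.

```python
import math

# Modality names follow the article's list; everything else here is hypothetical.
MODALITIES = ("image", "satellite", "location", "audio", "text", "environment")


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse(embeddings):
    """Fuse whichever modality embeddings are present into one unit vector.

    `embeddings` maps a modality name to a vector, or to None when missing.
    Assumption: every encoder already outputs vectors in the same shared space.
    """
    unknown = set(embeddings) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    present = [l2_normalize(v) for v in embeddings.values() if v is not None]
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(present[0])
    if any(len(v) != dim for v in present):
        raise ValueError("all embeddings must have the same dimension")
    # Average the unit vectors, then renormalize (illustrative fusion rule).
    mean = [sum(v[i] for v in present) / len(present) for i in range(dim)]
    return l2_normalize(mean)


def cosine(a, b):
    """Cosine similarity of two unit vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-D example: a photo and a location are given, audio is missing.
observation = fuse({"image": [1.0, 0.0], "location": [0.0, 1.0], "audio": None})
species = {"cardinal": l2_normalize([1.0, 1.0]), "oak": l2_normalize([-1.0, 0.2])}
best = max(species, key=lambda name: cosine(observation, species[name]))
print(best)
```

The same `fuse` call works whether one modality or all six are supplied, which is the property the quote describes. A learned model would replace the simple average with something trained on data.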

Sastry said the model is built on a deceptively simple idea: infer what's missing and quantify how confident that inference should be.

“We train the model to predict what a missing input might look like, and not just one answer, but a distribution of plausible answers,” Sastry said. “If I give it satellite imagery, it doesn't just guess what the audio sounds like; it tells you how confident it is in that guess.”
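The difference between a single guess and a distribution of plausible answers can be illustrated with a small stand-in. The sketch below is not the team's trained model. It swaps in a nearest-neighbor estimate: given a satellite embedding, it looks up the most similar sites in a toy set of paired examples and fits a diagonal Gaussian (a mean and a per-dimension variance) to their audio embeddings. All names and numbers are invented for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_missing(query, paired, k=3):
    """Return (mean, variance) of a diagonal Gaussian over the missing embedding.

    `paired` is a list of (known_embedding, missing_embedding) examples.
    Stand-in technique: statistics of the k nearest neighbors, not a neural net.
    """
    if k < 1 or len(paired) < k:
        raise ValueError("need at least k paired examples and k >= 1")
    nearest = sorted(paired, key=lambda pair: sq_dist(query, pair[0]))[:k]
    targets = [missing for _, missing in nearest]
    dim = len(targets[0])
    mean = [sum(t[i] for t in targets) / k for i in range(dim)]
    var = [sum((t[i] - mean[i]) ** 2 for t in targets) / k for i in range(dim)]
    return mean, var


def sample(mean, var, rng):
    """Draw one plausible missing embedding from the predicted Gaussian."""
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


def uncertainty(var):
    """Average variance: larger means a less confident prediction."""
    return sum(var) / len(var)


# Toy (satellite, audio) pairs: forest sites sound alike, urban sites do not.
pairs = [
    ([0.0, 0.0], [1.0, 0.0]), ([0.1, 0.0], [1.0, 0.1]), ([0.0, 0.1], [0.9, 0.0]),
    ([5.0, 5.0], [0.0, 1.0]), ([5.1, 5.0], [1.0, 1.0]), ([5.0, 5.1], [-1.0, 0.0]),
]
forest_mean, forest_var = predict_missing([0.0, 0.0], pairs)
urban_mean, urban_var = predict_missing([5.0, 5.0], pairs)
rng = random.Random(0)  # seeded so repeated runs draw the same sample
print(sample(forest_mean, forest_var, rng))
print(uncertainty(forest_var) < uncertainty(urban_var))
```

The forest query yields a tight distribution because its neighbors agree, while the urban query yields a wide one. That spread is the kind of confidence signal the quote refers to.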

The model is designed to generate insights about the habitat and climate conditions of geographic locations worldwide. Sastry said it could be adapted to other remote sensing and ecological challenges, for example by fine-tuning it on additional datasets.


Sastry S, Khanal S, Dhakal A, Lin J, Cher D, Jarosz P, Jacobs N. ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology. IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026, June 3-7, 2026.

Support for this research was provided by the National Science Foundation (OAC-2232860) and the Taylor Geospatial Institute.


The McKelvey School of Engineering at Washington University in St. Louis promotes independent inquiry and education with an emphasis on scientific excellence, innovation and collaboration without boundaries. McKelvey Engineering has top-ranked research and graduate programs across departments, particularly in biomedical engineering, environmental engineering and computing, and has one of the most selective undergraduate programs in the country. With 165 full-time faculty, 1,524 undergraduate students, 1,554 graduate students and 22,000 living alumni, we are working to solve some of society’s greatest challenges; to prepare students to become leaders and innovate throughout their careers; and to be a catalyst of economic development for the St. Louis region and beyond.
