
Building Multimodal AI Applications Using MongoDB and Voyage AI

Apoorva Joshi, Senior AI Developer Advocate at MongoDB, teaches you how to build a simple multimodal AI application using MongoDB and Voyage AI.
Jul 23, 2025

As AI applications expand beyond text, the ability to work with images, video, and other modalities is becoming a must-have skill. Building effective multimodal systems requires not only the right models but also the right infrastructure to store, retrieve, and serve diverse data types at scale.

In this code-along, Apoorva Joshi, a Senior AI Developer Advocate at MongoDB, will teach you how to build a simple multimodal AI application using MongoDB and Voyage AI. You’ll learn how to structure and query image and video data, apply retrieval techniques with Voyage AI, and connect everything in a functional pipeline. This session is ideal for data scientists and AI engineers looking to expand their application-building toolkit.

Key Takeaways:

  • Learn how to search and retrieve multimodal content using Voyage AI.
  • Understand how to use MongoDB to store and serve data for AI applications.
  • Build a simple, functional multimodal AI app from scratch.
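
To give a rough sense of the retrieval step named in the takeaways, here is a minimal sketch, not the session's actual code. It assumes each stored item already has an embedding vector (in the session these would come from a Voyage AI multimodal embedding model; the three-dimensional vectors below are made up for illustration). The collection fields `title` and `embedding`, the index name `vector_index`, and all helper names are hypothetical. The first part ranks items in memory by cosine similarity; the second builds the equivalent MongoDB aggregation pipeline using the `$vectorSearch` stage.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents, limit=2):
    """Rank documents in memory by similarity to the query vector."""
    scored = [
        {"title": doc["title"],
         "score": cosine_similarity(query_vector, doc["embedding"])}
        for doc in documents
    ]
    return sorted(scored, key=lambda d: d["score"], reverse=True)[:limit]


def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=2):
    """Build a MongoDB aggregation pipeline with a $vectorSearch stage.

    index_name and path are assumptions; they must match however the
    vector search index was actually defined on the collection.
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": limit * 10,  # candidates considered before the final cut
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Toy data: made-up embeddings standing in for real multimodal embeddings.
DOCUMENTS = [
    {"title": "cat photo", "embedding": [0.9, 0.1, 0.0]},
    {"title": "dog video frame", "embedding": [0.7, 0.3, 0.1]},
    {"title": "sales chart", "embedding": [0.0, 0.2, 0.9]},
]
QUERY_VECTOR = [1.0, 0.0, 0.0]  # would be the embedded user query

top_matches = rank_documents(QUERY_VECTOR, DOCUMENTS)
pipeline = build_vector_search_pipeline(QUERY_VECTOR)
print([match["title"] for match in top_matches])
```

With a live cluster and a vector search index in place, the pipeline would typically be passed to PyMongo's `collection.aggregate(pipeline)`, so the similarity ranking runs inside the database rather than in application code.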

