Building Multimodal AI Applications with LangChain & the OpenAI API

Goals

Videos can be full of useful information, but extracting it can be slow: you have to watch the whole thing or skip around looking for the relevant part. Asking a bot questions about the video's transcript can be much faster.

In this project, you'll download a tutorial video from YouTube, transcribe the audio, and create a simple Q&A bot to ask questions about the content.
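Answering a question about a transcript starts with finding the part of the transcript that is most relevant to the question. The sketch below illustrates only that lookup step, and it swaps in a deliberately simple technique: scoring each transcript chunk by how many question words it contains. This is an assumption for illustration, not the project's method. An embedding-based retriever, the usual approach with LangChain, also matches paraphrases. `STOPWORDS`, `tokenize`, and `best_chunk` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stopword list: common words that carry little meaning for matching.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "what", "which", "how", "do", "does"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def best_chunk(question: str, chunks: list[str]) -> str:
    """Return the chunk that shares the most words with the question.

    Keyword overlap is a simplified stand-in (an assumption for this sketch)
    for the embedding similarity search a LangChain retriever would use.
    """
    question_words = set(tokenize(question))

    def score(chunk: str) -> int:
        counts = Counter(tokenize(chunk))
        # Count every occurrence of each question word in the chunk.
        return sum(counts[w] for w in question_words)

    return max(chunks, key=score)


chunks = [
    "In this video we install Python and set up a virtual environment.",
    "Next we train a linear regression model with scikit-learn.",
    "Finally we plot the residuals with matplotlib.",
]
print(best_chunk("Which library plots the residuals?", chunks))
# prints: Finally we plot the residuals with matplotlib.
```

Because "plots" and "plot" are different strings here, this sketch only matches exact words; that limitation is one reason real Q&A bots compare embeddings instead.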

Maintenance note, May 2024

Since this code-along was released, the Python packages for working with the OpenAI API have changed their syntax. The instructions, hints, and code have been updated to use the latest syntax, but the video has not been updated. Consequently, it is now slightly out of sync. Trust the workbook, not the video.

Understanding the building blocks of working with Multimodal AI projects
Working with some of the fundamental concepts of LangChain
How to use the Whisper API to transcribe audio to text
How to combine LangChain and the Whisper API to ask questions of any YouTube video
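With version 1.x of the openai package, transcribing a file is a single call to `client.audio.transcriptions.create`. The sketch below wraps that call in a hypothetical helper, `transcribe_audio`, which takes the client as an argument so it can be exercised without network access. `whisper-1` is the Whisper model name used by the API; the file path in the usage comment is a placeholder.

```python
from pathlib import Path


def transcribe_audio(client, audio_path: str, model: str = "whisper-1") -> str:
    """Transcribe an audio file with the Whisper API and return the text.

    `client` is assumed to be an `openai.OpenAI` instance (openai >= 1.0).
    It is passed in instead of created here so the helper can be tested
    with a stand-in object. `transcribe_audio` itself is a hypothetical name.
    """
    with Path(audio_path).open("rb") as audio_file:
        transcription = client.audio.transcriptions.create(model=model, file=audio_file)
    # With the default JSON response format, the transcript is in `.text`.
    return transcription.text


# Usage with the real API (needs OPENAI_API_KEY; the .mp3 path is a placeholder):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# print(transcribe_audio(client, "files/audio/my_video.mp3"))
```

Long videos can exceed the API's 25 MB limit on uploaded files; in that case the audio has to be split into smaller files before transcription.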

Before you begin

You'll need an OpenAI developer account and an API key. Store the secret key in your 'Environment Variables' in the side menu under the name OPENAI_API_KEY. See the getting-started.ipynb notebook for details on setting this up.

Task 0: Setup

The project requires several packages that need to be installed into Workspace.

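openai is the official Python library for the OpenAI API.
langchain is a framework for developing generative AI applications.
langchain-openai provides LangChain's integrations with OpenAI models.
yt_dlp lets you download YouTube videos.
tiktoken converts text into the tokens that OpenAI models process.
docarray makes it easier to work with multimodal data (in this case mixing audio and text).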
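Language models and embedding APIs limit input length in tokens, not characters, which is why tiktoken matters here: it measures text the way the model does. Below is a minimal sketch, assuming you want fixed-size chunks with some overlap between neighbors. `split_by_tokens` is a hypothetical helper, not a LangChain or tiktoken function; LangChain's text splitters implement a more careful version of the same idea. It splits on whitespace by default so it runs anywhere, and accepts tiktoken's `encode`/`decode` to count real tokens instead.

```python
from typing import Callable, Sequence


def split_by_tokens(
    text: str,
    max_tokens: int,
    overlap: int = 0,
    encode: Callable[[str], Sequence] = str.split,
    decode: Callable[[Sequence], str] = " ".join,
) -> list[str]:
    """Split `text` into chunks of at most `max_tokens` tokens (hypothetical helper).

    Each chunk repeats the last `overlap` tokens of the previous one, so
    context cut at a chunk boundary also appears in the next chunk.
    By default a "token" is a whitespace-separated word (a stand-in);
    pass tiktoken's encode/decode to count real model tokens.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    tokens = encode(text)
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


print(split_by_tokens("a b c d e f g", max_tokens=3, overlap=1))
# prints: ['a b c', 'c d e', 'e f g']

# With tiktoken installed (the chunk sizes here are arbitrary examples):
# import tiktoken
# enc = tiktoken.get_encoding("cl100k_base")
# chunks = split_by_tokens(transcript, 500, 50, encode=enc.encode, decode=enc.decode)
```

The overlap trades a little redundancy for robustness: a sentence split across two chunks is still partly visible in both.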

Instructions

Run the following code to read your API key and install the packages.

import os

# Read the OpenAI API key stored in your environment variables
openai_api_key = os.environ["OPENAI_API_KEY"]

# Install the openai package, locked to version 1.27
!pip install openai==1.27

# Install the langchain package, locked to version 0.1.19
!pip install langchain==0.1.19

# Install the langchain-openai package, locked to version 0.1.6
!pip install langchain-openai==0.1.6

# Install the yt_dlp package, locked to version 2024.4.9
!pip install yt_dlp==2024.4.9

# Install the tiktoken package, locked to version 0.6.0
!pip install tiktoken==0.6.0

# Install the docarray package, locked to version 0.40.0
!pip install docarray==0.40.0



Task 1: Import The Required Libraries

For this project, you need the os and yt_dlp packages to download the YouTube video of your choice, convert it to an .mp3 file, and save it. You'll also use the openai package to call the OpenAI models.
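A sketch of the download step, assuming yt_dlp's Python API (`yt_dlp.YoutubeDL` with the `FFmpegExtractAudio` post-processor) and an `ffmpeg` binary on the PATH, which yt_dlp needs for the .mp3 conversion. The helper names `build_ydl_options` and `download_mp3`, the `files/audio` output folder, and the 192 kbps quality setting are illustrative choices, not values from the project.

```python
import os


def build_ydl_options(output_dir: str) -> dict:
    """yt_dlp options that keep only the audio track and convert it to .mp3.

    Hypothetical helper; the FFmpegExtractAudio post-processor needs ffmpeg on the PATH.
    """
    return {
        "format": "bestaudio/best",
        # Save as <output_dir>/<video title>.<ext>; yt_dlp fills in the template fields.
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",  # illustrative bitrate (kbps)
            }
        ],
    }


def download_mp3(url: str, output_dir: str = "files/audio") -> None:
    """Download a YouTube video's audio and save it as .mp3 (hypothetical helper)."""
    import yt_dlp  # imported here so the options above can be built without it

    os.makedirs(output_dir, exist_ok=True)
    with yt_dlp.YoutubeDL(build_ydl_options(output_dir)) as ydl:
        ydl.download([url])


# download_mp3("<YouTube video URL>")  # placeholder URL
```

Keeping the options in their own function means you can inspect or adjust them without downloading anything.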
