
Building Multimodal AI Applications with LangChain & the OpenAI API

Goals

Videos can be full of useful information, but getting at that information can be slow: you have to watch the whole thing or skip through it and hope you land on the right part. It can be much faster to ask a bot questions about the video's transcript.

Download a video from the Internet Archive, transcribe the audio, and create a simple Q&A bot to ask questions about the content.

  • Understand the building blocks of a multimodal AI project
  • Work with some of the fundamental concepts of LangChain
  • Use the Whisper API to transcribe audio to text
  • Combine LangChain and the Whisper API to ask questions about any video

Setup

The project requires several packages that need to be installed into Workspace.

  • openai is the official OpenAI Python client, used to call the Whisper API.
  • langchain is a framework for developing generative AI applications.
  • langchain-openai provides the LangChain integrations for OpenAI models.
  • yt_dlp lets you download videos from YouTube and many other sites (here, the Internet Archive).
  • tiktoken converts text into tokens (a short illustration follows the install commands).
  • docarray makes it easier to work with multimodal data (in this case, mixing audio and text).
  • pydub provides simple audio manipulation, used here to work with the extracted MP3.
# Install the openai package, locked to version 1.27
!pip install openai==1.27

# Install the langchain package, locked to version 0.1.19
!pip install langchain==0.1.19

# Install the langchain-openai package, locked to version 0.1.6
!pip install langchain-openai==0.1.6

# Install the yt_dlp package, locked to version 2024.4.9
!pip install yt_dlp==2024.4.9

# Install the tiktoken package, locked to version 0.6.0
!pip install tiktoken==0.6.0

# Install the docarray package, locked to version 0.40.0
!pip install docarray==0.40.0

# Install the pydub package
!pip install pydub
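
As a quick aside, the snippet below illustrates what tiktoken does. It is not part of the project code, and the model name is only an example.

# Illustration only: encode a string with tiktoken and count the resulting tokens
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")  # example model name
tokens = encoding.encode("So you're going to high school!")
print(len(tokens), "tokens:", tokens)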

Import the Required Libraries

Import the following packages.

  • Import os.
  • Import glob.
  • Import openai.
  • Import yt_dlp with the alias youtube_dl.
  • From the yt_dlp package, import DownloadError.
  • Import docarray.
  • From the pydub package, import AudioSegment.
  • Assign openai_api_key to os.getenv("OPENAI_API_KEY").
# Import the os package
import os

# Import the glob package
import glob

# Import the openai package 
import openai

# Import the yt_dlp package as youtube_dl
import yt_dlp as youtube_dl

# Import DownloadError from yt_dlp
from yt_dlp import DownloadError 

# Import DocArray 
import docarray 

# Import AudioSegment
from pydub import AudioSegment

# Read the OpenAI API key from the environment
openai_api_key = os.getenv("OPENAI_API_KEY")
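
The key read here will authenticate the OpenAI API calls (including Whisper transcription) later in the project. As a minimal sketch of how it might be wired up with the openai 1.x client (this is an assumption, not the notebook's own code):

# Minimal sketch (assumption): create an OpenAI client with the key read above,
# ready for later Whisper transcription and chat calls
from openai import OpenAI

client = OpenAI(api_key=openai_api_key)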

Download the Video
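
Use yt_dlp (imported as youtube_dl) to download the video from the Internet Archive and extract its audio track as an MP3 file in the output directory.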



# Target video URL
video_url = "https://archive.org/details/0769_So_Youre_Going_to_High_School_18_16_46_00"

# Output directory
output_dir = "files/audio/"
os.makedirs(output_dir, exist_ok=True)

# yt-dlp config for MP3 extraction
ydl_opts = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    "quiet": False,  # Show progress
    "noplaylist": True,
    "force_generic_extractor": True,  # <-- Add this line to force generic extractor
}

# Run the downloader, retrying once if the first attempt raises a DownloadError
try:
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([video_url])
except DownloadError:
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([video_url])

print("Audio extraction complete!")

Find the audio file in the output directory.

  • Find all the MP3 audio files in the output directory by joining the output directory to the pattern *.mp3 and using glob to list them.
  • Select the last file in the list and assign it to audio_filename.
  • Print audio_filename.
# Find the audio file in the output directory

# Find all the MP3 audio files in the output directory
audio_files = glob.glob(os.path.join(output_dir, '*.mp3'))

# Select the last audio file in the list
audio_filename = audio_files[-1]

# Print the name of the selected audio file
print(audio_filename)
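
Since AudioSegment was imported from pydub earlier, an optional sanity check is to load the MP3 and report its duration. The OpenAI Whisper API limits uploads to 25 MB, so very long recordings typically need to be split before transcription. This is a minimal sketch, assuming FFmpeg is available for pydub to decode the MP3.

# Optional sanity check (sketch): load the MP3 with pydub and report its duration
# (AudioSegment was imported above; FFmpeg must be available for MP3 decoding)
audio = AudioSegment.from_mp3(audio_filename)
duration_minutes = len(audio) / 1000 / 60  # len() returns the length in milliseconds
print(f"Duration: {duration_minutes:.1f} minutes")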