You're working for a well-known car manufacturer that is exploring the use of LLMs in its vehicles to provide guidance to drivers. You've been asked to experiment with integrating car manuals with an LLM to create a context-aware chatbot. The hope is that this context-aware LLM can then be connected to text-to-speech software to read the model's responses aloud.
As a proof of concept, you'll integrate several pages from a car manual covering warning messages, their meanings, and the recommended actions. This excerpt, stored as the HTML file mg-zs-warning-messages.html, comes from the MG ZS, a compact SUV. Armed with your newfound knowledge of LLMs and LangChain, you'll implement Retrieval Augmented Generation (RAG) to create the context-aware chatbot.
Note: Although we'll be using the OpenAI API in this project, you do not need to specify an API key.
# Run this cell to install the necessary packages
import subprocess
import pkg_resources
def install_if_needed(package, version):
    '''Function to ensure that the libraries used are consistent to avoid errors.'''
    try:
        pkg = pkg_resources.get_distribution(package)
        if pkg.version != version:
            raise pkg_resources.VersionConflict(pkg, version)
    except (pkg_resources.DistributionNotFound, pkg_resources.VersionConflict):
        subprocess.check_call(["pip", "install", f"{package}=={version}"])
install_if_needed("langchain-core", "0.3.72")
install_if_needed("langchain-openai", "0.3.28")
install_if_needed("langchain-community", "0.3.27")
install_if_needed("unstructured", "0.18.11")
install_if_needed("langchain-chroma", "0.2.5")
install_if_needed("langchain-text-splitters", "0.3.9")# Import the required packages
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser

# Load the HTML as a LangChain document loader
loader = UnstructuredHTMLLoader(file_path="data/mg-zs-warning-messages.html")
car_docs = loader.load()

import os
# Load the models required to complete the exercise
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=os.environ["OPENAI_API_KEY"])

Developing RAG
Chunking / Splitting
# Preview the raw text extracted from the HTML manual
car_docs[0].page_content

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=50,
)

chunks = text_splitter.split_documents(car_docs)
print(f"Number of chunks: {len(chunks)}")
for chunk in chunks:
    print(len(chunk.page_content))
    print(chunk)
    print("=====")

Embedding and Storing Chunks
# Create a Chroma vector store and embed the chunks
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings
)
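Before wiring the vector store into a chain, it can help to sanity-check it with a direct similarity search. This is an optional sketch; the query string is a hypothetical example, not part of the original project.

# Optional sanity check: fetch the chunks most similar to an example query
# The query below is illustrative; any warning-related phrase from the manual would work
results = vector_store.similarity_search("engine coolant temperature warning", k=2)
for doc in results:
    print(doc.page_content)
    print("-----")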
prompt = """
Use the only the context provided to answer following question. If you don't know the answer, reply that you are unsure.
Context: {context}
Question: {question}
"""
# Convert the string into a chat prompt template
prompt_template = ChatPromptTemplate.from_template(prompt)
print(prompt_template)

# Convert the vector store into a retriever
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 2})
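With the prompt template, retriever, and LLM defined, the remaining step is to wire them into a RAG chain. The sketch below shows one way to do this with LangChain's expression language; the helper format_docs and the example question are assumptions for illustration, not the project's reference solution.

# Sketch of assembling the RAG chain from the components above (assumed wiring)
def format_docs(docs):
    '''Join retrieved chunks into a single context string for the prompt.'''
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

# Hypothetical example question about a warning message from the manual
print(rag_chain.invoke("What should I do if I see an engine coolant temperature warning?"))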