
Setting Up

%%capture
%pip install langchain
%pip install langchain-community 
%pip install langchainhub 
%pip install langchain-chroma 
%pip install langchain-groq
%pip install langchain-huggingface
%pip install "unstructured[docx]"

Groq Python API

import os
from groq import Groq

groq_api_key = os.environ.get("GROQ_API_KEY")

client = Groq(api_key=groq_api_key)


chat_streaming = client.chat.completions.create(
    messages=[
       {"role": "system", "content": "You are a professional Data Engineer."},
       {"role": "user", "content": "Can you explain how the data lake works?"},
    ],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    temperature=0.3,
    max_tokens=1200,
    top_p=1,
    stop=None,
    stream=True,
)

for chunk in chat_streaming:
    # The final chunk's delta may carry no content, so fall back to ""
    print(chunk.choices[0].delta.content or "", end="")
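
Streaming prints tokens as they arrive; if you prefer a single blocking response, the same call works with stream=False (a minimal sketch reusing the messages above):

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a professional Data Engineer."},
        {"role": "user", "content": "Can you explain how the data lake works?"},
    ],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    stream=False,
)

# The complete answer is available on the first choice's message
print(chat_completion.choices[0].message.content)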

Initializing the LLM and Embedding Model

from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings

llm = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct", api_key=groq_api_key)
embed_model = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")
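
As a quick sanity check, you can embed a sample string and inspect the vector size; mxbai-embed-large-v1 should produce 1024-dimensional embeddings:

# Embed a sample query and confirm the embedding dimensionality
sample_vector = embed_model.embed_query("data lake architecture")
print(len(sample_vector))  # expected: 1024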

Loading and Splitting the Data

from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    separators=["\n\n", "\n"]
)

# Load the .docx files
loader = DirectoryLoader("./", glob="*.docx", use_multithreading=True)
documents = loader.load()

# Split the documents into chunks
chunks = text_splitter.split_documents(documents)

# Print the number of chunks
print(len(chunks))
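
It also helps to eyeball one chunk and the source metadata the loader attached (a quick sketch):

# Peek at the first chunk: its source file and the start of its text
print(chunks[0].metadata)
print(chunks[0].page_content[:200])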

Creating the Vector Store

from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embed_model,
    persist_directory="./Vectordb",
)
query = "What this tutorial about?"
docs = vectorstore.similarity_search(query)
print(docs[0].page_content)
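
similarity_search returns documents only; Chroma also exposes a scored variant if you want to see how close each match is (lower distance means a closer match):

# Retrieve the top matches along with their distance scores
results = vectorstore.similarity_search_with_score(query, k=2)
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content[:100]}")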

Creating the RAG Pipeline

# Create retriever
retriever = vectorstore.as_retriever()
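
By default the retriever returns the top four chunks per query; this is tunable through search_kwargs, for example:

# Optional: restrict retrieval to the 3 most similar chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})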

# Import PromptTemplate
from langchain_core.prompts import PromptTemplate

# Define a clearer, more professional prompt template
template = """You are an expert assistant tasked with answering questions based on the provided documents.
Use only the given context to generate your answer.
If the answer cannot be found in the context, clearly state that you do not know.
Be detailed and precise in your response, but avoid mentioning or referencing the context itself.

Context:
{context}

Question:
{question}

Answer:"""

# Create the PromptTemplate
rag_prompt = PromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Join the retrieved documents into one context string instead of
# passing raw Document objects into the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
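
With the chain assembled, you can query it end to end (the question below is just an example):

# Ask a question grounded in the indexed documents
response = rag_chain.invoke("What is this tutorial about?")
print(response)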