Recursive Retrieval for RAG: Implementation With LlamaIndex

Learn how to implement recursive retrieval in RAG systems using LlamaIndex to improve the accuracy and relevance of retrieved information, especially for large document collections.
Nov 13, 2024  · 8 min read

In many RAG applications, the retrieval process is often quite simple. Documents are usually broken into chunks, converted into embeddings, and stored in a vector database. When a query is made, the system pulls the top-k documents based on how similar their embeddings are.
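
As a point of reference, here is a minimal sketch of that baseline flow in LlamaIndex. The "data" folder and the query string are placeholders, and it assumes an embedding model (OpenAI's default, via an API key) is configured:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Baseline top-k retrieval: chunk, embed, and rank by similarity alone.
# "data" is a placeholder folder; swap in your own documents.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

retriever = index.as_retriever(similarity_top_k=3)
for res in retriever.retrieve("What does the author say about choosing what to work on?"):
    print(res.score, res.node.get_content()[:200])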

However, this method has drawbacks, especially with large collections. Individual chunks often lack the context of the document they came from, so the system may not always pull the most relevant information, leading to less accurate results.

Recursive retrieval was developed to improve retrieval accuracy by using the document’s structure. Instead of directly retrieving chunks, it first retrieves relevant summaries and then drills down to the corresponding chunks, making the final retrieval results more relevant.

In this article, we will explain recursive retrieval and guide you through how to implement it step-by-step using LlamaIndex.

What Is Recursive Retrieval?

Instead of just embedding raw chunks of documents and retrieving them based on similarity, recursive retrieval works by first embedding summaries of the documents and linking them to the full document chunks. When a query is made, the system first retrieves the relevant summaries and then digs deeper to find the related chunks of information.

This method gives the retrieval system more context before providing the final chunks, making it better at finding relevant information.
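
To make the idea concrete, here is a deliberately simplified toy sketch of the two-level flow. Word overlap stands in for embedding similarity, and none of this is LlamaIndex code; the rest of the article builds the real version with LlamaIndex retrievers.

from typing import Dict, List

def score(query: str, text: str) -> int:
    # Toy relevance score: count shared words. A real system compares embeddings.
    return len(set(query.lower().split()) & set(text.lower().split()))

def recursive_retrieve(
    query: str,
    summaries: Dict[str, str],     # document title -> summary text
    chunks: Dict[str, List[str]],  # document title -> list of chunk texts
    top_k_chunks: int = 2,
) -> List[str]:
    # Level 1: pick the document whose summary best matches the query.
    best_doc = max(summaries, key=lambda title: score(query, summaries[title]))
    # Level 2: rank only that document's chunks and return the top ones.
    ranked = sorted(chunks[best_doc], key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k_chunks]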

Recursive Retrieval Implementation Using LlamaIndex

In this section, we will walk you through the step-by-step process of implementing recursive retrieval using LlamaIndex, starting from loading the documents to running queries with recursive retrieval.

Step 1: Load and prepare the documents

First, we load the documents into the system using SimpleDirectoryReader. Each document is given a title and metadata (like its category) to make filtering easier later. The loaded documents are then stored in a dictionary for easy access.

from llama_index.core import SimpleDirectoryReader

# Document titles and metadata
article_titles = ["How to Do Great Work", "Having Kids", "How to Lose Time and Money"]
article_metadatas = {
    "How to Do Great Work": {
        "category": "self-help",
    },
    "Having Kids": {
        "category": "self-help",
    },
    "How to Lose Time and Money": {
        "category": "self-help",
    },
}

# Load documents and update with metadata
docs_dict = {}
for title in article_titles:
    doc = SimpleDirectoryReader(
        input_files=[f"llamaindex-data/{title}.txt"]
    ).load_data()[0]
    doc.metadata.update(article_metadatas[title])
    docs_dict[title] = doc

docs_dict

For the sake of readability, I’ll truncate the output below:

{'How to Do Great Work': Document(id_='e26a2fcc-77d2-43e8-968b-f893944907dc', embedding=None, metadata={'file_path': 'llamaindex-data/How to Do Great Work.txt', 'file_name': 'How to Do Great Work.txt', 'file_type': 'text/plain', 'file_size': 59399, 'creation_date': '2024-09-18', 'last_modified_date': '2024-09-18', 'category': 'self-help'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='July 2023\\n\\nIf you collected lists of techniques for doing great work in a lot of different fields, what would the intersection look like? I decided to find out by making it.\\n\\nPartly my goal was to create a guide that could be used by someone working in any field. But I was also curious about the shape of the intersection. And one thing this exercise shows is that it does have a definite shape; it\\'s not just a point labelled "work hard."\\n\\nThe following recipe assumes you\\'re very ambitious.\\n\\n\\n\\n\\n\\nThe first step is to decide what to work on. The work you choose needs to have three qualities: it has to be something you have a natural aptitude for, that you have a deep interest in, and that offers scope to do great work.\\n\\nIn practice you don\\'t have to worry much about the third criterion. Ambitious people are if anything already too conservative about it. So all you need to do is find something you have an aptitude for and great interest in. [1]\\n\\nThat sounds straightforward, but it\\'s often quite difficult. When you\\'re young you don\\'t know what you\\'re good at or what different kinds of work are like. Some kinds of work you end up doing may not even exist yet. So while some people know what they want to do at 14, most have to figure it out.\\n\\nThe way to figure out what to work on is by working. If you\\'re not sure what to work on, guess. But pick something and get going. You\\'ll probably guess wrong some of the time, but that\\'s fine. It\\'s good to know about multiple things; some of the biggest discoveries come from noticing connections between different fields.\\n\\n
…
(truncated)

Step 2: Set up the LLM and the chunking

Next, we initialize the large language model (LLM) using OpenAI's GPT-4o Mini and configure a sentence splitter to break the documents into smaller chunks for embedding. We also set up a callback manager to track the process.

from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import LlamaDebugHandler, CallbackManager
from llama_index.core.node_parser import SentenceSplitter

# Initialize LLM and chunk splitter
llm = OpenAI("gpt-4o-mini")
callback_manager = CallbackManager([LlamaDebugHandler()])
splitter = SentenceSplitter(chunk_size=256)
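
Optionally, before building any indexes, you can run the splitter directly on one of the loaded documents to see how it chunks the text. This check is not part of the original walkthrough:

# Optional sanity check: inspect the chunks produced by the 256-token splitter
chunks = splitter.get_nodes_from_documents([docs_dict["Having Kids"]])
print(len(chunks))
print(chunks[0].get_content()[:300])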

Step 3: Build vector indexes and generate summaries

For each document, we create a vector index, which lets us retrieve that document's chunks later based on similarity. We also have the LLM generate a summary of each document and store it as an IndexNode whose index_id is the document title, so the summary can be linked to the chunk-level retriever later. The summaries are cached to disk under summaries/ so they are only generated once.

from pathlib import Path

from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.schema import IndexNode

# Define top-level nodes and vector retrievers
nodes = []
vector_query_engines = {}
vector_retrievers = {}

for title in article_titles:
    # build vector index
    vector_index = VectorStoreIndex.from_documents(
        [docs_dict[title]],
        transformations=[splitter],
        callback_manager=callback_manager,
    )
    
    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=llm)
    vector_query_engines[title] = vector_query_engine
    vector_retrievers[title] = vector_index.as_retriever(similarity_top_k=3)
    # save summaries
    out_path = Path("summaries") / f"{title}.txt"
    if not out_path.exists():
        # use LLM-generated summary
        summary_index = SummaryIndex.from_documents(
            [docs_dict[title]], callback_manager=callback_manager
        )
        summarizer = summary_index.as_query_engine(
            response_mode="tree_summarize", llm=llm
        )
        # "await" works at the notebook top level; in a plain script,
        # call summarizer.query(...) instead
        response = await summarizer.aquery(
            f"Give me a summary of {title}"
        )
        article_summary = response.response
        Path("summaries").mkdir(exist_ok=True)
        with open(out_path, "w") as fp:
            fp.write(article_summary)
    else:
        with open(out_path, "r") as fp:
            article_summary = fp.read()
    print(f"**Summary for {title}: {article_summary}")
    node = IndexNode(text=article_summary, index_id=title)
    nodes.append(node)
**********
Trace: index_construction
**********
**Summary for How to Do Great Work: The essence of doing great work revolves around a few key principles. First, it's crucial to choose a field that aligns with your natural aptitudes and deep interests, as this will drive your motivation and creativity. Engaging in your own projects and maintaining a sense of excited curiosity are vital for discovering new ideas and making significant contributions.
Learning enough to reach the frontiers of knowledge in your chosen field allows you to identify gaps and explore them, often leading to innovative breakthroughs. Hard work is essential, but it should be fueled by genuine interest rather than mere diligence. Consistency and the willingness to embrace challenges, including the risk of failure, are important for growth and discovery.
Collaboration with high-quality colleagues can enhance your work, as they can provide insights and encouragement. Maintaining morale is also crucial; a positive mindset can help you navigate setbacks and keep you focused on your goals.
Ultimately, curiosity serves as the driving force behind great work, guiding you through the process of exploration and discovery. By nurturing your curiosity and being open to new experiences, you can uncover unique opportunities and make meaningful contributions in your field.
**********
Trace: index_construction
**********
**Summary for Having Kids: The piece reflects on the author's transformation in perspective regarding parenthood. Initially apprehensive about having children, viewing parents as uncool and burdensome, the author experiences a profound shift after becoming a parent. The arrival of their first child triggers protective instincts and a newfound appreciation for children, leading to genuine joy in parenting moments that were previously overlooked. 
The author acknowledges the challenges of parenthood, such as reduced productivity and ambition, as well as the necessity of adapting to a child's schedule. Despite these challenges, the author finds that the happiness and meaningful moments shared with children far outweigh the difficulties. The narrative emphasizes that while parenting can be demanding, it also brings unexpected joy and fulfillment, ultimately leading to a richer life experience.
**********
Trace: index_construction
**********
**Summary for How to Lose Time and Money: The piece discusses the author's reflections on wealth and time management after selling a startup. It emphasizes that losing wealth often stems from poor investments rather than excessive spending, as the latter triggers alarms in our minds. The author highlights the need to develop new awareness to avoid bad investments, which can be less obvious than overspending on luxuries. Similarly, when it comes to time, the most significant loss occurs not through leisure activities but through engaging in unproductive work that feels legitimate, like managing emails. The author argues that modern complexities require us to recognize and avoid these deceptive traps that mimic productive behavior but ultimately lead to wasted time.

As you can see, we now have three nodes, each holding the summary of one document. We also have vector_retrievers, a dictionary that maps each document title to a retriever over that document's chunk embeddings.

print(nodes)
print('------')
print(vector_retrievers)
[IndexNode(id_='406d9927-c9e2-486f-9fc5-111efefc1649', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="The essence of doing great work revolves around a few key principles. First, it's crucial to choose a field that aligns with your natural aptitudes and deep interests, as this will drive your motivation and creativity. Engaging in your own projects and maintaining a sense of excited curiosity are vital for discovering new ideas and making significant contributions.\\n\\nLearning enough to reach the frontiers of knowledge in your chosen field allows you to identify gaps and explore them, often leading to innovative breakthroughs. Hard work is essential, but it should be fueled by genuine interest rather than mere diligence. Consistency and the willingness to embrace challenges, including the risk of failure, are important for growth and discovery.\\n\\nCollaboration with high-quality colleagues can enhance your work, as they can provide insights and encouragement. Maintaining morale is also crucial; a positive mindset can help you navigate setbacks and keep you focused on your goals.\\n\\nUltimately, curiosity serves as the driving force behind great work, guiding you through the process of exploration and discovery. By nurturing your curiosity and being open to new experiences, you can uncover unique opportunities and make meaningful contributions in your field.", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n', index_id='How to Do Great Work', obj=None),
 IndexNode(id_='8007fdd2-6617-4a76-95d7-79efef0700e7', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="The piece reflects on the author's transformation in perspective regarding parenthood. Initially apprehensive about having children, viewing parents as uncool and burdensome, the author experiences a profound shift after becoming a parent. The arrival of their first child triggers protective instincts and a newfound appreciation for children, leading to genuine joy in parenting moments that were previously overlooked. \\n\\nThe author acknowledges the challenges of parenthood, such as reduced productivity and ambition, as well as the necessity of adapting to a child's schedule. Despite these challenges, the author finds that the happiness and meaningful moments shared with children far outweigh the difficulties. The narrative emphasizes that while parenting can be demanding, it also brings unexpected joy and fulfillment, ultimately leading to a richer life experience.", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n', index_id='Having Kids', obj=None),
 IndexNode(id_='7e4dd169-eb28-4b2f-8a1a-ca1c5b85ac30', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="The piece discusses the author's reflections on wealth and time management after selling a startup. It emphasizes that losing wealth often stems from poor investments rather than excessive spending, as the latter triggers alarms in our minds. The author highlights the need to develop new awareness to avoid bad investments, which can be less obvious than overspending on luxuries. Similarly, when it comes to time, the most significant loss occurs not through leisure activities but through engaging in unproductive work that feels legitimate, like managing emails. The author argues that modern complexities require us to recognize and avoid these deceptive traps that mimic productive behavior but ultimately lead to wasted time.", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n', index_id='How to Lose Time and Money', obj=None)]
 ------
 {'How to Do Great Work': <llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x330afeeb0>,
 'Having Kids': <llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x33129c7c0>,
 'How to Lose Time and Money': <llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x32e8929a0>}

Step 4: Build a top-level vector index

Once we have the summary nodes, we build a top-level vector index over them and a corresponding retriever (top_vector_retriever). Every query starts at this index: it finds the most relevant summary before we drill down into that document's detailed chunks.

# Build top-level vector index from summary nodes
top_vector_index = VectorStoreIndex(
    nodes, transformations=[splitter], callback_manager=callback_manager
)

# Set up a retriever for the top-level summaries
top_vector_retriever = top_vector_index.as_retriever(similarity_top_k=1)
top_vector_retriever
<llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x32db715b0>
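
As an optional check before wiring the two levels together, you can query the top-level retriever on its own; for a question about parenthood, you would expect the "Having Kids" summary node to score highest:

# Optional check: the top-level retriever returns summary (IndexNode) matches
for hit in top_vector_retriever.retrieve("should I have kids?"):
    print(hit.node.index_id, hit.score)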

Step 5: Set up recursive retrieval

Now that we have the top-level retriever as well as the individual document retrievers, we can set up the recursive retriever. The root retriever is registered under the id "vector", and each document-level retriever is keyed by its title; because those titles match the index_id values on the summary nodes, a retrieved summary tells RecursiveRetriever which chunk-level retriever to query next.

from llama_index.core.retrievers import RecursiveRetriever

# Combine top-level retriever with individual document retrievers
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": top_vector_retriever, **vector_retrievers},
    verbose=True,
)

Step 6: Run recursive retrieval queries

Finally, we are ready to use our recursive retriever to run some sample queries.

# Run recursive retriever on sample queries
result = recursive_retriever.retrieve("should I have kids?")
for res in result:
    print(res.node.get_content())
Retrieving with query id None: should I have kids?
Retrieved node with id, entering: Having Kids
Retrieving with query id Having Kids: should I have kids?
Retrieving text node: Do you have so little to spare?
And while having kids may be warping my present judgement, it hasn't overwritten my memory. I remember perfectly well what life was like before. Well enough to miss some things a lot, like the ability to take off for some other country at a moment's notice. That was so great. Why did I never do that?
See what I did there? The fact is, most of the freedom I had before kids, I never used. I paid for it in loneliness, but I never used it.
I had plenty of happy times before I had kids. But if I count up happy moments, not just potential happiness but actual happy moments, there are more after kids than before. Now I practically have it on tap, almost any bedtime.
People's experiences as parents vary a lot, and I know I've been lucky. But I think the worries I had before having kids must be pretty common, and judging by other parents' faces when they see their kids, so must the happiness that kids bring.
Retrieving text node: December 2019
Before I had kids, I was afraid of having kids. Up to that point I felt about kids the way the young Augustine felt about living virtuously. I'd have been sad to think I'd never have children. But did I want them now? No.
If I had kids, I'd become a parent, and parents, as I'd known since I was a kid, were uncool. They were dull and responsible and had no fun. And while it's not surprising that kids would believe that, to be honest I hadn't seen much as an adult to change my mind. Whenever I'd noticed parents with kids, the kids seemed to be terrors, and the parents pathetic harried creatures, even when they prevailed.
When people had babies, I congratulated them enthusiastically, because that seemed to be what one did. But I didn't feel it at all. "Better you than me," I was thinking.
Now when people have babies I congratulate them enthusiastically and I mean it. Especially the first one. I feel like they just got the best gift in the world.
Retrieving text node: Which meant I had to finish or I'd be taking away their trip to Africa. Maybe if I'm really lucky such tricks could put me net ahead. But the wind is there, no question.
On the other hand, what kind of wimpy ambition do you have if it won't survive having kids? Do you have so little to spare?
And while having kids may be warping my present judgement, it hasn't overwritten my memory. I remember perfectly well what life was like before. Well enough to miss some things a lot, like the ability to take off for some other country at a moment's notice. That was so great. Why did I never do that?
See what I did there? The fact is, most of the freedom I had before kids, I never used. I paid for it in loneliness, but I never used it.
I had plenty of happy times before I had kids. But if I count up happy moments, not just potential happiness but actual happy moments, there are more after kids than before. Now I practically have it on tap, almost any bedtime.
result = recursive_retriever.retrieve("How to buy more time?")
for res in result:
    print(res.node.get_content())
Retrieving with query id None: How to buy more time?
Retrieved node with id, entering: How to Lose Time and Money
Retrieving with query id How to Lose Time and Money: How to buy more time?
Retrieving text node: Which is why people trying to sell you expensive things say "it's an investment."
The solution is to develop new alarms. This can be a tricky business, because while the alarms that prevent you from overspending are so basic that they may even be in our DNA, the ones that prevent you from making bad investments have to be learned, and are sometimes fairly counterintuitive.
A few days ago I realized something surprising: the situation with time is much the same as with money. The most dangerous way to lose time is not to spend it having fun, but to spend it doing fake work. When you spend time having fun, you know you're being self-indulgent. Alarms start to go off fairly quickly. If I woke up one morning and sat down on the sofa and watched TV all day, I'd feel like something was terribly wrong. Just thinking about it makes me wince. I'd start to feel uncomfortable after sitting on a sofa watching TV for 2 hours, let alone a whole day.
Retrieving text node: The solution is to develop new alarms. This can be a tricky business, because while the alarms that prevent you from overspending are so basic that they may even be in our DNA, the ones that prevent you from making bad investments have to be learned, and are sometimes fairly counterintuitive.
A few days ago I realized something surprising: the situation with time is much the same as with money. The most dangerous way to lose time is not to spend it having fun, but to spend it doing fake work. When you spend time having fun, you know you're being self-indulgent. Alarms start to go off fairly quickly. If I woke up one morning and sat down on the sofa and watched TV all day, I'd feel like something was terribly wrong. Just thinking about it makes me wince. I'd start to feel uncomfortable after sitting on a sofa watching TV for 2 hours, let alone a whole day.
And yet I've definitely had days when I might as well have sat in front of a TV all day — days at the end of which, if I asked myself what I got done that day, the answer would have been: basically, nothing.
Retrieving text node: Investing bypasses those alarms. You're not spending the money; you're just moving it from one asset to another. Which is why people trying to sell you expensive things say "it's an investment."
The solution is to develop new alarms. This can be a tricky business, because while the alarms that prevent you from overspending are so basic that they may even be in our DNA, the ones that prevent you from making bad investments have to be learned, and are sometimes fairly counterintuitive.
A few days ago I realized something surprising: the situation with time is much the same as with money. The most dangerous way to lose time is not to spend it having fun, but to spend it doing fake work. When you spend time having fun, you know you're being self-indulgent. Alarms start to go off fairly quickly. If I woke up one morning and sat down on the sofa and watched TV all day, I'd feel like something was terribly wrong. Just thinking about it makes me wince.

Conclusion

By using document summaries and hierarchies, recursive retrieval makes the retrieved chunks more relevant, even when dealing with large data sets. For organizations handling large volumes of data, recursive retrieval is a reliable method to create more accurate retrieval systems.

Author: Ryan Ong

Ryan is a lead data scientist specialising in building AI applications using LLMs. He is a PhD candidate in Natural Language Processing and Knowledge Graphs at Imperial College London, where he also completed his Master’s degree in Computer Science. Outside of data science, he writes a weekly Substack newsletter, The Limitless Playbook, where he shares one actionable idea from the world's top thinkers and occasionally writes about core AI concepts.
