Recursive Retrieval for RAG: Implementation With LlamaIndex

Learn how to implement recursive retrieval in RAG systems using LlamaIndex to improve the accuracy and relevance of retrieved information, especially for large document collections.
Updated Nov 13, 2024  · 8 min read

In many RAG applications, the retrieval process is fairly simple. Typically, documents are split into chunks, converted into embeddings, and stored in a vector database. When a query is made, the system retrieves the top-ranked chunks based on how similar their embeddings are to the embedding of the query.
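To make the baseline concrete, here is a minimal sketch of similarity-based top-k retrieval in plain Python. The three-dimensional embeddings and the chunk ids are made up for illustration; a real system would get embeddings from a model and would usually delegate the search to a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, chunk_embeddings, k=2):
    """Return the ids of the k chunks whose embeddings are closest to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), chunk_id)
        for chunk_id, embedding in chunk_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical embeddings, for illustration only
chunk_embeddings = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.8, 0.6, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
query_embedding = [0.9, 0.1, 0.0]

top = top_k_chunks(query_embedding, chunk_embeddings, k=2)
print(top)
```

Every chunk from every document competes in the same ranking here, which is the property recursive retrieval changes.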

However, this method has some drawbacks, especially with large collections. Individual chunks can be ambiguous when read out of context, and the system may not always retrieve the most relevant information, which leads to less accurate results.

Recursive retrieval was developed to improve retrieval accuracy by using the structure of the document. Instead of retrieving chunks directly, it first retrieves the relevant summaries and then drills down into the corresponding chunks, making the final retrieval results more relevant.

In this article, we'll explain recursive retrieval and walk you through implementing it step by step using LlamaIndex.

What Is Recursive Retrieval?

Instead of only embedding raw document chunks and retrieving them based on similarity, recursive retrieval works by first embedding summaries of the documents and linking each summary to the full set of chunks for that document. When a query is made, the system first retrieves the relevant summaries and then drills down to find the related chunks.

This gives the retrieval system more context before it returns the final chunks, which makes it better at finding relevant information.
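The two stages described above can be sketched without any library. This is a toy illustration, not how LlamaIndex implements it: the word-overlap score stands in for embedding similarity, and the summaries and chunks are invented for the example.

```python
def overlap_score(query, text):
    """Toy similarity: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def recursive_retrieve(query, summaries, chunks_by_doc, top_chunks=2):
    """Pick the best-matching summary, then rank only that document's chunks."""
    # Stage 1: choose the document whose summary best matches the query
    best_doc = max(summaries, key=lambda doc_id: overlap_score(query, summaries[doc_id]))
    # Stage 2: rank the chunks of that document only
    ranked = sorted(
        chunks_by_doc[best_doc],
        key=lambda chunk: overlap_score(query, chunk),
        reverse=True,
    )
    return best_doc, ranked[:top_chunks]


# Invented summaries and chunks, for illustration only
summaries = {
    "parenting": "an essay about having kids and how parenthood changes life",
    "money": "an essay about losing money through bad investments and wasting time",
}
chunks_by_doc = {
    "parenting": [
        "before i had kids i was afraid of having kids",
        "parents seemed dull and responsible",
    ],
    "money": [
        "investing bypasses the alarms that stop overspending",
        "fake work is the most dangerous way to lose time",
    ],
}

doc_id, chunks = recursive_retrieve("should i have kids", summaries, chunks_by_doc, top_chunks=1)
print(doc_id, chunks)
```

Because the second stage only looks inside the selected document, chunks from unrelated documents cannot crowd out the relevant ones.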

Implementing Recursive Retrieval With LlamaIndex

In this section, we'll walk through the step-by-step process of implementing recursive retrieval with LlamaIndex, from loading the documents to running queries with recursive retrieval.

Step 1: Load and prepare the documents

First, we load the documents into the system using SimpleDirectoryReader. Each document gets a title and metadata (such as its category) to make filtering easier later on. The loaded documents are then stored in a dictionary for easy access.

from llama_index.core import SimpleDirectoryReader

# Document titles and metadata
article_titles = ["How to Do Great Work", "Having Kids", "How to Lose Time and Money"]
article_metadatas = {
    "How to Do Great Work": {
        "category": "self-help",
    },
    "Having Kids": {
        "category": "self-help",
    },
    "How to Lose Time and Money": {
        "category": "self-help",
    },
}

# Load documents and update with metadata
docs_dict = {}
for title in article_titles:
    doc = SimpleDirectoryReader(
        input_files=[f"llamaindex-data/{title}.txt"]
    ).load_data()[0]
    doc.metadata.update(article_metadatas[title])
    docs_dict[title] = doc

docs_dict

For readability, I'll truncate the output below:

{'How to Do Great Work': Document(id_='e26a2fcc-77d2-43e8-968b-f893944907dc', embedding=None, metadata={'file_path': 'llamaindex-data/How to Do Great Work.txt', 'file_name': 'How to Do Great Work.txt', 'file_type': 'text/plain', 'file_size': 59399, 'creation_date': '2024-09-18', 'last_modified_date': '2024-09-18', 'category': 'self-help'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='July 2023\\n\\nIf you collected lists of techniques for doing great work in a lot of different fields, what would the intersection look like? I decided to find out by making it.\\n\\nPartly my goal was to create a guide that could be used by someone working in any field. But I was also curious about the shape of the intersection. And one thing this exercise shows is that it does have a definite shape; it\\'s not just a point labelled "work hard."\\n\\nThe following recipe assumes you\\'re very ambitious.\\n\\n\\n\\n\\n\\nThe first step is to decide what to work on. The work you choose needs to have three qualities: it has to be something you have a natural aptitude for, that you have a deep interest in, and that offers scope to do great work.\\n\\nIn practice you don\\'t have to worry much about the third criterion. Ambitious people are if anything already too conservative about it. So all you need to do is find something you have an aptitude for and great interest in. [1]\\n\\nThat sounds straightforward, but it\\'s often quite difficult. When you\\'re young you don\\'t know what you\\'re good at or what different kinds of work are like. Some kinds of work you end up doing may not even exist yet. So while some people know what they want to do at 14, most have to figure it out.\\n\\nThe way to figure out what to work on is by working. 
If you\\'re not sure what to work on, guess. But pick something and get going. You\\'ll probably guess wrong some of the time, but that\\'s fine. It\\'s good to know about multiple things; some of the biggest discoveries come from noticing connections between different fields.\\n\\n
…
(truncated)
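The category metadata attached in this step is what makes later filtering possible. As a minimal sketch, the helper below selects titles by category from a dictionary shaped like article_metadatas. The helper name and the "finance" entry are hypothetical, added only so the filter has something to exclude.

```python
def titles_in_category(metadatas, category):
    """Return the titles whose metadata has the given category."""
    return [
        title
        for title, metadata in metadatas.items()
        if metadata.get("category") == category
    ]


# Same shape as article_metadatas; the last entry is hypothetical
example_metadatas = {
    "How to Do Great Work": {"category": "self-help"},
    "Having Kids": {"category": "self-help"},
    "Hypothetical Finance Essay": {"category": "finance"},
}

self_help_titles = titles_in_category(example_metadatas, "self-help")
print(self_help_titles)
```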

Step 2: Set up the LLM and chunking

Next, we initialize the large language model (LLM) using OpenAI's GPT-4o Mini and configure a sentence splitter to break the documents into smaller chunks for embedding. We also set up a callback manager to trace the process.

from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import LlamaDebugHandler, CallbackManager
from llama_index.core.node_parser import SentenceSplitter

# Initialize LLM and chunk splitter
llm = OpenAI(model="gpt-4o-mini")
callback_manager = CallbackManager([LlamaDebugHandler()])
splitter = SentenceSplitter(chunk_size=256)
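To give a feel for what sentence-based chunking does, here is a deliberately simplified sketch in plain Python. It is not SentenceSplitter's algorithm: it counts words rather than tokens, has no overlap between chunks, and uses a naive regular expression to find sentence boundaries. The function name and the max_words parameter are assumptions made for this example.

```python
import re


def split_into_chunks(text, max_words=8):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole as its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit
        if current and current_len + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Pick something and get going. You will probably guess wrong. That is fine."
chunks = split_into_chunks(text, max_words=8)
for chunk in chunks:
    print(chunk)
```

Keeping sentences intact is the point of this style of splitting: a chunk boundary never falls in the middle of a sentence, so each chunk stays readable on its own.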

Step 3: Build vector indexes and generate summaries

For each document, we create a vector index that lets us later retrieve the relevant chunks of that document based on similarity. A summary of each document is generated by the LLM and cached on disk, so it is not regenerated on every run. Each summary is stored as an IndexNode whose index_id is the document title.

from pathlib import Path

from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.schema import IndexNode

# Define top-level nodes and vector retrievers
nodes = []
vector_query_engines = {}
vector_retrievers = {}

for title in article_titles:
    # build vector index
    vector_index = VectorStoreIndex.from_documents(
        [docs_dict[title]],
        transformations=[splitter],
        callback_manager=callback_manager,
    )
    
    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=llm)
    vector_query_engines[title] = vector_query_engine
    vector_retrievers[title] = vector_index.as_retriever(similarity_top_k=3)
    # save summaries
    out_path = Path("summaries") / f"{title}.txt"
    if not out_path.exists():
        # use LLM-generated summary
        summary_index = SummaryIndex.from_documents(
            [docs_dict[title]], callback_manager=callback_manager
        )
        summarizer = summary_index.as_query_engine(
            response_mode="tree_summarize", llm=llm
        )
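        # NOTE: top-level `await` works in a notebook; in a plain script,
        # call the synchronous summarizer.query(...) instead.
        response = await summarizer.aquery(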
            f"Give me a summary of {title}"
        )
        article_summary = response.response
        Path("summaries").mkdir(exist_ok=True)
        with open(out_path, "w") as fp:
            fp.write(article_summary)
    else:
        with open(out_path, "r") as fp:
            article_summary = fp.read()
    print(f"**Summary for {title}: {article_summary}")
    node = IndexNode(text=article_summary, index_id=title)
    nodes.append(node)
**********
Trace: index_construction
**********
**Summary for How to Do Great Work: The essence of doing great work revolves around a few key principles. First, it's crucial to choose a field that aligns with your natural aptitudes and deep interests, as this will drive your motivation and creativity. Engaging in your own projects and maintaining a sense of excited curiosity are vital for discovering new ideas and making significant contributions.
Learning enough to reach the frontiers of knowledge in your chosen field allows you to identify gaps and explore them, often leading to innovative breakthroughs. Hard work is essential, but it should be fueled by genuine interest rather than mere diligence. Consistency and the willingness to embrace challenges, including the risk of failure, are important for growth and discovery.
Collaboration with high-quality colleagues can enhance your work, as they can provide insights and encouragement. Maintaining morale is also crucial; a positive mindset can help you navigate setbacks and keep you focused on your goals.
Ultimately, curiosity serves as the driving force behind great work, guiding you through the process of exploration and discovery. By nurturing your curiosity and being open to new experiences, you can uncover unique opportunities and make meaningful contributions in your field.
**********
Trace: index_construction
**********
**Summary for Having Kids: The piece reflects on the author's transformation in perspective regarding parenthood. Initially apprehensive about having children, viewing parents as uncool and burdensome, the author experiences a profound shift after becoming a parent. The arrival of their first child triggers protective instincts and a newfound appreciation for children, leading to genuine joy in parenting moments that were previously overlooked. 
The author acknowledges the challenges of parenthood, such as reduced productivity and ambition, as well as the necessity of adapting to a child's schedule. Despite these challenges, the author finds that the happiness and meaningful moments shared with children far outweigh the difficulties. The narrative emphasizes that while parenting can be demanding, it also brings unexpected joy and fulfillment, ultimately leading to a richer life experience.
**********
Trace: index_construction
**********
**Summary for How to Lose Time and Money: The piece discusses the author's reflections on wealth and time management after selling a startup. It emphasizes that losing wealth often stems from poor investments rather than excessive spending, as the latter triggers alarms in our minds. The author highlights the need to develop new awareness to avoid bad investments, which can be less obvious than overspending on luxuries. Similarly, when it comes to time, the most significant loss occurs not through leisure activities but through engaging in unproductive work that feels legitimate, like managing emails. The author argues that modern complexities require us to recognize and avoid these deceptive traps that mimic productive behavior but ultimately lead to wasted time.

As you can see, we now have three nodes, each representing a summary of one of the documents. We also have vector_retrievers, a dictionary that maps each document title to a retriever over that document's chunks.

print(nodes)
print('------')
print(vector_retrievers)
[IndexNode(id_='406d9927-c9e2-486f-9fc5-111efefc1649', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="The essence of doing great work revolves around a few key principles. First, it's crucial to choose a field that aligns with your natural aptitudes and deep interests, as this will drive your motivation and creativity. Engaging in your own projects and maintaining a sense of excited curiosity are vital for discovering new ideas and making significant contributions.\\n\\nLearning enough to reach the frontiers of knowledge in your chosen field allows you to identify gaps and explore them, often leading to innovative breakthroughs. Hard work is essential, but it should be fueled by genuine interest rather than mere diligence. Consistency and the willingness to embrace challenges, including the risk of failure, are important for growth and discovery.\\n\\nCollaboration with high-quality colleagues can enhance your work, as they can provide insights and encouragement. Maintaining morale is also crucial; a positive mindset can help you navigate setbacks and keep you focused on your goals.\\n\\nUltimately, curiosity serves as the driving force behind great work, guiding you through the process of exploration and discovery. By nurturing your curiosity and being open to new experiences, you can uncover unique opportunities and make meaningful contributions in your field.", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n', index_id='How to Do Great Work', obj=None),
 IndexNode(id_='8007fdd2-6617-4a76-95d7-79efef0700e7', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="The piece reflects on the author's transformation in perspective regarding parenthood. Initially apprehensive about having children, viewing parents as uncool and burdensome, the author experiences a profound shift after becoming a parent. The arrival of their first child triggers protective instincts and a newfound appreciation for children, leading to genuine joy in parenting moments that were previously overlooked. \\n\\nThe author acknowledges the challenges of parenthood, such as reduced productivity and ambition, as well as the necessity of adapting to a child's schedule. Despite these challenges, the author finds that the happiness and meaningful moments shared with children far outweigh the difficulties. The narrative emphasizes that while parenting can be demanding, it also brings unexpected joy and fulfillment, ultimately leading to a richer life experience.", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n', index_id='Having Kids', obj=None),
 IndexNode(id_='7e4dd169-eb28-4b2f-8a1a-ca1c5b85ac30', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="The piece discusses the author's reflections on wealth and time management after selling a startup. It emphasizes that losing wealth often stems from poor investments rather than excessive spending, as the latter triggers alarms in our minds. The author highlights the need to develop new awareness to avoid bad investments, which can be less obvious than overspending on luxuries. Similarly, when it comes to time, the most significant loss occurs not through leisure activities but through engaging in unproductive work that feels legitimate, like managing emails. The author argues that modern complexities require us to recognize and avoid these deceptive traps that mimic productive behavior but ultimately lead to wasted time.", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n', index_id='How to Lose Time and Money', obj=None)]
 ------
 {'How to Do Great Work': <llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x330afeeb0>,
 'Having Kids': <llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x33129c7c0>,
 'How to Lose Time and Money': <llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x32e8929a0>}
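The summary loop above only calls the LLM when no cached summary file exists. That read-or-generate pattern can be factored into a small helper, sketched here with the standard library only. The helper name and the fake_summarizer stand-in are hypothetical; in the tutorial's code, the generate step would be the call to the summarizer query engine.

```python
import tempfile
from pathlib import Path


def load_or_create_summary(path, generate):
    """Return the cached summary at path, or call generate() and cache its result."""
    path = Path(path)
    if path.exists():
        return path.read_text(encoding="utf-8")
    summary = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(summary, encoding="utf-8")
    return summary


calls = []


def fake_summarizer():
    """Stand-in for the LLM call; records how many times it ran."""
    calls.append(1)
    return "A short stand-in summary."


with tempfile.TemporaryDirectory() as tmp:
    summary_path = Path(tmp) / "summaries" / "Having Kids.txt"
    first = load_or_create_summary(summary_path, fake_summarizer)
    second = load_or_create_summary(summary_path, fake_summarizer)

print(first == second, len(calls))
```

The second call reads the file instead of invoking the summarizer again, which is what saves LLM calls when the notebook is re-run.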

Step 4: Build a top-level vector index

Once we have the summaries (nodes), we can build a top-level vector index and a retriever (top_vector_retriever). This index uses the summaries to start the retrieval process, which helps us find the most relevant summaries before we look at the detailed document chunks.

# Build top-level vector index from summary nodes
top_vector_index = VectorStoreIndex(
    nodes, transformations=[splitter], callback_manager=callback_manager
)

# Set up a retriever for the top-level summaries
top_vector_retriever = top_vector_index.as_retriever(similarity_top_k=1)
top_vector_retriever
<llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x32db715b0>

Step 5: Set up recursive retrieval

Now that we have the top-level retriever and the individual document retrievers, we can set up the recursive retriever. With this setup, the system first fetches the relevant summaries and then drills down into the specific document chunks based on their relevance.

from llama_index.core.retrievers import RecursiveRetriever

# Combine top-level retriever with individual document retrievers
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": top_vector_retriever, **vector_retrievers},
    verbose=True,
)
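The configuration above relies on ids lining up: "vector" names the root retriever, and each summary node's index_id (the document title) must match a key in retriever_dict. The toy below mimics that dispatch in plain Python to show the idea. It is a sketch under simplifying assumptions, not the library's implementation: retrievers are plain functions, and each result is a hypothetical ("link", id) or ("text", content) pair.

```python
def make_recursive_retrieve(root_id, retriever_dict):
    """Build a retrieve function that follows links between named retrievers."""

    def retrieve(query, retriever_id=None):
        retriever_id = root_id if retriever_id is None else retriever_id
        results = []
        for kind, value in retriever_dict[retriever_id](query):
            if kind == "link":
                # The result points at another retriever: recurse into it
                results.extend(retrieve(query, value))
            else:
                # The result is plain text: return it as-is
                results.append(value)
        return results

    return retrieve


def top_level(query):
    """Toy stand-in for the summary retriever: returns a link to one document."""
    if "kids" in query.lower():
        return [("link", "Having Kids")]
    return [("link", "How to Lose Time and Money")]


toy_retriever_dict = {
    "vector": top_level,
    "Having Kids": lambda query: [("text", "chunk from Having Kids")],
    "How to Lose Time and Money": lambda query: [
        ("text", "chunk from How to Lose Time and Money")
    ],
}

retrieve = make_recursive_retrieve("vector", toy_retriever_dict)
kids_result = retrieve("should I have kids?")
time_result = retrieve("How to buy more time?")
print(kids_result, time_result)
```

If a link pointed at an id missing from the dictionary, this toy would raise a KeyError, which is a reminder to keep the index_id values and the dictionary keys in sync.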

Step 6: Run recursive retrieval queries

Finally, we are ready to use our recursive retriever to run a few sample queries.

# Run recursive retriever on sample queries
result = recursive_retriever.retrieve("should I have kids?")
for res in result:
    print(res.node.get_content())
Retrieving with query id None: should I have kids?
Retrieved node with id, entering: Having Kids
Retrieving with query id Having Kids: should I have kids?
Retrieving text node: Do you have so little to spare?
And while having kids may be warping my present judgement, it hasn't overwritten my memory. I remember perfectly well what life was like before. Well enough to miss some things a lot, like the ability to take off for some other country at a moment's notice. That was so great. Why did I never do that?
See what I did there? The fact is, most of the freedom I had before kids, I never used. I paid for it in loneliness, but I never used it.
I had plenty of happy times before I had kids. But if I count up happy moments, not just potential happiness but actual happy moments, there are more after kids than before. Now I practically have it on tap, almost any bedtime.
People's experiences as parents vary a lot, and I know I've been lucky. But I think the worries I had before having kids must be pretty common, and judging by other parents' faces when they see their kids, so must the happiness that kids bring.
Retrieving text node: December 2019
Before I had kids, I was afraid of having kids. Up to that point I felt about kids the way the young Augustine felt about living virtuously. I'd have been sad to think I'd never have children. But did I want them now? No.
If I had kids, I'd become a parent, and parents, as I'd known since I was a kid, were uncool. They were dull and responsible and had no fun. And while it's not surprising that kids would believe that, to be honest I hadn't seen much as an adult to change my mind. Whenever I'd noticed parents with kids, the kids seemed to be terrors, and the parents pathetic harried creatures, even when they prevailed.
When people had babies, I congratulated them enthusiastically, because that seemed to be what one did. But I didn't feel it at all. "Better you than me," I was thinking.
Now when people have babies I congratulate them enthusiastically and I mean it. Especially the first one. I feel like they just got the best gift in the world.
Retrieving text node: Which meant I had to finish or I'd be taking away their trip to Africa. Maybe if I'm really lucky such tricks could put me net ahead. But the wind is there, no question.
On the other hand, what kind of wimpy ambition do you have if it won't survive having kids? Do you have so little to spare?
And while having kids may be warping my present judgement, it hasn't overwritten my memory. I remember perfectly well what life was like before. Well enough to miss some things a lot, like the ability to take off for some other country at a moment's notice. That was so great. Why did I never do that?
See what I did there? The fact is, most of the freedom I had before kids, I never used. I paid for it in loneliness, but I never used it.
I had plenty of happy times before I had kids. But if I count up happy moments, not just potential happiness but actual happy moments, there are more after kids than before. Now I practically have it on tap, almost any bedtime.
result = recursive_retriever.retrieve("How to buy more time?")
for res in result:
    print(res.node.get_content())
Retrieving with query id None: How to buy more time?
Retrieved node with id, entering: How to Lose Time and Money
Retrieving with query id How to Lose Time and Money: How to buy more time?
Retrieving text node: Which is why people trying to sell you expensive things say "it's an investment."
The solution is to develop new alarms. This can be a tricky business, because while the alarms that prevent you from overspending are so basic that they may even be in our DNA, the ones that prevent you from making bad investments have to be learned, and are sometimes fairly counterintuitive.
A few days ago I realized something surprising: the situation with time is much the same as with money. The most dangerous way to lose time is not to spend it having fun, but to spend it doing fake work. When you spend time having fun, you know you're being self-indulgent. Alarms start to go off fairly quickly. If I woke up one morning and sat down on the sofa and watched TV all day, I'd feel like something was terribly wrong. Just thinking about it makes me wince. I'd start to feel uncomfortable after sitting on a sofa watching TV for 2 hours, let alone a whole day.
Retrieving text node: The solution is to develop new alarms. This can be a tricky business, because while the alarms that prevent you from overspending are so basic that they may even be in our DNA, the ones that prevent you from making bad investments have to be learned, and are sometimes fairly counterintuitive.
A few days ago I realized something surprising: the situation with time is much the same as with money. The most dangerous way to lose time is not to spend it having fun, but to spend it doing fake work. When you spend time having fun, you know you're being self-indulgent. Alarms start to go off fairly quickly. If I woke up one morning and sat down on the sofa and watched TV all day, I'd feel like something was terribly wrong. Just thinking about it makes me wince. I'd start to feel uncomfortable after sitting on a sofa watching TV for 2 hours, let alone a whole day.
And yet I've definitely had days when I might as well have sat in front of a TV all day — days at the end of which, if I asked myself what I got done that day, the answer would have been: basically, nothing.
Retrieving text node: Investing bypasses those alarms. You're not spending the money; you're just moving it from one asset to another. Which is why people trying to sell you expensive things say "it's an investment."
The solution is to develop new alarms. This can be a tricky business, because while the alarms that prevent you from overspending are so basic that they may even be in our DNA, the ones that prevent you from making bad investments have to be learned, and are sometimes fairly counterintuitive.
A few days ago I realized something surprising: the situation with time is much the same as with money. The most dangerous way to lose time is not to spend it having fun, but to spend it doing fake work. When you spend time having fun, you know you're being self-indulgent. Alarms start to go off fairly quickly. If I woke up one morning and sat down on the sofa and watched TV all day, I'd feel like something was terribly wrong. Just thinking about it makes me wince.

Conclusion

By using summaries and document hierarchies, recursive retrieval makes the retrieved chunks more relevant, even when you are dealing with large datasets. For organizations that handle large volumes of data, recursive retrieval is a reliable method for building more accurate retrieval systems.


Author
Ryan Ong

Ryan is a lead data scientist specializing in building AI applications using LLMs. He is a PhD candidate in Natural Language Processing and Knowledge Graphs at Imperial College London, where he also completed his Master's degree in Computer Science. Outside of data science, he writes a weekly Substack newsletter, The Limitless Playbook, where he shares one actionable idea from the world's top thinkers and occasionally writes about core AI concepts.
