In this tutorial, I’ll walk you through building a medical documentation assistant with Google File Search. You'll see how to set it up, implement queries, and use advanced features like custom chunking and metadata filtering. By the end, you'll understand when managed RAG makes sense versus building your own stack.
What Is the Google File Search Tool?
Building RAG applications usually means dealing with vector databases, embedding pipelines, and a lot of infrastructure. Google's File Search tool, released in November 2025, eliminates this complexity with a fully managed RAG system built directly into the Gemini API.
The tool handles the complex parts for you: chunking documents, generating embeddings, and managing semantic search without requiring external tools like Pinecone or ChromaDB. The workflow is straightforward—upload files, create a store, and start querying. You also get built-in citations that let you verify where answers come from.
Understanding RAG and Why Google Simplifies It
Gemini File Search markets itself as a managed RAG system. Understanding RAG helps you use the tool well and decide when it fits your use case.
At its core, retrieval-augmented generation (RAG) connects language models to external knowledge. Before generating a response, the model retrieves relevant information from your documents, grounding answers in your actual data instead of relying solely on training data.
The DIY RAG challenge
While RAG sounds straightforward in concept, building a RAG pipeline yourself means managing several components:
- Vector databases: Set up and maintain services like Pinecone, ChromaDB, or Weaviate to store embeddings
- Embedding pipelines: Convert documents to numerical vectors and handle updates when content changes
- Chunking strategies: Split documents into pieces that balance context and retrieval precision
- Infrastructure: Monitor performance, tune parameters, and handle scaling as your data grows
Each component requires expertise and ongoing maintenance. Whether you're building a production system that needs reliability or a prototype that needs speed, the infrastructure overhead remains the same bottleneck.
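To make that overhead concrete, here is a minimal sketch of the kind of plumbing a DIY pipeline involves, using ChromaDB with naive fixed-size chunking. This is an illustration only; the file name, chunk sizes, and collection name are placeholders, not a production setup.

# DIY sketch: manual chunking plus a local vector store (ChromaDB).
# Illustrative only -- real pipelines also need parsing, updates, and monitoring.
import chromadb

def chunk(text, size=500, overlap=50):
    # Naive fixed-size character chunking with overlap
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

chroma_client = chromadb.Client()
collection = chroma_client.create_collection("drug_labels")

doc_text = open("metformin.txt").read()  # hypothetical plain-text export
chunks = chunk(doc_text)
collection.add(documents=chunks, ids=[f"metformin-{i}" for i in range(len(chunks))])

results = collection.query(query_texts=["metformin contraindications"], n_results=3)
print(results["documents"][0])

Every line of this is your responsibility to maintain, and it still lacks citations, re-indexing on updates, and scaling.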
Why managed RAG matters
Managed services like Google File Search eliminate this bottleneck. Instead of tuning retrieval systems, you write queries. Instead of debugging embedding pipelines, you validate results. The infrastructure runs in the background while you focus on application logic.
Gemini File Search handles the technical complexity while you control what matters: which documents to index, how to query them, and how to use the results. This balance works well when you need production quality without operational overhead. For a deeper background on RAG basics, I recommend checking out DataCamp's tutorial on agentic RAG.

The best way to understand Google File Search is by using it. In the next section, you'll build a complete medical documentation assistant that demonstrates the full workflow from document upload to grounded responses with citations.
Building a Medical Documentation Assistant With Google File Search
Disclaimer: This tutorial demonstrates File Search capabilities using FDA drug labels for educational purposes only. The assistant you'll build is not intended for clinical use, patient care decisions, or medical diagnosis. Always consult qualified healthcare professionals for medical advice. AI systems can generate incorrect information even with grounding in source documents.
This section walks you through building a complete medical documentation assistant using File Search. You'll work with FDA drug labels for three common medications, creating a system that answers questions about drug interactions, side effects, and contraindications. The assistant provides verifiable answers by citing specific passages from the source documents.
File Search operates in two phases: you index your documents once, then query them repeatedly. You'll set up the indexing infrastructure first, then focus entirely on asking questions and interpreting grounded responses.
Step 1: Install the API and configure authentication
You need Python 3.9 or later. Install the Google Gen AI SDK (google-genai) and python-dotenv:
pip install google-genai python-dotenv
Get your API key from Google AI Studio. Store it in a .env file in your project directory:
GOOGLE_API_KEY=your_api_key_here
Set up your imports and initialize the client:
from google import genai
from google.genai import types
import time
from dotenv import load_dotenv
load_dotenv()
client = genai.Client()
The genai.Client() handles authentication automatically using your environment variable. You'll use this client object for all File Search operations.
Step 2: Create a File Search store
Create a store to hold your indexed documents:
file_search_store = client.file_search_stores.create(
    config={"display_name": "fda-drug-labels"}
)
print(f"Created store: {file_search_store.name}")
A File Search store acts as a container for your indexed documents. Unlike temporary file uploads that expire after 48 hours, stores persist indefinitely. This means you index documents once and query them thousands of times without re-uploading or re-processing.
The file_search_store.name contains a unique identifier you'll reference when querying. It looks like fileSearchStores/fdadruglabels-abc123. Save this value if you need to query the store from a different session.
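If you come back in a new session, you can re-attach to the existing store instead of recreating it. Here is a short sketch, assuming the SDK's file_search_stores.get() and list() helpers and that you either saved the store name or can recognize it by display name:

# Re-attach to an existing store in a later session
store = client.file_search_stores.get(name="fileSearchStores/fdadruglabels-abc123")

# Or scan the project's stores by display name
for s in client.file_search_stores.list():
    if s.display_name == "fda-drug-labels":
        store = s

print(store.name)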
Step 3: Upload and index PDF documents
For this tutorial, you'll work with three FDA-approved drug labels. Download these PDFs from the FDA website:
- Metformin (Glucophage) - Diabetes medication
- Atorvastatin (Lipitor) - Cholesterol medication
- Lisinopril (Zestril) - Blood pressure medication
Save them in your project directory, then upload to your File Search store:
pdf_files = ["metformin.pdf", "atorvastatin.pdf", "lisinopril.pdf"]

for pdf_file in pdf_files:
    operation = client.file_search_stores.upload_to_file_search_store(
        file=pdf_file,
        file_search_store_name=file_search_store.name,
        config={"display_name": pdf_file.replace(".pdf", "")},
    )

    # Wait for indexing to complete
    while not operation.done:
        time.sleep(3)
        operation = client.operations.get(operation)

    print(f"{pdf_file} indexed")
During upload, File Search chunks each PDF and converts the segments to embeddings using the gemini-embedding-001 model. These embeddings are numerical representations that capture semantic meaning, letting the system find relevant passages even when your question doesn't match the exact wording in the document.
The polling pattern (while not operation.done) handles the asynchronous nature of indexing. Large documents take longer to process, so the API returns immediately, and you check the completion status periodically. For production systems, consider adding timeout logic to prevent infinite loops.
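One way to add that safeguard, reusing the same polling calls from the upload loop above (the 5-minute limit is arbitrary; adjust it to your document sizes):

# Polling with a timeout so a stuck indexing job can't hang the script
deadline = time.time() + 300  # 5 minutes

while not operation.done:
    if time.time() > deadline:
        raise TimeoutError(f"Indexing did not finish in time: {operation.name}")
    time.sleep(3)
    operation = client.operations.get(operation)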
Each chunk preserves metadata linking it back to its source document and position. This metadata becomes important when you access citations later.
Step 4: Query for single-document information
Now query your indexed documents:
query1 = "What are the contraindications for metformin?"

response1 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query1,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    ),
)

print(response1.text)
This prints the generated answer:
Metformin is contraindicated in several conditions:
* Severe renal impairment (eGFR below 30 mL/min/1.73 m2)
* Acute or chronic metabolic acidosis
* Hypersensitivity to metformin
File Search retrieves the most semantically similar chunks in your documents and provides them as context to gemini-2.5-flash, which generates the answer. The tools array configuration tells the model to use File Search during generation. You can combine File Search with other tools like code execution or Google Search in the same request.
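As an example of combining tools, the sketch below adds Google Search grounding alongside the document store so the model can fall back to the web for anything the labels don't cover. Treat this as a sketch and verify that your chosen model and account support combining these tools in a single call:

# Sketch: File Search plus Google Search grounding in one request
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Are there newer warnings about metformin beyond this label?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            ),
            types.Tool(google_search=types.GoogleSearch()),
        ]
    ),
)
print(response.text)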
Step 5: Access citations and grounding metadata
Extract which documents informed the answer:
print("Sources used:")
for i, chunk in enumerate(response1.candidates[0].grounding_metadata.grounding_chunks, 1):
source_name = chunk.retrieved_context.title
print(f" [{i}] {source_name}")
Output:
Sources used:
[1] metformin
[2] atorvastatin
Each chunk in the grounding metadata includes the source document title and the specific text passage that informed the answer. This creates a verification path from the generated response back to your original documents—necessary for medical, legal, or financial applications where accuracy matters.
The grounding_chunks array contains all retrieved passages, ordered by relevance. Even though the query asks specifically about metformin, File Search also retrieved content from the atorvastatin document, likely because it contains related contraindication information. This demonstrates the semantic retrieval approach: the system finds conceptually related content, not just keyword matches.
Step 6: Query across multiple documents
Test a multi-document drug interaction question:
query2 = "Can a patient take both atorvastatin and metformin together? Are there any drug interactions?"

response2 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query2,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    ),
)
print(response2.text)
The same API pattern now pulls from multiple documents and synthesizes information. Access the retrieved text snippets:
print("Sources used:")
for i, chunk in enumerate(response2.candidates[0].grounding_metadata.grounding_chunks, 1):
source_name = chunk.retrieved_context.title
source_text = chunk.retrieved_context.text[:100] + "..."
print(f" [{i}] {source_name}")
print(f" {source_text}")
Output shows excerpts from both drug labels:
Sources used:
[1] atorvastatin
Concomitant use with diabetes medications is generally safe but monitor glucose levels...
[2] metformin
Carbonic anhydrase inhibitors may increase the risk of lactic acidosis...
File Search retrieves relevant sections from both documents, and the model synthesizes them into a coherent answer. The retrieved_context.text attribute gives you the exact passage used, letting you verify the model didn't hallucinate information.
Step 7: Run cross-document comparisons
Ask an analytical question that requires comparing all three documents:
query3 = "Which medications have muscle-related side effects?"

response3 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query3,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    ),
)

print(response3.text)

# Check which documents were consulted
metadata = response3.candidates[0].grounding_metadata
for i, chunk in enumerate(metadata.grounding_chunks, 1):
    print(f"  [{i}] {chunk.retrieved_context.title}")
The output identifies atorvastatin as having muscle-related side effects (myalgia, myopathy, rhabdomyolysis) and confirms that the other medications don't list such effects. The grounding metadata shows that File Search consulted all three documents to answer the comparison question.
You've now built a working medical documentation assistant. The core workflow stays consistent: configure the File Search tool in your generate_content() call, get the response text, and access grounding metadata for verification. The store persists on Google's servers, so you can query it from future sessions without re-indexing.
Next, you'll explore advanced features like custom chunking configurations and metadata filtering that give you finer control over retrieval behavior.
Google File Search Tool Advanced Features and Customization
The basic File Search workflow covers most use cases, but production systems often need finer control over retrieval behavior. This section shows how to customize chunking strategies, filter documents with metadata, optimize performance, and manage multiple stores for different use cases.
Custom chunking configuration
File Search automatically splits documents into chunks during indexing. By default, it uses a chunking strategy optimized for general documents, but you can customize this behavior when specific document types need different handling.
Consider the medical assistant example. Drug labels contain dense technical information in tables and short paragraphs. Smaller chunks let you retrieve precise information like specific dosages or contraindications without pulling in irrelevant context. Larger chunks work better for narrative sections that require more context to understand properly.
Configure chunking parameters when uploading documents:
operation = client.file_search_stores.upload_to_file_search_store(
    file="metformin.pdf",
    file_search_store_name=file_search_store.name,
    config={
        "display_name": "metformin",
        "chunking_config": {
            "white_space_config": {
                "max_tokens_per_chunk": 200,
                "max_overlap_tokens": 20
            }
        }
    }
)
The chunking_config parameter controls how File Search splits your documents. The max_tokens_per_chunk parameter sets the maximum size of each chunk, while max_overlap_tokens determines how much content overlaps between consecutive chunks. This overlap ensures that information spanning chunk boundaries doesn't get lost during retrieval.
The trade-off matters: shorter chunks give more precise retrieval but may miss broader context, while larger chunks retain more meaning but may include irrelevant information.
For technical documentation with clear section boundaries, use smaller chunks (150-250 tokens). For narrative documents like research papers or reports, larger chunks (400-600 tokens) preserve argument flow and context. The official File Search documentation provides additional guidance on choosing chunk sizes for different document types.
Metadata filtering
When your store contains dozens or hundreds of documents, metadata filtering narrows the retrieval scope before the semantic search runs. This improves precision and reduces processing time.
Add metadata during document upload to enable filtering later:
operation = client.file_search_stores.upload_to_file_search_store(
    file="metformin.pdf",
    file_search_store_name=file_search_store.name,
    config={
        "display_name": "metformin",
        "custom_metadata": [
            {"key": "category", "string_value": "diabetes"},
            {"key": "year", "numeric_value": 2017},
            {"key": "drug_class", "string_value": "biguanide"}
        ]
    }
)
The custom_metadata parameter accepts an array of key-value pairs. Use string_value for text metadata like categories or drug classes, and numeric_value for years, versions, or other numeric data.
Query with metadata filters to search only relevant documents:
query = "What are the common side effects?"

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name],
                    metadata_filter="category=diabetes"
                )
            )
        ]
    )
)
The metadata_filter parameter restricts retrieval to documents matching the specified criteria. In this example, File Search only considers documents with category=diabetes, ignoring the blood pressure and cholesterol medications, even though they exist in the same store.
This becomes critical when stores contain heterogeneous documents. A medical knowledge base might include drug labels, research papers, and clinical guidelines. Filtering by document type ensures you get dosage information from labels, not research abstracts.
You can combine metadata filtering with the full semantic search capability. The filter runs first to select candidate documents, then semantic search finds the most relevant passages within those documents.
Performance optimization
File Search performance depends on store size, query complexity, and model choice. Following these guidelines keeps retrieval fast and costs manageable.
Store size limits: Keep individual stores under 20GB for optimal retrieval latency. File Search stores embeddings alongside your documents, and embeddings require approximately three times the size of your original files. A 7GB collection of PDFs generates roughly 21GB of stored data once indexed, exceeding the recommended limit.
When you approach this limit, create separate stores organized by category, time period, or access pattern. For the medical assistant, you might create separate stores for different drug categories rather than indexing every available medication in a single store.
Cost structure: File Search charges $0.15 per 1 million tokens for indexing. Once indexed, you can run thousands of queries without additional indexing costs. This pricing model favors read-heavy workloads where you query the same documents repeatedly.
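As a rough back-of-envelope, assuming the three drug labels total around 400,000 tokens once parsed (an illustrative figure, not a measurement):

# Rough one-time indexing cost estimate -- token count is assumed for illustration
tokens_indexed = 400_000
cost_per_million = 0.15  # USD per 1M tokens indexed
print(f"One-time indexing cost: ${tokens_indexed / 1_000_000 * cost_per_million:.2f}")
# -> One-time indexing cost: $0.06

After that, querying the store incurs only the normal model token charges.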
Model selection: Use gemini-2.5-flash for most queries. It processes requests in 1-2 seconds and costs significantly less than gemini-2.5-pro. Reserve gemini-2.5-pro for queries requiring deep reasoning across multiple sources or handling extremely complex synthesis tasks. The cost difference between models matters more than indexing costs for high-volume applications.
Monitor store size as you add documents. You can check this through the API, though the size calculation happens on Google's backend and may not reflect immediately after uploads. For complete technical specifications and limits, refer to the Gemini File API documentation.
Managing multiple stores
Each Google Cloud project supports up to 10 File Search stores. Multiple stores let you separate documents by access control, performance requirements, or logical organization.
Create specialized stores for different use cases:
# Create separate stores for different document categories
diabetes_store = client.file_search_stores.create(
    config={"display_name": "diabetes-medications"}
)

cardio_store = client.file_search_stores.create(
    config={"display_name": "cardiovascular-medications"}
)
Query multiple stores in a single request:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What medications treat both diabetes and heart disease?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[
                        diabetes_store.name,
                        cardio_store.name
                    ]
                )
            )
        ]
    )
)
File Search retrieves from all specified stores and synthesizes results. The grounding metadata identifies which store each citation came from, maintaining full traceability.
Stores persist indefinitely and require manual deletion when no longer needed, making them suitable for production applications where documents remain queryable across sessions and deployments.
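When a store has outlived its purpose, delete it explicitly. A minimal sketch, assuming the SDK's delete helper and its force option (which removes a store even if it still contains documents):

# Remove a store you no longer need
client.file_search_stores.delete(
    name=diabetes_store.name,
    config={"force": True},
)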
Next, you'll see how File Search compares to other RAG solutions and when to choose managed versus DIY approaches.
Google File Search Tool vs Other File Search and RAG Tools
File Search isn't your only option for building RAG applications. Understanding how it stacks up against alternatives helps you pick the right tool. Let's compare Google's approach with OpenAI's offering and traditional custom builds.
| Feature | Google File Search | OpenAI File Search | Custom RAG (LangChain) |
| --- | --- | --- | --- |
| Pricing Model | $0.15/M tokens (index only) | $0.10/GB daily storage | Infrastructure + development costs |
| Chunking Control | Automated with basic config | Configurable (800 tokens default, 400 overlap) | Full control over strategy |
| Search Type | Semantic (vector only) | Hybrid (vector + keyword) | Any method you implement |
| File Formats | 150+ types (PDF, DOCX, code, etc.) | 6 types (TXT, MD, HTML, DOCX, PPTX, PDF) | Depends on parsers used |
| Setup Time | Minutes | Minutes | Days to weeks |
| Citations | Built-in with grounding metadata | Built-in | Must implement yourself |
| Best For | High query volume, quick deployment | Keyword-heavy queries, moderate control | Complex requirements, full customization |
Google File Search vs OpenAI File Search
Both companies offer hosted RAG, but they take different paths on pricing and capabilities.
Pricing: Google charges you once, at indexing time ($0.15 per million tokens), with storage and query-time embeddings free. OpenAI instead bills for storage at $0.10 per GB per day, plus a per-call fee of roughly $2.50 per thousand file search tool calls. If you're running lots of queries but rarely updating documents, Google's model saves money. If you're constantly re-indexing or storing large corpora long term, the math gets more interesting.
Configuration control: Google keeps things simple with automated chunking and limited configuration. OpenAI gives you more control. You can set chunk size (800 tokens default) and overlap (400 tokens). OpenAI also runs hybrid search combining vector and keyword matching, while Google relies purely on semantic search. This matters when your queries contain specific technical terms or product codes.
File formats: Google handles 150+ file types, including code files and various document formats. OpenAI supports six: TXT, MD, HTML, DOCX, PPTX, and PDF. Neither handles structured data like CSV or JSONL well. That's where custom builds shine.
Integration: Google ties into Gemini models and Google Cloud services. OpenAI connects to their model family and Azure. Both give you citations and source tracking.
The real split comes down to simplicity versus control. Google wraps everything into one API call. OpenAI lets you tune retrieval at the cost of more complexity. There's no single winner here. It depends on whether you want speed or customization for your project.
Custom RAG capabilities
Building your own RAG system with tools like LangChain unlocks capabilities that hosted services don't offer. DataCamp's RAG with LangChain course walks through this approach in detail.
Custom builds enable advanced techniques:
- Semantic splitting that detects when topics shift instead of cutting at fixed lengths
- Token-aware chunking that respects model context windows precisely
- Hybrid retrieval mixing BM25 keyword search with dense vectors
- Query transformations like HyDE that generate hypothetical answers to improve search
- Graph RAG representing documents as networks of entities and relationships
DataCamp's tutorial on improving RAG performance explores these techniques with examples showing measurable quality improvements. The trade-off is operational complexity: you're monitoring database performance, tuning embedding models, and handling updates across multiple services.
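To give a flavor of one of these techniques, here is a minimal reciprocal rank fusion (RRF) sketch: the merge step a hybrid retriever uses to combine a keyword ranking with a vector ranking. It's plain Python with made-up document IDs, not tied to any particular framework.

# Reciprocal rank fusion: merge two ranked lists of document IDs
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_7", "doc_2", "doc_9"]    # keyword search results (hypothetical)
vector_hits = ["doc_2", "doc_4", "doc_7"]  # semantic search results (hypothetical)
print(rrf([bm25_hits, vector_hits]))       # doc_2 and doc_7 rise to the top

Hosted tools like File Search don't expose this merge step; when your queries depend on exact terms, that's a signal a custom build may pay off.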
When to use each approach
Choose hosted tools like File Search when:
- Building prototypes or proofs-of-concept where speed matters
- Your use case fits standard patterns (Q&A over docs, knowledge bases, documentation search)
- Your team lacks deep RAG expertise
- You want predictable costs and minimal ops overhead
Build custom when:
- You need advanced chunking or specialized retrieval methods
- Working with structured data or unusual file formats
- Building agentic RAG systems combining multiple strategies
- Optimizing costs at massive scale justifies engineering investment
- Compliance requires specific infrastructure or models
Most projects start with hosted solutions and switch to custom builds only when requirements demand it. The techniques from custom RAG (smart chunking, hybrid search, query optimization) still inform how you use hosted tools. Understanding the full landscape helps you make better choices as your needs evolve.
Conclusion
You've built a complete RAG system using the Google File Search Tool, from indexing FDA drug labels to querying with citations. The medical assistant demonstrates how managed services handle the infrastructure while you focus on application logic.
File Search works well when you need reliable RAG without managing vector databases or embedding pipelines. The free storage and query embeddings make costs predictable. Persistent stores eliminate re-indexing overhead, letting you scale queries without scaling infrastructure maintenance.
Before deploying to production, add critical safeguards that the tutorial omitted for brevity. Implement error handling with timeouts for upload operations and try-catch blocks around API calls. Consider data privacy implications when uploading documents to Google's servers, especially with sensitive content. Add validation to verify grounding metadata exists before accessing citations. Test thoroughly with domain experts to catch cases where the model generates plausible but incorrect answers despite grounding.
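A minimal sketch of that defensive layer, wrapping the same calls used throughout the tutorial (the broad exception handling and return format are placeholders you'd tighten in real code):

# Defensive query: handle API errors and missing grounding metadata
def ask(question, store_name):
    try:
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=question,
            config=types.GenerateContentConfig(
                tools=[types.Tool(file_search=types.FileSearch(
                    file_search_store_names=[store_name]
                ))]
            ),
        )
    except Exception as exc:  # narrow to the SDK's error types in production
        return f"Request failed: {exc}", []

    metadata = response.candidates[0].grounding_metadata
    if not metadata or not metadata.grounding_chunks:
        # No citations returned -- treat the answer as unverified
        return response.text, []

    sources = [c.retrieved_context.title for c in metadata.grounding_chunks]
    return response.text, sources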
For next steps, try extending your assistant with metadata filtering to organize documents by category. Experiment with chunk sizes to match your document types. The techniques you learned here apply whether you're building support bots, documentation search, or knowledge assistants.
FAQs
Do I need to re-index when a document changes?
Yes. Updating or replacing a file requires re-uploading it so the embeddings reflect the new content.
Can I control access to specific documents?
Not yet at a fine-grained level. You can use metadata filters to limit which documents are queried, but user-level permissions aren’t currently supported.
What are the file and store size limits?
Individual files can be up to about 100 MB, and store tiers range up to roughly 1 TB. Keeping stores under 20 GB generally ensures faster retrieval.
How reliable are the citations?
File Search attaches grounding metadata showing which document chunks informed an answer. These citations improve transparency but should still be reviewed for accuracy.
Does File Search use keyword or vector retrieval?
It relies on semantic (vector-based) search. If you need exact keyword matching, you’ll need a custom or hybrid retrieval setup.

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.