In this tutorial, I’ll walk you through building a medical documentation assistant with Google File Search. You'll see how to set it up, implement queries, and use advanced features like custom chunking and metadata filtering. By the end, you'll understand when managed RAG makes sense versus building your own stack.
What Is the Google File Search Tool?
Building RAG applications usually means dealing with vector databases, embedding pipelines, and a lot of infrastructure. Google's File Search tool, released in November 2025, eliminates this complexity with a fully managed RAG system built directly into the Gemini API.
The tool handles the complex parts for you: chunking documents, generating embeddings, and managing semantic search without requiring external tools like Pinecone or ChromaDB. The workflow is straightforward—upload files, create a store, and start querying. You also get built-in citations that let you verify where answers come from.
Understanding RAG and Why Google Simplifies It
Gemini File Search markets itself as a managed RAG system. Understanding RAG helps you use the tool well and decide when it fits your use case.
At its core, retrieval-augmented generation (RAG) connects language models to external knowledge. Before generating a response, the model retrieves relevant information from your documents, grounding answers in your actual data instead of relying solely on training data.
The DIY RAG challenge
While RAG sounds straightforward in concept, building a RAG pipeline yourself means managing several components:
- Vector databases: Set up and maintain services like Pinecone, ChromaDB, or Weaviate to store embeddings
- Embedding pipelines: Convert documents to numerical vectors and handle updates when content changes
- Chunking strategies: Split documents into pieces that balance context and retrieval precision
- Infrastructure: Monitor performance, tune parameters, and handle scaling as your data grows
Each component requires expertise and ongoing maintenance. Whether you're building a production system that needs reliability or a prototype that needs speed, the infrastructure overhead remains the same bottleneck.
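To make that overhead concrete, here is a minimal sketch of the kind of plumbing a DIY pipeline involves, using ChromaDB with naive fixed-size chunking. This is an illustration only; the file name, chunk sizes, and collection name are placeholders, not a production setup.

# DIY sketch: manual chunking plus a local vector store (ChromaDB).
# Illustrative only -- real pipelines also need parsing, updates, and monitoring.
import chromadb

def chunk(text, size=500, overlap=50):
    # Naive fixed-size character chunking with overlap
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

chroma_client = chromadb.Client()
collection = chroma_client.create_collection("drug_labels")

doc_text = open("metformin.txt").read()  # hypothetical plain-text export
chunks = chunk(doc_text)
collection.add(documents=chunks, ids=[f"metformin-{i}" for i in range(len(chunks))])

results = collection.query(query_texts=["metformin contraindications"], n_results=3)
print(results["documents"][0])

Every line of this is your responsibility to maintain, and it still lacks citations, re-indexing on updates, and scaling.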
Why managed RAG matters
Managed services like Google File Search eliminate this bottleneck. Instead of tuning retrieval systems, you write queries. Instead of debugging embedding pipelines, you validate results. The infrastructure runs in the background while you focus on application logic.
Gemini File Search handles the technical complexity while you control what matters: which documents to index, how to query them, and how to use the results. This balance works well when you need production quality without operational overhead. For a deeper background on RAG basics, I recommend checking out DataCamp's tutorial on agentic RAG.

The best way to understand Google File Search is by using it. In the next section, you'll build a complete medical documentation assistant that demonstrates the full workflow from document upload to grounded responses with citations.
Building a Medical Documentation Assistant With Google File Search
Disclaimer: This tutorial demonstrates File Search capabilities using FDA drug labels for educational purposes only. The assistant you'll build is not intended for clinical use, patient care decisions, or medical diagnosis. Always consult qualified healthcare professionals for medical advice. AI systems can generate incorrect information even with grounding in source documents.
This section walks you through building a complete medical documentation assistant using File Search. You'll work with FDA drug labels for three common medications, creating a system that answers questions about drug interactions, side effects, and contraindications. The assistant provides verifiable answers by citing specific passages from the source documents.
File Search operates in two phases: you index your documents once, then query them repeatedly. You'll set up the indexing infrastructure first, then focus entirely on asking questions and interpreting grounded responses.
Step 1: Install the API and configure authentication
You need Python 3.9 or later. Install the Google Gen AI SDK (google-genai) and python-dotenv:
pip install google-genai python-dotenv
Get your API key from Google AI Studio. Store it in a .env file in your project directory:
GOOGLE_API_KEY=your_api_key_here
Set up your imports and initialize the client:
from google import genai
from google.genai import types
import time
from dotenv import load_dotenv
load_dotenv()
client = genai.Client()
The genai.Client() handles authentication automatically using your environment variable. You'll use this client object for all File Search operations.
Step 2: Create a File Search store
Create a store to hold your indexed documents:
file_search_store = client.file_search_stores.create(
    config={"display_name": "fda-drug-labels"}
)
print(f"Created store: {file_search_store.name}")
A File Search store acts as a container for your indexed documents. Unlike temporary file uploads that expire after 48 hours, stores persist indefinitely. This means you index documents once and query them thousands of times without re-uploading or re-processing.
The file_search_store.name contains a unique identifier you'll reference when querying. It looks like fileSearchStores/fdadruglabels-abc123. Save this value if you need to query the store from a different session.
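If you come back in a new session, you can re-attach to the existing store instead of recreating it. Here is a short sketch, assuming the SDK's file_search_stores.get() and list() helpers and that you either saved the store name or can recognize it by display name:

# Re-attach to an existing store in a later session
store = client.file_search_stores.get(name="fileSearchStores/fdadruglabels-abc123")

# Or scan the project's stores by display name
for s in client.file_search_stores.list():
    if s.display_name == "fda-drug-labels":
        store = s

print(store.name)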
Step 3: Upload and index PDF documents
For this tutorial, you'll work with three FDA-approved drug labels. Download these PDFs from the FDA website:
- Metformin (Glucophage) - Diabetes medication
- Atorvastatin (Lipitor) - Cholesterol medication
- Lisinopril (Zestril) - Blood pressure medication
Save them in your project directory, then upload to your File Search store:
pdf_files = ["metformin.pdf", "atorvastatin.pdf", "lisinopril.pdf"]

for pdf_file in pdf_files:
    operation = client.file_search_stores.upload_to_file_search_store(
        file=pdf_file,
        file_search_store_name=file_search_store.name,
        config={"display_name": pdf_file.replace(".pdf", "")},
    )

    # Wait for indexing to complete
    while not operation.done:
        time.sleep(3)
        operation = client.operations.get(operation)

    print(f"{pdf_file} indexed")
During upload, File Search chunks each PDF and converts the segments to embeddings using the gemini-embedding-001 model. These embeddings are numerical representations that capture semantic meaning, letting the system find relevant passages even when your question doesn't match the exact wording in the document.
The polling pattern (while not operation.done) handles the asynchronous nature of indexing. Large documents take longer to process, so the API returns immediately, and you check the completion status periodically. For production systems, consider adding timeout logic to prevent infinite loops.
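One way to add that safeguard, reusing the same polling calls from the upload loop above (the 5-minute limit is arbitrary; adjust it to your document sizes):

# Polling with a timeout so a stuck indexing job can't hang the script
deadline = time.time() + 300  # 5 minutes

while not operation.done:
    if time.time() > deadline:
        raise TimeoutError(f"Indexing did not finish in time: {operation.name}")
    time.sleep(3)
    operation = client.operations.get(operation)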
Each chunk preserves metadata linking it back to its source document and position. This metadata becomes important when you access citations later.
Step 4: Query for single-document information
Now query your indexed documents:
query1 = "What are the contraindications for metformin?"

response1 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query1,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    ),
)

print(response1.text)
This prints the generated answer:
Metformin is contraindicated in several conditions:
* Severe renal impairment (eGFR below 30 mL/min/1.73 m2)
* Acute or chronic metabolic acidosis
* Hypersensitivity to metformin
File Search retrieves the most semantically similar chunks in your documents and provides them as context to gemini-2.5-flash, which generates the answer. The tools array configuration tells the model to use File Search during generation. You can combine File Search with other tools like code execution or Google Search in the same request.
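As an example of combining tools, the sketch below adds Google Search grounding alongside the document store so the model can fall back to the web for anything the labels don't cover. Treat this as a sketch and verify that your chosen model and account support combining these tools in a single call:

# Sketch: File Search plus Google Search grounding in one request
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Are there newer warnings about metformin beyond this label?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            ),
            types.Tool(google_search=types.GoogleSearch()),
        ]
    ),
)
print(response.text)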
Step 5: Access citations and grounding metadata
Extract which documents informed the answer:
print("Sources used:")
for i, chunk in enumerate(response1.candidates[0].grounding_metadata.grounding_chunks, 1):
source_name = chunk.retrieved_context.title
print(f" [{i}] {source_name}")
Output:
Sources used:
[1] metformin
[2] atorvastatin
Each chunk in the grounding metadata includes the source document title and the specific text passage that informed the answer. This creates a verification path from the generated response back to your original documents—necessary for medical, legal, or financial applications where accuracy matters.
The grounding_chunks array contains all retrieved passages, ordered by relevance. Even though the query asks specifically about metformin, File Search also retrieved content from the atorvastatin document, likely because it contains related contraindication information. This demonstrates the semantic retrieval approach: the system finds conceptually related content, not just keyword matches.
Step 6: Query across multiple documents
Test a multi-document drug interaction question:
query2 = "Can a patient take both atorvastatin and metformin together? Are there any drug interactions?"

response2 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query2,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    ),
)
print(response2.text)
The same API pattern now pulls from multiple documents and synthesizes information. Access the retrieved text snippets:
print("Sources used:")
for i, chunk in enumerate(response2.candidates[0].grounding_metadata.grounding_chunks, 1):
source_name = chunk.retrieved_context.title
source_text = chunk.retrieved_context.text[:100] + "..."
print(f" [{i}] {source_name}")
print(f" {source_text}")
Output shows excerpts from both drug labels:
Sources used:
[1] atorvastatin
Concomitant use with diabetes medications is generally safe but monitor glucose levels...
[2] metformin
Carbonic anhydrase inhibitors may increase the risk of lactic acidosis...
File Search retrieves relevant sections from both documents, and the model synthesizes them into a coherent answer. The retrieved_context.text attribute gives you the exact passage used, letting you verify the model didn't hallucinate information.
Step 7: Run cross-document comparisons
Ask an analytical question that requires comparing all three documents:
query3 = "Which medications have muscle-related side effects?"

response3 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query3,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    ),
)

print(response3.text)

# Check which documents were consulted
metadata = response3.candidates[0].grounding_metadata
for i, chunk in enumerate(metadata.grounding_chunks, 1):
    print(f"  [{i}] {chunk.retrieved_context.title}")
The output identifies atorvastatin as having muscle-related side effects (myalgia, myopathy, rhabdomyolysis) and confirms that the other medications don't list such effects. The grounding metadata shows that File Search consulted all three documents to answer the comparison question.
You've now built a working medical documentation assistant. The core workflow stays consistent: configure the File Search tool in your generate_content() call, get the response text, and access grounding metadata for verification. The store persists on Google's servers, so you can query it from future sessions without re-indexing.
Next, you'll explore advanced features like custom chunking configurations and metadata filtering that give you finer control over retrieval behavior.
Google File Search Tool Advanced Features and Customization
The basic File Search workflow covers most use cases, but production systems often need finer control over retrieval behavior. This section shows how to customize chunking strategies, filter documents with metadata, optimize performance, and manage multiple stores for different use cases.
Custom chunking configuration
File Search automatically splits documents into chunks during indexing. By default, it uses a chunking strategy optimized for general documents, but you can customize this behavior when specific document types need different handling.
Consider the medical assistant example. Drug labels contain dense technical information in tables and short paragraphs. Smaller chunks let you retrieve precise information like specific dosages or contraindications without pulling in irrelevant context. Larger chunks work better for narrative sections that require more context to understand properly.
Configure chunking parameters when uploading documents:
operation = client.file_search_stores.upload_to_file_search_store(
    file="metformin.pdf",
    file_search_store_name=file_search_store.name,
    config={
        "display_name": "metformin",
        "chunking_config": {
            "white_space_config": {
                "max_tokens_per_chunk": 200,
                "max_overlap_tokens": 20
            }
        }
    }
)
The chunking_config parameter controls how File Search splits your documents. The max_tokens_per_chunk parameter sets the maximum size of each chunk, while max_overlap_tokens determines how much content overlaps between consecutive chunks. This overlap ensures that information spanning chunk boundaries doesn't get lost during retrieval.
The trade-off matters: shorter chunks give more precise retrieval but may miss broader context, while larger chunks retain more meaning but may include irrelevant information.
For technical documentation with clear section boundaries, use smaller chunks (150-250 tokens). For narrative documents like research papers or reports, larger chunks (400-600 tokens) preserve argument flow and context. The official File Search documentation provides additional guidance on choosing chunk sizes for different document types.
Metadata filtering
When your store contains dozens or hundreds of documents, metadata filtering narrows the retrieval scope before the semantic search runs. This improves precision and reduces processing time.
Add metadata during document upload to enable filtering later:
operation = client.file_search_stores.upload_to_file_search_store(
    file="metformin.pdf",
    file_search_store_name=file_search_store.name,
    config={
        "display_name": "metformin",
        "custom_metadata": [
            {"key": "category", "string_value": "diabetes"},
            {"key": "year", "numeric_value": 2017},
            {"key": "drug_class", "string_value": "biguanide"}
        ]
    }
)
The custom_metadata parameter accepts an array of key-value pairs. Use string_value for text metadata like categories or drug classes, and numeric_value for years, versions, or other numeric data.
Query with metadata filters to search only relevant documents:
query = "What are the common side effects?"

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=query,
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name],
                    metadata_filter="category=diabetes"
                )
            )
        ]
    )
)
The metadata_filter parameter restricts retrieval to documents matching the specified criteria. In this example, File Search only considers documents with category=diabetes, ignoring the blood pressure and cholesterol medications, even though they exist in the same store.
This becomes critical when stores contain heterogeneous documents. A medical knowledge base might include drug labels, research papers, and clinical guidelines. Filtering by document type ensures you get dosage information from labels, not research abstracts.
You can combine metadata filtering with the full semantic search capability. The filter runs first to select candidate documents, then semantic search finds the most relevant passages within those documents.
Performance optimization
File Search performance depends on store size, query complexity, and model choice. Following these guidelines keeps retrieval fast and costs manageable.
Store size limits: Keep individual stores under 20GB for optimal retrieval latency. File Search stores embeddings alongside your documents, and embeddings require approximately three times the size of your original files. A 7GB collection of PDFs generates roughly 21GB of stored data once indexed, exceeding the recommended limit.
When you approach this limit, create separate stores organized by category, time period, or access pattern. For the medical assistant, you might create separate stores for different drug categories rather than indexing every available medication in a single store.
Cost structure: File Search charges $0.15 per 1 million tokens for indexing. Once indexed, you can run thousands of queries without additional indexing costs. This pricing model favors read-heavy workloads where you query the same documents repeatedly.
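As a rough back-of-envelope, assuming the three drug labels total around 400,000 tokens once parsed (an illustrative figure, not a measurement):

# Rough one-time indexing cost estimate -- token count is assumed for illustration
tokens_indexed = 400_000
cost_per_million = 0.15  # USD per 1M tokens indexed
print(f"One-time indexing cost: ${tokens_indexed / 1_000_000 * cost_per_million:.2f}")
# -> One-time indexing cost: $0.06

After that, querying the store incurs only the normal model token charges.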
Model selection: Use gemini-2.5-flash for most queries. It processes requests in 1-2 seconds and costs significantly less than gemini-2.5-pro. Reserve gemini-2.5-pro for queries requiring deep reasoning across multiple sources or handling extremely complex synthesis tasks. The cost difference between models matters more than indexing costs for high-volume applications.
Monitor store size as you add documents. You can check this through the API, though the size calculation happens on Google's backend and may not reflect immediately after uploads. For complete technical specifications and limits, refer to the Gemini File API documentation.
Managing multiple stores
Each Google Cloud project supports up to 10 File Search stores. Multiple stores let you separate documents by access control, performance requirements, or logical organization.
Create specialized stores for different use cases:
# Create separate stores for different document categories
diabetes_store = client.file_search_stores.create(
    config={"display_name": "diabetes-medications"}
)

cardio_store = client.file_search_stores.create(
    config={"display_name": "cardiovascular-medications"}
)
Query multiple stores in a single request:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What medications treat both diabetes and heart disease?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[
                        diabetes_store.name,
                        cardio_store.name
                    ]
                )
            )
        ]
    )
)
File Search retrieves from all specified stores and synthesizes results. The grounding metadata identifies which store each citation came from, maintaining full traceability.
Stores persist indefinitely and require manual deletion when no longer needed, making them suitable for production applications where documents remain queryable across sessions and deployments.
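When a store has outlived its purpose, delete it explicitly. A minimal sketch, assuming the SDK's delete helper and its force option (which removes a store even if it still contains documents):

# Remove a store you no longer need
client.file_search_stores.delete(
    name=diabetes_store.name,
    config={"force": True},
)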
Next, you'll see how File Search compares to other RAG solutions and when to choose managed versus DIY approaches.
Google File Search Tool vs Other File Search and RAG Tools
File Search isn't your only option for building RAG applications. Understanding how it stacks up against alternatives helps you pick the right tool. Let's compare Google's approach with OpenAI's offering and traditional custom builds.
| Feature | Google File Search | OpenAI File Search | Custom RAG (LangChain) |
| --- | --- | --- | --- |
| Pricing Model | $0.15/M tokens (index only) | $0.10/GB daily storage | Infrastructure + development costs |
| Chunking Control | Automated with basic config | Configurable (800 tokens default, 400 overlap) | Full control over strategy |
| Search Type | Semantic (vector only) | Hybrid (vector + keyword) | Any method you implement |
| File Formats | 150+ types (PDF, DOCX, code, etc.) | 6 types (TXT, MD, HTML, DOCX, PPTX, PDF) | Depends on parsers used |
| Setup Time | Minutes | Minutes | Days to weeks |
| Citations | Built-in with grounding metadata | Built-in | Must implement yourself |
| Best For | High query volume, quick deployment | Keyword-heavy queries, moderate control | Complex requirements, full customization |
Google File Search vs OpenAI File Search
Both companies offer hosted RAG, but they take different paths on pricing and capabilities.
Pricing: Google charges you once, at indexing time ($0.15 per million tokens), with storage and query-time embeddings free. OpenAI instead bills for storage at $0.10 per GB per day, plus a per-call fee of roughly $2.50 per thousand file search tool calls. If you're running lots of queries but rarely updating documents, Google's model saves money. If you're constantly re-indexing or storing large corpora long term, the math gets more interesting.
Configuration control: Google keeps things simple with automated chunking and limited configuration. OpenAI gives you more control. You can set chunk size (800 tokens default) and overlap (400 tokens). OpenAI also runs hybrid search combining vector and keyword matching, while Google relies purely on semantic search. This matters when your queries contain specific technical terms or product codes.
File formats: Google handles 150+ file types, including code files and various document formats. OpenAI supports six: TXT, MD, HTML, DOCX, PPTX, and PDF. Neither handles structured data like CSV or JSONL well. That's where custom builds shine.
Integration: Google ties into Gemini models and Google Cloud services. OpenAI connects to their model family and Azure. Both give you citations and source tracking.
The real split comes down to simplicity versus control. Google wraps everything into one API call. OpenAI lets you tune retrieval at the cost of more complexity. There's no single winner here. It depends on whether you want speed or customization for your project.
Custom RAG capabilities
Building your own RAG system with tools like LangChain unlocks capabilities that hosted services don't offer. DataCamp's RAG with LangChain course walks through this approach in detail.
Custom builds enable advanced techniques:
- Semantic splitting that detects when topics shift instead of cutting at fixed lengths
- Token-aware chunking that respects model context windows precisely
- Hybrid retrieval mixing BM25 keyword search with dense vectors
- Query transformations like HyDE that generate hypothetical answers to improve search
- Graph RAG representing documents as networks of entities and relationships
DataCamp's tutorial on improving RAG performance explores these techniques with examples showing measurable quality improvements. The trade-off is operational complexity: you're monitoring database performance, tuning embedding models, and handling updates across multiple services.
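To give a flavor of one of these techniques, here is a minimal reciprocal rank fusion (RRF) sketch: the merge step a hybrid retriever uses to combine a keyword ranking with a vector ranking. It's plain Python with made-up document IDs, not tied to any particular framework.

# Reciprocal rank fusion: merge two ranked lists of document IDs
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_7", "doc_2", "doc_9"]    # keyword search results (hypothetical)
vector_hits = ["doc_2", "doc_4", "doc_7"]  # semantic search results (hypothetical)
print(rrf([bm25_hits, vector_hits]))       # doc_2 and doc_7 rise to the top

Hosted tools like File Search don't expose this merge step; when your queries depend on exact terms, that's a signal a custom build may pay off.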
When to use each approach
Choose hosted tools like File Search when:
- Building prototypes or proofs-of-concept where speed matters
- Your use case fits standard patterns (Q&A over docs, knowledge bases, documentation search)
- Your team lacks deep RAG expertise
- You want predictable costs and minimal ops overhead
Build custom when:
- You need advanced chunking or specialized retrieval methods
- Working with structured data or unusual file formats
- Building agentic RAG systems combining multiple strategies
- Optimizing costs at massive scale justifies engineering investment
- Compliance requires specific infrastructure or models
Most projects start with hosted solutions and switch to custom builds only when requirements demand it. The techniques from custom RAG (smart chunking, hybrid search, query optimization) still inform how you use hosted tools. Understanding the full landscape helps you make better choices as your needs evolve.
Conclusion
You've built a complete RAG system using the Google File Search Tool, from indexing FDA drug labels to querying with citations. The medical assistant demonstrates how managed services handle the infrastructure while you focus on application logic.
File Search works well when you need reliable RAG without managing vector databases or embedding pipelines. The free storage and query embeddings make costs predictable. Persistent stores eliminate re-indexing overhead, letting you scale queries without scaling infrastructure maintenance.
Before deploying to production, add critical safeguards that the tutorial omitted for brevity. Implement error handling with timeouts for upload operations and try-catch blocks around API calls. Consider data privacy implications when uploading documents to Google's servers, especially with sensitive content. Add validation to verify grounding metadata exists before accessing citations. Test thoroughly with domain experts to catch cases where the model generates plausible but incorrect answers despite grounding.
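A minimal sketch of that defensive layer, wrapping the same calls used throughout the tutorial (the broad exception handling and return format are placeholders you'd tighten in real code):

# Defensive query: handle API errors and missing grounding metadata
def ask(question, store_name):
    try:
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=question,
            config=types.GenerateContentConfig(
                tools=[types.Tool(file_search=types.FileSearch(
                    file_search_store_names=[store_name]
                ))]
            ),
        )
    except Exception as exc:  # narrow to the SDK's error types in production
        return f"Request failed: {exc}", []

    metadata = response.candidates[0].grounding_metadata
    if not metadata or not metadata.grounding_chunks:
        # No citations returned -- treat the answer as unverified
        return response.text, []

    sources = [c.retrieved_context.title for c in metadata.grounding_chunks]
    return response.text, sources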
For next steps, try extending your assistant with metadata filtering to organize documents by category. Experiment with chunk sizes to match your document types. The techniques you learned here apply whether you're building support bots, documentation search, or knowledge assistants.
FAQs
Do I need to re-index when a document changes?
Yes. Updating or replacing a file requires re-uploading it so the embeddings reflect the new content.
Can I control access to specific documents?
Not yet at a fine-grained level. You can use metadata filters to limit which documents are queried, but user-level permissions aren’t currently supported.
What are the file and store size limits?
Individual files can be up to about 100 MB, and store tiers range up to roughly 1 TB. Keeping stores under 20 GB generally ensures faster retrieval.
How reliable are the citations?
File Search attaches grounding metadata showing which document chunks informed an answer. These citations improve transparency but should still be reviewed for accuracy.
Does File Search use keyword or vector retrieval?
It relies on semantic (vector-based) search. If you need exact keyword matching, you’ll need a custom or hybrid retrieval setup.

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.