Sometimes, full-text search or vector search alone isn’t enough to query a database and get the results you’re looking for. Combining the two works well when a developer is dealing with large amounts of multimodal, unstructured data that would benefit from both search types. This combination is known as hybrid search, and it offers developers a practical solution to a difficult challenge.
What Exactly is Hybrid Search?
To properly understand hybrid search, we need to first understand what full-text search is.
Full-text search is a way of searching that matches literal terms from your query against your documents. This type of traditional search is actually what many developers are very familiar with.
For example, if you search for “cute cafe with outdoor seating,” your search engine will look for those exact words inside the database. To put it simply, full-text search is precise and efficient, but it doesn’t work as well when the documents use synonyms or paraphrases of your terms, or when your query contains a typo.
Vector search, on the other hand, converts data into numeric vectors called embeddings. So, instead of matching exact words, vector search compares the semantic meaning of your query with the documents stored in your database.
Searching for “cute cafe with outdoor seating” may bring up “pastries and coffee outside,” even though the two don’t share any of the same words. Vector search is semantic and flexible, but it can sometimes return results that are too broad for the query.
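To make the contrast concrete, here is a minimal sketch in plain JavaScript. The three-dimensional vectors are invented for illustration (real embeddings come from an embedding model and have far more dimensions), and the function names matchesAllTerms and cosineSimilarity are my own, not part of any MongoDB API.

```javascript
// Toy illustration only: the vectors below are hand-written placeholders,
// not the output of a real embedding model.

// Literal term matching: true only if every query term appears in the text.
function matchesAllTerms(query, text) {
  const words = new Set(text.toLowerCase().split(/\W+/));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .every((term) => words.has(term));
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const cafeQuery = "cute cafe with outdoor seating";
const cafeDoc = "pastries and coffee outside";

// Hypothetical 3-dimensional "embeddings", invented for this example.
const cafeQueryVector = [0.9, 0.8, 0.1];
const cafeDocVector = [0.8, 0.9, 0.2];
const unrelatedVector = [0.1, 0.0, 0.9];

const keywordHit = matchesAllTerms(cafeQuery, cafeDoc);
const simDoc = cosineSimilarity(cafeQueryVector, cafeDocVector);
const simUnrelated = cosineSimilarity(cafeQueryVector, unrelatedVector);

console.log("keyword match:", keywordHit);
console.log("closer to cafe doc than to unrelated doc:", simDoc > simUnrelated);
```

The keyword check fails because the two strings share no terms, while the made-up vectors for the query and the cafe document point in nearly the same direction.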
So, where does hybrid search come into play? Well, it combines both full-text search and vector search. This means that developers can leverage the semantic intelligence of vectors while retaining the precise term matching of full-text search, which is especially useful when working with large unstructured datasets.
Why Hybrid Search Matters
Hybrid search is beneficial in a variety of real-world applications, including (but not limited to) e-commerce, healthcare, and recruitment.
With e-commerce, imagine accessing your favorite website and searching for “comfortable desk chair under $100.” With vector search, you’ll be able to see semantically similar items (such as ergonomic office chairs), while full-text search focuses on the word “chair” and helps enforce the price point.
With recruitment, if a recruiter is searching through resumes for “engineer with NLP experience,” hybrid search is able to capture both “natural language processing” and the exact keyword “engineer.”
This means that you’re able to receive more relevant and reliable results than using either approach by itself.
Hybrid Search in MongoDB
Let’s go over how to conduct hybrid search in MongoDB Atlas. To follow along, you will need a MongoDB Atlas cluster with the sample_mflix dataset loaded and mongosh installed. Both are covered in the steps below.
In this tutorial, we will focus on the embedded_movies collection in our sample_mflix database, and we will follow along with this $rankFusion hybrid search tutorial, with a handful of caveats.
The new $rankFusion operator makes hybrid search considerably easier in MongoDB, but it is only available on clusters running version 8.1 or later and is still in Public Preview, while free tier clusters are automatically on version 8.0. So, we will follow this tutorial, change a couple of things, and finish off strong.
Provision your cluster
When provisioning your cluster, please make sure you load the sample_mflix dataset. This is the dataset we will be working with throughout the tutorial. We are focusing on the sample_mflix.embedded_movies collection.

Quickly check that you have mongosh installed.
mongosh --version
For this tutorial, I am using 2.4.2.
Now, connect to your MongoDB Atlas cluster:
mongosh "mongodb+srv://<yourclusterhere>.mongodb.net/" --apiVersion 1 --username <yourusername>
You will be prompted for the password. Once connected, you will see the mongosh prompt.

Now, we can switch to our database. Run the command:
use sample_mflix
The output will be:
switched to db sample_mflix
Create your indexes
We can go ahead and create both the indexes we need for this tutorial. They are our vector index, and our full-text search index. Keep in mind that we are creating both these indexes on our embedded_movies collection.
Vector index:
db.embedded_movies.createSearchIndex(
  "hybrid-vector-search",
  "vectorSearch",
  {
    fields: [
      { type: "vector", path: "plot_embedding_voyage_3_large", numDimensions: 2048, similarity: "dotProduct" }
    ]
  }
)
Full-text search index:
db.embedded_movies.createSearchIndex(
  "hybrid-full-text-search",
  "search",
  { mappings: { dynamic: true } }
)
Double-check inside your MongoDB Atlas cluster that both indexes are ready before you query them.
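If you would rather check from the shell, mongosh provides db.collection.getSearchIndexes(). The sketch below wraps that call in a small helper. The helper name notReady is my own, and it assumes each returned index document exposes name and queryable fields, so compare it against what your own cluster returns.

```javascript
// Returns the names of expected indexes that are missing or not yet queryable.
// `indexes` is expected to look like the array returned by
// db.embedded_movies.getSearchIndexes(). The `queryable` field is an
// assumption to verify against your own output.
function notReady(indexes, expectedNames) {
  return expectedNames.filter((name) => {
    const idx = indexes.find((i) => i.name === name);
    return !(idx && idx.queryable === true);
  });
}

const expectedIndexes = ["hybrid-vector-search", "hybrid-full-text-search"];

// Inside mongosh, `db` is defined, so we can ask the cluster directly.
// Outside mongosh (for example in Node), this branch is skipped.
if (typeof db !== "undefined") {
  const pending = notReady(db.embedded_movies.getSearchIndexes(), expectedIndexes);
  print(pending.length === 0 ? "Both indexes are ready" : "Still building: " + pending.join(", "));
}
```

Index builds can take a little while after createSearchIndex returns, so rerun the check until nothing is reported as still building.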

Query our database
Now, we want to query the data in our sample_mflix.embedded_movies collection for “star wars,” using the plot_embedding_voyage_3_large field for the vector side of the search.
Following along with the tutorial, since we aren’t calling an embedding API, we can save the precomputed query embeddings we need into a separate file named query_embeddings.js.

We can now load in our embeddings to use in the query. Run the following:
load('/Users/<PATH NAME>/query_embeddings.js')
const queryVec = STAR_WARS_EMBEDDING
To ensure the embeddings are successfully loaded, run the command:
STAR_WARS_EMBEDDING.length

An output of 2048 is correct.
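Beyond the length, it can be worth confirming that every element is a finite number, since a malformed vector will make the vector search fail later. This is a minimal sketch, and the helper name checkEmbedding is hypothetical rather than part of mongosh.

```javascript
// Returns "ok" or a short description of what is wrong with the vector.
function checkEmbedding(vec, expectedDims) {
  if (!Array.isArray(vec)) return "not an array";
  if (vec.length !== expectedDims) {
    return "expected " + expectedDims + " dimensions, got " + vec.length;
  }
  const allNumbers = vec.every((x) => typeof x === "number" && Number.isFinite(x));
  return allNumbers ? "ok" : "contains a non-numeric value";
}

// In mongosh, after load(), you could run:
//   checkEmbedding(STAR_WARS_EMBEDDING, 2048)
```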
Aggregation pipeline
Now, we can create an aggregation pipeline for hybrid search. A handful of limitations explain why we have to build the pipeline in this manner.
As mentioned above, the first is that $rankFusion is currently in Public Preview and requires a cluster running version 8.1 or higher, so you cannot use it on a free tier cluster at this time. That complicates things a bit, since MongoDB Atlas won’t natively fuse the text and vector rankings for us. This means we have to do our own fusion: a weighted blend of the scores from the two ranked lists.
The stage placements in the aggregation pipelines are strict, as well:
$search must be the first stage of a pipeline.
$vectorSearch has to be the start of a pipeline and cannot be inside of $facet.
A workaround? $vectorSearch can be the first stage of our subpipeline inside $unionWith, since this subpipeline is thought of as its own pipeline.
Because of this, we have to start with $search for our full-text search results to keep their score, and then use $unionWith as a second mini-pipeline that starts with $vectorSearch so it also keeps its score.
We then need to group by _id so that a movie returned by both searches appears only once, keeping its best (maximum) score from $search and from $vectorSearch.
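The union-then-group idea can be sketched in plain JavaScript, outside the database. The function name mergeById and the short string ids "a" and "b" are placeholders of my own. The titles and scores are taken from the output shown later in this tutorial.

```javascript
// Merge two result lists by _id, keeping the highest t (text score) and
// v (vector score) seen for each document. This mirrors what $unionWith
// followed by $group with $max does in the pipeline, in simplified form.
function mergeById(textHits, vectorHits) {
  const byId = new Map();
  for (const hit of [...textHits, ...vectorHits]) {
    const current = byId.get(hit._id) || { _id: hit._id, title: hit.title, t: null, v: null };
    if (hit.t != null) current.t = current.t == null ? hit.t : Math.max(current.t, hit.t);
    if (hit.v != null) current.v = current.v == null ? hit.v : Math.max(current.v, hit.v);
    byId.set(hit._id, current);
  }
  return [...byId.values()];
}

// Placeholder ids; scores copied from the tutorial's sample output.
const sampleTextHits = [
  { _id: "a", title: "Star Wars: Episode III - Revenge of the Sith", t: 4.957413673400879 },
  { _id: "b", title: "Star", t: 5.151784420013428 }
];
const sampleVectorHits = [
  { _id: "a", title: "Star Wars: Episode III - Revenge of the Sith", v: 0.756283700466156 }
];

const merged = mergeById(sampleTextHits, sampleVectorHits);
console.log(merged);
```

The movie found by both searches ends up as a single entry carrying both scores, while the keyword-only match keeps a null vector score.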
Once that is done, we compute the hybrid score for each movie: its best $search score plus its best $vectorSearch score multiplied by a weight of our choosing (I chose 0.35). We then sort by that score.
The weight determines how much influence vector search has compared to full-text search in the final results.
Our hybrid formula is: hybrid_score = text_score + (vector_score x weight). So, a higher weight (e.g., 0.7-1.0) gives vector search more influence and, in turn, prioritizes semantic similarity.
A lower weight (0.1-0.3), on the other hand, gives full-text search more influence and helps prioritize exact keyword matches. Keep in mind that the two scores are not on the same scale: in the output below, the text scores range from roughly 2.5 to 5.2 while the vector scores stay below 1, so the text score makes up most of the hybrid score at any of these weights.
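Here is the formula as a small JavaScript sketch, applied to two movies from this tutorial's output to show how the weight can change the order. The function names hybridScore and rankByWeight are my own. Treating a missing score as 0 mirrors the $ifNull calls in the pipeline.

```javascript
// hybrid_score = text_score + (vector_score x weight), with missing scores as 0.
function hybridScore(t, v, weight) {
  return (t ?? 0) + (v ?? 0) * weight;
}

// Returns the titles ordered by hybrid score, highest first.
function rankByWeight(docs, weight) {
  return docs
    .map((d) => ({ title: d.title, h: hybridScore(d.t, d.v, weight) }))
    .sort((x, y) => y.h - x.h)
    .map((d) => d.title);
}

// Scores copied from the tutorial's sample output.
const sampleDocs = [
  { title: "Star Wars: Episode III - Revenge of the Sith", t: 4.957413673400879, v: 0.756283700466156 },
  { title: "Star", t: 5.151784420013428, v: null }
];

console.log(rankByWeight(sampleDocs, 0.35)); // the Star Wars film ranks first
console.log(rankByWeight(sampleDocs, 0.1));  // the keyword-only match "Star" ranks first
```

At 0.35 the vector score adds about 0.26 to the Star Wars film, just enough to overtake "Star". At 0.1 it adds only about 0.08, so the slightly higher text score of "Star" wins.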
Copy and paste this aggregation pipeline into your terminal window:
const WEIGHT = 0.35;
db.embedded_movies.aggregate([
  // Stage 1: full-text search
  { $search: { index: "hybrid-full-text-search", text: { query: "star wars", path: ["title", "plot"] } } },
  { $set: { t: { $meta: "searchScore" } } },
  { $project: { _id: 1, title: 1, plot: 1, t: 1 } },
  // Stage 2: Union with vector search
  { $unionWith: {
      coll: "embedded_movies",
      pipeline: [
        // Vector search subpipeline
        { $vectorSearch: {
            index: "hybrid-vector-search",
            path: "plot_embedding_voyage_3_large",
            queryVector: queryVec,
            numCandidates: 200,
            limit: 150
        }},
        { $project: { _id: 1, title: 1, plot: 1, v: { $meta: "vectorSearchScore" } } }
      ]
  }},
  // Stage 3: Combine and rank results
  { $group: { _id: "$_id", title: { $first: "$title" }, plot: { $first: "$plot" }, t: { $max: "$t" }, v: { $max: "$v" } } },
  { $set: { h: { $add: [ { $ifNull: ["$t", 0] }, { $multiply: [ { $ifNull: ["$v", 0] }, WEIGHT ] } ] } } },
  { $sort: { h: -1 } },
  { $limit: 20 }
]).toArray()
Here is our output:
{
_id: ObjectId('573a139af29313caabcf124d'),
title: 'Star Wars: Episode III - Revenge of the Sith',
plot: 'As the Clone Wars near an end, the Sith Lord Darth Sidious steps out of the shadows, at which time Anakin succumbs to his emotions, becoming Darth Vader and putting his relationships with Obi-Wan and Padme at risk.',
t: 4.957413673400879,
v: 0.756283700466156,
h: 5.2221129685640335
},
{
_id: ObjectId('573a13a6f29313caabd17d08'),
title: 'Star',
plot: 'The Driver now carries an arrogant rock star who is visiting a major city (not Pittsburgh as earlier believed). Played by Madonna, this title character wants to get away from her bodyguards...',
t: 5.151784420013428,
v: null,
h: 5.151784420013428
},
{
_id: ObjectId('573a1397f29313caabce8cdb'),
title: 'Star Wars: Episode VI - Return of the Jedi',
plot: 'After rescuing Han Solo from the palace of Jabba the Hutt, the rebels attempt to destroy the second Death Star, while Luke struggles to make Vader return from the dark side of the Force.',
t: 4.726564407348633,
v: 0.7782549858093262,
h: 4.998953652381897
},
{
_id: ObjectId('573a13aef29313caabd2da15'),
title: 'Star Runner',
plot: 'Get ready for the ultimate martial arts competition, where anything goes and lives are bought and sold. Tank is the celebrated Champion Star Runner and is deemed invincible among the ...',
t: 4.702453136444092,
v: 0.696668267250061,
h: 4.9462870299816135
},
{
_id: ObjectId('573a13c0f29313caabd62f62'),
title: 'Star Wars: The Clone Wars',
plot: 'Anakin Skywalker and Ahsoka Tano must rescue the kidnapped son of Jabba the Hutt, but political intrigue complicates their mission.',
t: 4.537533760070801,
v: 0.7561323642730713,
h: 4.802180087566375
},
{
_id: ObjectId('573a1397f29313caabce6f53'),
title: 'Message from Space',
plot: 'In this Star Wars take-off, the peaceful planet of Jillucia has been nearly wiped out by the Gavanas, whose leader takes orders from his mother (played a comic actor in drag) rather than ...',
t: 4.401333808898926,
v: 0.7913960218429565,
h: 4.678322416543961
},
{
_id: ObjectId('573a139df29313caabcfa90b'),
title: 'Message from Space',
plot: 'In this Star Wars take-off, the peaceful planet of Jillucia has been nearly wiped out by the Gavanas, whose leader takes orders from his mother (played a comic actor in drag) rather than ...',
t: 4.401333808898926,
v: 0.7913960218429565,
h: 4.678322416543961
},
{
_id: ObjectId('573a1398f29313caabce9851'),
title: 'Gymkata',
plot: 'Johnathan Cabot is a champion gymnast. In the tiny, yet savage, country of Parmistan, there is a perfect spot for a "star wars" site. For the US to get this site, they must compete in the ...',
t: 4.284864902496338,
v: 0.7193170189857483,
h: 4.53662585914135
},
{
_id: ObjectId('573a1397f29313caabce68f6'),
title: 'Star Wars: Episode IV - A New Hope',
plot: "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a wookiee and two droids to save the universe from the Empire's world-destroying battle-station, while also attempting to rescue Princess Leia from the evil Darth Vader.",
t: 2.9629626274108887,
v: 0.7987099885940552,
h: 3.242511123418808
},
{
_id: ObjectId('573a139af29313caabcf0f5f'),
title: 'Star Wars: Episode I - The Phantom Menace',
plot: 'Two Jedi Knights escape a hostile blockade to find allies and come across a young boy who may bring balance to the Force, but the long dormant Sith resurface to reclaim their old glory.',
t: 2.9629626274108887,
v: 0.7771433591842651,
h: 3.2349628031253816
},
{
_id: ObjectId('573a1397f29313caabce77d9'),
title: 'Star Wars: Episode V - The Empire Strikes Back',
plot: 'After the rebels have been brutally overpowered by the Empire on their newly established base, Luke Skywalker takes advanced Jedi training with Master Yoda, while his friends are pursued by Darth Vader as part of his plan to capture Luke.',
t: 2.7174296379089355,
v: 0.7810181975364685,
h: 2.9907860070466996
},
{
_id: ObjectId('573a139af29313caabcf1258'),
title: 'Star Wars: Episode II - Attack of the Clones',
plot: 'Ten years after initially meeting, Anakin Skywalker shares a forbidden romance with Padmè, while Obi-Wan investigates an assassination attempt on the Senator and discovers a secret clone army crafted for the Jedi.',
t: 2.7174296379089355,
v: 0.7400004267692566,
h: 2.9764297872781755
},
{
_id: ObjectId('573a1394f29313caabcdf65b'),
title: 'Ugetsu',
plot: 'A fantastic tale of war, love, family and ambition set in the midst of the Japanese Civil Wars of the sixteenth century.',
t: 2.858368396759033,
v: null,
h: 2.858368396759033
},
{
_id: ObjectId('573a13a4f29313caabd1137f'),
title: 'S1m0ne',
plot: "A producer's film is endangered when his star walks off, so he decides to digitally create an actress to substitute for the star, becoming an overnight sensation that everyone thinks is a real person.",
t: 2.842216730117798,
v: null,
h: 2.842216730117798
},
{
_id: ObjectId('573a13b8f29313caabd4c3c3'),
title: 'Star Trek',
plot: "The brash James T. Kirk tries to live up to his father's legacy with Mr. Spock keeping him in check as a vengeful, time-traveling Romulan creates black holes to destroy the Federation one planet at a time.",
t: 2.577816963195801,
v: 0.7295459508895874,
h: 2.8331580460071564
},
{
_id: ObjectId('573a13b5f29313caabd42e99'),
title: 'Sars Wars',
plot: "The fourth generation of the virus SARS is found in Africa! It's more dangerous and causes the patients to transform into bloodthirsty zombies. The virus quickly lands to Thailand, Dr. ...",
t: 2.826810598373413,
v: null,
h: 2.826810598373413
},
{
_id: ObjectId('573a13b5f29313caabd42e1b'),
title: 'Sars Wars',
plot: "The fourth generation of the virus SARS is found in Africa! It's more dangerous and causes the patients to transform into bloodthirsty zombies. The virus quickly lands to Thailand, Dr. ...",
t: 2.826810598373413,
v: null,
h: 2.826810598373413
},
As we can see, some results match our query, “star wars,” directly, while others (such as “Ugetsu” or “S1m0ne”) matched on only one keyword, have no vector score, and are clearly off in meaning.
Conclusion
Congratulations! You have successfully completed hybrid search in MongoDB. While $rankFusion will simplify this process, this method shows a workaround while the operator is still in Public Preview. Through this tutorial, we have combined full-text search and vector search in one pipeline to retrieve more relevant results for our given query. For more information on hybrid search in MongoDB, please refer to the MongoDB documentation. If you’re still getting up to speed with MongoDB, I recommend the Introduction to MongoDB in Python course.
MongoDB Hybrid Search FAQs
Do I need MongoDB 8.1 and `$rankFusion` to do hybrid search?
No. On 8.0, you can fuse the scores yourself. Run $search and $vectorSearch in two pipelines, then combine them with $unionWith + $group and a weighted formula. On 8.1+, $rankFusion will do this for you.
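For comparison, here is a rough sketch of what the $rankFusion version could look like on an 8.1+ cluster. It reflects my understanding of the documented stage shape (named input pipelines plus optional per-pipeline weights), so check it against the current MongoDB documentation before relying on it. The helper name buildRankFusionPipeline and the pipeline names vectorPipeline and fullTextPipeline are my own choices. Note also that $rankFusion combines results by rank position rather than by raw score, so its weights are not interchangeable with the weight in the manual formula.

```javascript
// Builds a $rankFusion aggregation pipeline (assumed shape, MongoDB 8.1+).
// Index names and the embedding path match the ones used in this tutorial.
function buildRankFusionPipeline(queryVector) {
  return [
    {
      $rankFusion: {
        input: {
          pipelines: {
            vectorPipeline: [
              { $vectorSearch: {
                  index: "hybrid-vector-search",
                  path: "plot_embedding_voyage_3_large",
                  queryVector: queryVector,
                  numCandidates: 200,
                  limit: 20
              }}
            ],
            fullTextPipeline: [
              { $search: { index: "hybrid-full-text-search", text: { query: "star wars", path: ["title", "plot"] } } },
              { $limit: 20 }
            ]
          }
        },
        // Illustrative weights only; they scale rank-based scores, not raw scores.
        combination: { weights: { vectorPipeline: 0.35, fullTextPipeline: 1 } }
      }
    },
    { $project: { _id: 1, title: 1, plot: 1 } },
    { $limit: 20 }
  ];
}

// In mongosh on an 8.1+ cluster, you could then run:
//   db.embedded_movies.aggregate(buildRankFusionPipeline(queryVec)).toArray()
```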
What is hybrid search?
It’s the combination of full-text search (exact words) and vector search (meaning) to get more relevant results for a given query.
When should I use hybrid search?
Use hybrid search when your text is varied (synonyms, paraphrases, typos) but you still want exact terms.
How do I pick the weight I want to use?
A reasonable starting point is around 0.2-0.5. A lower weight gives full-text search more influence, and a higher weight gives semantic similarity more influence. Tune the weight after testing it and reviewing the results you get.
Can hybrid search work with image and audio data?
Yes! As long as your data can be turned into vector embeddings and you can combine them with any specific wording constraints, you can perform hybrid search on a dataset.