Course
The ability to search and manipulate text data is indispensable for data scientists or data engineers. SQL's CONTAINS function is a powerful tool for conducting sophisticated text searches within your databases, enabling you to retrieve data based on specific patterns or keywords.
This tutorial will guide you through using SQL CONTAINS effectively, ensuring you can harness its full potential for your data analysis needs.
Associate Data Engineer in SQL
Understanding SQL CONTAINS
SQL CONTAINS is used for full-text searches, allowing you to query databases for specific words, phrases, or patterns within text data. It's particularly useful for finding specific information in large text fields where the exact location of the data isn't known in advance.
Syntax and Parameters
The basic syntax for SQL CONTAINS is as follows:
CONTAINS (column_name, 'search_pattern')
Key components include:
- Column name: The field where the search will be performed. It must be a full-text indexed column.
- Search pattern: The text you wish to find within the column, enclosed in single quotes.
Operators and Wildcards
To refine your search, SQL CONTAINS supports various operators:
AND&OR: Use these to combine multiple search terms.NEAR: Finds words that are within a certain proximity to each other.- ASTERISK (
*): Acts as a wildcard for performing searches where the exact word may be part of a larger string.
Implementing SQL CONTAINS
Let’s explore how to apply SQL CONTAINS with practical examples:
Basic Usage
To search for the terms "apple" and "orange" within a column named "fruit", you could use:
CONTAINS (fruit, '"apple" AND "orange"')
This query returns rows where both terms are present in the "fruit" column.
Advanced Searches
For more complex searches, combining CONTAINS with other SQL functions allows for intricate queries, such as searching for phrases near each other or using wildcards to find variations of a word.
Advanced searches can involve combining CONTAINS with logical operators, using proximity searches, and leveraging wildcard characters to uncover patterns or specific data points within text fields.
Combining CONTAINS with Logical Operators
SQL CONTAINS supports the use of logical operators such as AND, OR, and NOT to refine searches based on multiple criteria. For example, to find entries that contain both "technology" and "innovation" within a text column, you could structure your query as follows:
SELECT
*
FROM
your_table
WHERE
CONTAINS(
your_column, '"technology" AND "innovation"'
);
Conversely, to find entries containing either term, you would use OR:
SELECT
*
FROM
your_table
WHERE
CONTAINS(
your_column, '"technology" OR "innovation"'
);
Proximity Searches
The NEAR operator is particularly useful for finding words or phrases that appear close to each other within the text. This capability is invaluable for contexts where the relationship between terms affects their meaning. For example, to search for "climate" near "change" within 10 words of each other:
SELECT
*
FROM
your_table
WHERE
CONTAINS(
your_column, 'NEAR((climate, change), 10)'
);
This query prioritizes records where "climate" and "change" are closely related, offering more relevant results for specific research questions.
Wildcard Searches
Wildcards expand the scope of searches by allowing for the inclusion of variable characters. The asterisk (*) serves as a wildcard in CONTAINS queries, enabling searches for partial matches. For instance, to find variations of "compute" such as "computer", "computing", or "computation", you might use:
SELECT
*
FROM
your_table
WHERE
CONTAINS(your_column, '"compute*"');
This approach is particularly useful when you're looking to capture a broad set of data related to a root word or when you're unsure of the exact terms used in the database.
Advanced Pattern Matching
For scenarios requiring even more sophisticated pattern matching, consider combining SQL CONTAINS with the LIKE operator or regular expressions. While CONTAINS excels at full-text search, LIKE and regex can offer pattern flexibility, such as matching specific character sequences or formats (e.g., email addresses or phone numbers).
Practical Example: Combining Techniques for Comprehensive Searches
Let's consider a scenario where you need to find database entries related to environmental technology, but you're not sure of the exact phrases used in the text. You could combine various techniques as follows:
SELECT
*
FROM
your_table
WHERE
CONTAINS(
your_column, '("environment*" NEAR "technology*", 10) OR "climate change" OR "renewable energy"'
)
AND your_column LIKE '%sustainability%';
This query demonstrates how to use CONTAINS for proximity and wildcard searches while also incorporating the LIKE operator for additional pattern matching, offering a robust strategy for complex text searches.
Practical Example of SQL CONTAINS in Data Science
As a data scientist, I work with various types of datasets, mainly involving text datasets for customer satisfaction and feedback. In this particular example, we will be looking for specific keywords that can indicate satisfaction or dissatisfaction in customer feedback.
|
ID |
FeedbackText |
SatisfactionScore |
|
1 |
I am very happy with the service. |
5 |
|
2 |
Not satisfied with the product quality. |
2 |
|
3 |
The delivery was quick, and the product works great... |
5 |
|
4 |
Customer service was unhelpful and rude. |
1 |
|
5 |
Happy with my purchase. Will recommend! |
4 |
CustomerFeedback Table
To useCONTAINS, your database might need to have Full-Text Search enabled.
The CONTAINS keyword is used to find records where the “FeedbackText” column contains the word “happy”.
SELECT
*
FROM
CustomerFeedback
WHERE
CONTAINS(FeedbackText, 'happy');
The above query will return the rows from CustomerFeedback where the FeedbackText column contains the word “happy”.
|
ID |
FeedbackText |
SatisfactionScore |
|
1 |
I am very happy with the service. |
5 |
|
3 |
The delivery was quick, and the product works great... |
5 |
|
5 |
Happy with my purchase. Will recommend! |
4 |
You can't use the CONTAINS command in SQLite database. Instead, try using:
SELECT
*
FROM
CustomerFeedback
WHERE
FeedbackText LIKE '%happy%';
Beyond CONTAINS: Pattern Matching and String Manipulation
While CONTAINS is powerful for text searches, SQL also offers other techniques for working with text data:
LIKE Operator
Similar toCONTAINS, the LIKE operator enables pattern matching but with a simpler syntax, using % as a wildcard. For example, to find any product name containing "apple":
SELECT
*
FROM
Products
WHERE
productName LIKE '%apple%'
Advanced String Functions
SQL provides functions like LEFT and REVERSE for extracting or manipulating parts of text strings, offering extensive capabilities for text analysis and manipulation. You can learn more about these in our Cleaning Data in SQL Server Databases course.
Conclusion
The SQL CONTAINS function and its associated text manipulation capabilities are essential tools for data practitioners. By understanding and applying these techniques, you can effectively search and analyze text data within your databases, unlocking insights and enhancing your data analysis projects.
Remember, the effectiveness of your queries depends not only on mastering the syntax but also on understanding the context and structure of your data. Practice with real-world datasets and SQL projects and explore various functions to fully leverage the power of SQL in text data analysis.
To learn more about SQL operators and explore the SQL basics, check out our SQL Basics Cheat Sheet and SQL Fundamentals skill track.
Become a Data Engineer

As a certified data scientist, I am passionate about leveraging cutting-edge technology to create innovative machine learning applications. With a strong background in speech recognition, data analysis and reporting, MLOps, conversational AI, and NLP, I have honed my skills in developing intelligent systems that can make a real impact. In addition to my technical expertise, I am also a skilled communicator with a talent for distilling complex concepts into clear and concise language. As a result, I have become a sought-after blogger on data science, sharing my insights and experiences with a growing community of fellow data professionals. Currently, I am focusing on content creation and editing, working with large language models to develop powerful and engaging content that can help businesses and individuals alike make the most of their data.

