
A Comprehensive Guide to dbt Tests to Ensure Data Quality

Learn about the various tests possible in dbt to check and ensure data quality.
Jul 1, 2025  · 11 min read

dbt is growing in popularity in the analytics field for its data transformation and testing functions. In this article, we’ll share what dbt is and how you can conduct some dbt tests to ensure data quality. If you’re just starting out with dbt, be sure to check out our Introduction to dbt course to learn more. 

What Is dbt?

As we explore in our separate guide, dbt (data build tool) is a popular open-source tool used by data teams to transform raw data in their warehouse into clean, tested, documented, and analytics-ready datasets. 

Unlike ETL tools, dbt focuses solely on the "T" (Transform) component of the data pipeline and operates on top of modern cloud data warehouses like BigQuery, Snowflake, Redshift, and Databricks.

One of dbt’s most powerful features is its built-in data testing framework, which allows data practitioners to write tests that validate assumptions about their data. This testing framework is not only helpful for catching issues early but is also crucial for maintaining data quality and trust in a growing, modular analytics codebase.

Why Test in dbt?

Testing is a foundational part of analytics engineering. As datasets grow and more stakeholders depend on data, the cost of undetected data issues rises. dbt’s testing capabilities help you:

  • Detect data issues early: Before downstream dashboards and reports are affected.
  • Automate quality checks: Avoid manual validation work.
  • Encourage good data modeling practices: Promote modular and testable transformations.
  • Support CI/CD practices: Integrate seamlessly into automated workflows.
  • Enforce data contracts: Validate that key constraints (e.g., no duplicates) hold true over time.

Types of dbt Tests

In dbt, tests are SQL-based assertions that validate data assumptions. Tests in dbt fall into two main categories:

1. Generic (built-in) tests

These tests are pre-defined by dbt and are applied declaratively in YAML files. Generic tests cover common constraints that are typically found in data modeling and data warehousing. 

They are:

  • Easy to configure
  • Reusable across columns and models
  • Highly performant for simple checks

dbt ships with four built-in generic tests:

  • not_null: Ensures a column does not contain any NULL values.
  • unique: Ensures a column’s values are distinct across rows (used commonly for primary keys).
  • accepted_values: Restricts a column to a predefined set of allowed values.
  • relationships: Validates referential integrity between tables, ensuring that foreign keys match values in another table.

These tests are ideal for enforcing baseline data integrity, especially around keys, categorical fields, and core table relationships.

Generic tests work by automatically generating SQL under the hood. 

For example, a not_null test on the customer_id column will generate SQL similar to:

SELECT *
FROM {{ ref('customers') }}
WHERE customer_id IS NULL

If this query returns any rows, the test fails.
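
Similarly, a unique test on customer_id compiles to SQL along these lines (a simplified sketch of what dbt generates, not the exact compiled query):

-- Any customer_id appearing more than once fails the test
SELECT customer_id, COUNT(*) AS n_records
FROM {{ ref('customers') }}
WHERE customer_id IS NOT NULL
GROUP BY customer_id
HAVING COUNT(*) > 1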

2. Custom (singular) tests

Custom or singular tests are user-defined SQL queries saved as .sql files in the tests/ directory. These are used for:

  • Complex validations that generic tests cannot express
  • Business rule enforcement
  • Cross-model logic
  • Statistical or conditional thresholds

Singular tests offer maximum flexibility. You can validate data in virtually any way, as long as your SQL query returns only rows that violate expectations.

For example:

  • Ensuring that a product's launch date is not after its retirement date
  • Detecting unusually high or low values (outlier detection)
  • Validating that aggregated values align with business totals

Because of their flexibility, custom tests are ideal for analytics teams who want to enforce nuanced data quality contracts in production.
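
For instance, the first rule above could be written as a singular test like the sketch below, assuming a products model with launch_date and retirement_date columns (adjust the names to your own schema):

-- Returns products whose retirement date precedes their launch date
SELECT *
FROM {{ ref('products') }}
WHERE retirement_date IS NOT NULL
  AND launch_date > retirement_date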

Tests can also be built from macros, including the generic tests that ship with packages such as dbt-utils.
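
For example, after adding dbt-utils to your packages.yml and running dbt deps, you can apply one of its generic tests in schema.yml just like a built-in test. The order_total column here is an illustrative assumption:

columns:
  - name: order_total
    tests:
      # dbt_utils.accepted_range checks that values fall within a range
      - dbt_utils.accepted_range:
          min_value: 0
          inclusive: true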

In summary:

| Feature      | Generic Tests                          | Custom (Singular) Tests         |
|--------------|----------------------------------------|---------------------------------|
| Defined in   | schema.yml                             | .sql files in the tests/ folder |
| Coverage     | Common constraints (e.g., nulls, keys) | Any logic expressible in SQL    |
| Complexity   | Simple                                 | Medium to complex               |
| Reusability  | High                                   | Low (usually case-specific)     |
| Flexibility  | Limited                                | Unlimited (any SQL logic)       |

Combining both types gives you the best of both worlds: consistency and coverage from generic tests, and precision from custom tests. Let’s explore each type in more detail.

1. Generic (Built-in) Tests

Generic tests are predefined by dbt and used declaratively by adding them to your model’s schema.yml file. 

These tests typically validate constraints like uniqueness, non-nullability, referential integrity, or values in a defined set.

Let’s explore a simple tutorial to try out generic tests in dbt.

How to declare a generic test

In your model's corresponding schema.yml file, define tests under the columns: section:

version: 2

models:
  - name: customers
    description: "Customer dimension table"
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        tests:
          - not_null
          - unique
      - name: email
        tests:
          - not_null

Example: not_null and unique

columns:
  - name: customer_id
    tests:
      - not_null
      - unique

The tests above ensure that customer_id is present in every row and that its values are distinct. They are commonly used together to enforce the assumption that customer_id is the table’s primary key.

Example: accepted_values

columns:
  - name: customer_type
    tests:
      - accepted_values:
          values: ['new', 'returning', 'vip']

This checks that the customer_type field only contains one of the three allowed strings: new, returning, or vip. This test is often used for categorical fields that must conform to a known set of values, such as enums or statuses.

Example: relationships

columns:
  - name: customer_id
    tests:
      - relationships:
          to: ref('customers')
          field: id

This enforces a referential integrity constraint by verifying that every customer_id in the current model exists as an id in the customers table. It mimics a foreign key constraint in SQL, but at the analytics layer.
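
Like the other generic tests, relationships works by generating SQL. A simplified sketch of the compiled query, assuming the test is declared on an orders model, looks like this:

-- Rows whose customer_id has no matching id in customers fail the test
SELECT child.customer_id
FROM {{ ref('orders') }} AS child
LEFT JOIN {{ ref('customers') }} AS parent
  ON child.customer_id = parent.id
WHERE child.customer_id IS NOT NULL
  AND parent.id IS NULL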

2. Custom (Singular) Tests

When built-in tests are insufficient for your use case, such as when you need to validate complex business logic, you can write custom tests in SQL. These are known as singular tests.

How to create a custom test

1. Create a .sql file inside the tests/ directory in your dbt project.

2. Write a SQL query that returns rows that fail the test.

Example: No future dates in orders

In the tests folder, create a file called no_future_dates.sql.

SELECT *
FROM {{ ref('orders') }}
WHERE order_date > current_date

This test checks whether any records in the orders table have an order_date in the future. If rows are returned, the test fails, alerting you to invalid data that may be caused by timezone errors, ETL bugs, or incorrect source system entries.

Example: Duplicate emails per region

-- File: tests/duplicate_emails_per_region.sql

SELECT email, region, COUNT(*) as occurrences
FROM {{ ref('customers') }}
GROUP BY email, region
HAVING COUNT(*) > 1

This test ensures that each email is unique within a given region. This might reflect a business rule where the same person cannot register twice in the same region. Any row returned indicates a data quality violation.

How to Implement Tests in dbt

This section walks through the practical steps to implement, configure, and run dbt tests within your project and deployment pipelines.

Step 1: Setting up dbt

Before you begin writing tests or models, you need to have dbt installed and a new project initialized.

Step-by-step Instructions:

1. Create project folder:

Create a folder (for example, dbt-test-project) in a location of your choice.

2. Navigate into the project folder:

cd dbt-test-project

3. Create virtual Python environment:

python3 -m venv dbt-env

Next, activate the environment once it has been created.

dbt-env\Scripts\activate

4. Install dbt

Install dbt-core along with the SQLite adapter, since the profile used in this tutorial targets SQLite:

pip install dbt-core dbt-sqlite

5. Create .dbt folder

mkdir $home\.dbt

6. Initialize dbt

dbt init

7. Create profiles.yml file

Create a new file in your .dbt folder with the following contents:

austi:
  target: dev
  outputs:
    dev:
      type: sqlite
      threads: 1
      database: ./dbt_project.db
      schema: main
      schema_directory: ./schemas
      schemas_and_paths:
        main: ./dbt_project.db

You can replace “austi” with the name of your user profile on your Windows computer; whichever name you choose must match the profile: field in dbt_project.yml in the next step.

8. Create dbt_project.yml file

Next, you’ll need to create (or edit, if dbt init already generated one) a dbt_project.yml file in the root of your project folder with the following contents.

name: dbt_test_project
version: '1.0'
profile: austi

seeds:
  dbt_test_project:
    +quote_columns: false

Once again, replace “austi” so that it matches the profile name you defined in profiles.yml.

9. Verify the project works:

dbt debug

Once setup is complete, you’re ready to begin creating datasets and building dbt models.

Step 2: Creating the dataset

If you're working without access to a data warehouse, you can simulate dbt tests locally using CSV files and dbt's seed functionality.

Step-by-step Instructions:

1. Create CSV files: Place these inside a new /seeds folder of your dbt project.

Here’s how you should name it:

seeds/customers.csv

customer_id,name,email,customer_type
1,Alice Smith,alice@example.com,new
2,Bob Jones,bob@example.com,returning
3,Carol Lee,carol@example.com,vip
4,David Wu,david@example.com,new

2. Create another file in the same directory:

Use this naming convention:

seeds/orders.csv

order_id,customer_id,order_date,order_total,order_status
1001,1,2023-12-01,150.00,shipped
1002,2,2023-12-03,200.00,delivered
1003,1,2023-12-05,175.00,cancelled
1004,3,2024-01-01,225.00,pending

3. Confirm the seed configuration:

Make sure the dbt_project.yml file you created in Step 1 identifies your seeds with the following contents.

name: dbt_test_project
version: '1.0'
profile: austi

seeds:
  dbt_test_project:
    +quote_columns: false

The profile field must match the profile name defined in your profiles.yml.

4. Load seed data:

dbt seed

This command creates main.customers and main.orders tables from the CSV files. These seeds stand in for the source tables you would normally query from a data warehouse.

[Image: dbt seed output showing the seed data being loaded]

As you can see from the image above, two seeds have been found and loaded.

Step 3: Ingesting into dbt

Now, we’ll create staging models to transform and test your raw data.

Step-by-step Instructions:

1. Create staging models:

Create a new folder called models in your project folder. In that folder, create a file called stg_customers.sql.

Paste the following code in the file:

SELECT
  *
FROM {{ ref('customers') }}

In another file called stg_orders.sql, paste the following code:

SELECT
  *
FROM {{ ref('orders') }}

2. Define schema tests:

Create a new file at the following location:

models/schema.yml

Paste the following into the file:

version: 2

models:
  - name: stg_customers
    description: "Customer staging table"
    columns:
      - name: customer_id
        description: "Unique identifier for each customer"
        tests:
          - not_null:
              tags: [critical]
          - unique:
              tags: [critical]

      - name: email
        description: "Email address of the customer"
        tests:
          - not_null

  - name: stg_orders
    description: "Orders staging table"
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        tests:
          - not_null:
              tags: [critical]
          - unique:
              tags: [critical]

      - name: customer_id
        description: "Foreign key to stg_customers"
        tests:
          - not_null:
              tags: [critical]

3. Run dbt to build staging models:

dbt run --select stg_customers stg_orders

These staging models now act as the foundation for applying dbt tests using your locally seeded data.

An example output would be:

[Image: dbt run output for the staging models]

Step 4: Creating your first test

Now that your data is staged, let’s implement your first dbt test using the generic built-in testing framework.

Step-by-step instructions:

1. Open the schema.yml file where your staging model is defined.

2. Under the columns: section of your model, define a test such as not_null or unique.

Example:

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique

3. Run the test using the CLI:

dbt test --select stg_customers

Here’s an expected result:

[Image: dbt test output for the not_null and unique tests]

Step 5: Developing custom and singular tests

When your needs go beyond built-in tests, you can create custom tests using SQL.

Step-by-step instructions:

1. Create a new file in the tests/ folder, for example: tests/no_future_orders.sql

2. Add SQL logic that returns failing rows:

SELECT *
FROM {{ ref('stg_orders') }}
WHERE order_date > current_date

3. Run the test:

dbt test --select no_future_orders

This test will fail if any orders have future-dated order_date values.

Here is an expected result:

[Image: dbt test output for the no_future_orders test]

Step 6: Running tests in CI/CD pipelines

To ensure consistent test execution, integrate dbt tests into your development pipeline.

Step-by-step instructions:

1. Use tags to prioritize or isolate tests:

You can modify any model schema to include the following tags in this format:

columns:
  - name: customer_id
    tests:
      - not_null:
          tags: [critical]

2. Run selective tests locally:

dbt test --select tag:critical

If the tests run correctly, you should expect this output:

[Image: dbt test output for tests tagged critical]

3. Configure your CI workflow to run dbt tests automatically. You can connect this to a CI platform like GitHub Actions. 
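
As an illustration, here is a minimal GitHub Actions workflow sketch that runs dbt tests on every pull request. The file path, Python version, adapter, and profiles setup (a profiles.yml committed to the repository root) are assumptions to adapt to your own project:

# .github/workflows/dbt-tests.yml (illustrative sketch)
name: dbt tests

on: [pull_request]

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Install dbt and whichever adapter your warehouse uses
      - run: pip install dbt-core dbt-sqlite
      # Build seeds and models, then run only the critical tests
      - run: dbt seed --profiles-dir .
      - run: dbt run --profiles-dir .
      - run: dbt test --profiles-dir . --select tag:critical

Because failing tests fail the workflow, pull requests cannot be merged until the data quality issues are resolved.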

Best Practices for dbt Tests

Implementing dbt tests effectively is not just about writing them; it's about integrating testing into your team's development lifecycle and culture. Here are some best practices to guide you:

1. Start with primary keys

Always apply not_null and unique tests to primary key columns in dimension and fact tables. These two constraints are the foundation of reliable joins and deduplication logic in downstream queries.

2. Validate foreign keys with relationships

Foreign keys are critical for maintaining referential integrity. Use the relationships test to mimic foreign key constraints, especially in data warehouses that do not enforce them natively.

3. Use accepted_values for categorical columns

Control dimensional consistency by enforcing specific values in columns like status, region, or product_type. This reduces the risk of typos or unhandled enum expansions.

4. Write custom tests for business logic

Business logic doesn’t always map cleanly to generic rules. Examples of logic you should write custom tests for include the following (a sketch of the first rule appears after the list):

  • Sales must be non-negative
  • Order dates cannot be in the future
  • A user cannot have multiple active subscriptions
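
As a sketch, the first rule could be expressed as a singular test like this, assuming an order_total column on the stg_orders model from earlier:

-- Returns rows that violate the "sales must be non-negative" rule
SELECT *
FROM {{ ref('stg_orders') }}
WHERE order_total < 0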

5. Organize tests clearly

If you have many tests set in place, it can get messy very quickly. You should consider these best practices to keep them organized:

  • Place custom tests in the tests/ folder and name them descriptively.
  • Avoid overly complex SQL in tests—keep logic focused and readable.
  • Group tests by model or domain where possible.

6. Fail fast and integrate with CI/CD

Run dbt test as part of your CI pipeline. This ensures that code cannot be merged unless data quality constraints pass. Integrating tests into CI/CD reinforces accountability and trust in the data pipeline.

7. Avoid over-testing

Excessive or unnecessary testing (especially on large volumes of data) can slow down deployments. Focus on tests that are:

  • Business-critical
  • Likely to fail due to upstream issues
  • Helpful for debugging when broken

Avoid testing calculated fields unless they are part of a contractual SLA.

Advanced Example: Outlier Detection

For an advanced test example, we can perform an outlier detection test.

Create a SQL file in your tests folder with the following code:

SELECT *
FROM {{ ref('orders') }}
WHERE order_total > 100000

This is a basic outlier detection test. If your business typically sees orders below $10,000, you can flag orders over $100,000 for manual review. While not a strict data quality violation, this can be valuable for fraud detection or operational monitoring.
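
If you prefer a data-driven threshold to a hard-coded one, a common variant flags values more than three standard deviations above the mean. This is a sketch; STDDEV syntax and support vary by warehouse:

WITH stats AS (
  SELECT
    AVG(order_total) AS avg_total,
    STDDEV(order_total) AS std_total
  FROM {{ ref('orders') }}
)
-- Orders far above the historical average fail the test
SELECT o.*
FROM {{ ref('orders') }} AS o
CROSS JOIN stats
WHERE o.order_total > stats.avg_total + 3 * stats.std_total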

Performance Considerations

As your dbt project scales and test coverage grows, performance becomes increasingly important. This section outlines techniques and strategies to maintain test efficiency and control computational costs.

Test optimization techniques

For testing, try these techniques to optimize performance:

  • Incremental testing: Focus tests only on newly added or modified data, for example by combining dbt's incremental models with a where test config (see the snippet after this list). This reduces unnecessary scans over full datasets.
  • Query parallelization: Use dbt’s built-in parallel execution by configuring the threads: setting in profiles.yml (or the --threads flag) to run multiple tests concurrently.
  • Selective test execution: Use flags like --select and --exclude to run only relevant tests, especially during development. For example:
dbt test --select customers
dbt test --exclude tag:slow
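
For example, dbt's where test config restricts a test to a subset of rows rather than scanning the full table. The date expression below is an illustrative assumption and will differ by warehouse:

columns:
  - name: order_id
    tests:
      - not_null:
          config:
            # Only test rows from roughly the last week
            where: "order_date >= current_date - 7"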

Cost management strategies

Implementing cost management strategies ensures dbt tests remain performant and cost-efficient, especially in environments where compute billing is tied to query complexity and frequency.

Some strategies are:

  • Tag-based test execution: Assign tags to resource-intensive or optional tests so they can be excluded from standard runs unless explicitly required.
  • Result caching: Utilize warehouse-specific features (e.g., result caching in BigQuery or Snowflake) to avoid repeated computation where possible. Organize tests to benefit from cached results of previously executed queries.
  • Scheduled test batching: Schedule heavier test suites during off-peak hours or outside of production windows to minimize impact on warehouse workloads.

Conclusion

dbt tests are a powerful way to ensure high data quality, automate validation checks, and catch issues early in your transformation pipeline. Generic and custom tests can be used to create robust and maintainable data workflows.

Learn more about dbt in our Introduction to dbt course or our dbt tutorial.

dbt Tests FAQs

What is the unique test in dbt?

In dbt (data build tool), the unique test ensures that all values in a specified column are distinct, meaning there are no duplicate entries.

What are the four generic tests that dbt ships with?

dbt ships with four tests: not_null, unique, accepted_values, and relationships.

What are some dbt test types?

dbt test types include generic built-in tests and custom singular tests.


