Skip to main content
HomeBlogData Engineering

An Introduction to The dbt-utils Package

Learn how dbt-utils enhances dbt with pre-built macros, key features, and practical use cases. Streamline your data transformations with our step-by-step guide.
Jun 2024  · 12 min read

The dbt-utils package is a powerful toolkit that simplifies and enhances your data transformation processes. In this guide, we'll introduce you to dbt-utils, showcasing how it can streamline your workflow with its pre-built macros for common SQL operations.

What is dbt-utils?

dbt-utils package

dbt-utils is a package in dbt which provides a toolkit of pre-built macros that enhance the functionality of dbt projects. dbt-utils reduces complexity by adding layers of abstraction, enabling more efficient and consistent data transformation.

If you’re familiar with libraries in Python and packages in R, the dbt-utils package would be something similar.

With these standardized macros used for common Structured Query Language (SQL) operations, dbt-utils ensures that all the tough work from coding all the transformations from scratch is not needed.

Leveraging these utilities eliminates reinventing the wheel by integrating a suite of community-vetted macros into your projects, enhancing maintainability and fostering a culture of best practices.

Key features and functionalities

dbt-utils provides a variety of features and functionalities that make data transformation more efficient and streamlined. These include:

1. Standardized macros

dbt-utils offers a wide range of pre-built macros for common SQL operations using Jinja templating, such as joins, aggregations, filtering, and more.

These macros are standardized to ensure consistency across projects and reduce the need for repetitive coding.

For example, here are some macros used by analytics engineers:

  • date_spine: Creates a continuous sequence of dates between two specified dates, useful for time series analysis.
  • pivot: Transform rows into columns dynamically, simplifying crosstab operations.
  • star: Gets all fields from specified tables in the model, excluding those in the except argument.
  • union_relations: Combine data from multiple relations with the same structure using a union all.
  • generate_series: Create a series of numeric data, handy for time series analysis.
  • surrogate_key: Produce a unique identifier for rows by concatenating multiple fields.

With these macros, SQL can accomplish tasks that are otherwise complex and repetitive.

2. Schema and documentation assistance

dbt-utils provides helpful tools for managing schemas and documentation within data models. This makes it easier to keep track of changes and maintain organization within projects.

4. Testing capabilities

With dbt-utils, users can easily write tests to ensure the accuracy and quality of their data transformations. These tests can be automated and integrated into continuous integration processes, allowing for more efficient and reliable data pipelines.

Why Use dbt-utils?

So, why use this package?

Well, let’s first have a look at some of its benefits:

  1. Simplified complex SQL: With the use of macros, dbt-utils simplifies the writing of complex SQL queries. This saves time and effort for data analysts and engineers, allowing them to focus on higher-level analysis rather than getting bogged down in code.
  2. Efficient documentation: As mentioned earlier, dbt-utils can help to generate documentation automatically.
  3. Increased quality control: By incorporating testing capabilities into data pipelines, dbt-utils helps ensure the accuracy and reliability of data transformations.
  4. Automated data validation: With the use of dbt-utils, data validation can be automated and integrated into the development process. This ensures that any changes made to data models are immediately validated and any potential issues are caught early on.
  5. Open-source: dbt-utils is open source, so many developers can contribute to the development of the package. This means the package will be constantly updated.

dbt-utils Installation and Setup

Let’s have a look at how the dbt-utils package can be introduced into your data workflow below.

Prerequisites

Before using dbt-utils, familiarity with SQL and an understanding of dbt (data build tool) fundamentals are imperative. You should already have a dbt project initialized and configured.

If you need a refresher, you can check out our Introduction to dbt course and dbt tutorial.

Furthermore, a clear version alignment between your dbt core and the dbt-utils package is necessary. It is advisable to check the compatibility between dbt versions to prevent conflicts or deprecated functionalities.

Step-by-step installation

To begin the integration of dbt-utils into your data stack, start by ensuring your development environment is properly set up.

Step 1: Install dbt

First, ensure that dbt is installed in your environment. You can install dbt via pip:

pip install dbt

Step 2: Add dbt-utils to your dbt project

To install dbt-utils, add the following to your packages.yml file in your dbt project root directory.

Create this file in your dbt project root if it doesn't exist, and add the following:

packages:
- package: dbt-labs/dbt_utils
version: "1.1.1" # Use the latest version compatible with your dbt version

In this yaml file, you’ll be specifying the necessary package version as well. Make sure to use the latest version that is compatible with your installed dbt version.

Step 3: Install the package

After modifying, the installation is a simple command away.

The successful execution of this command will fetch and install dbt-utils, preparing you for enhanced functionality.

Run the following command in your terminal within your dbt project directory:

dbt deps

This command installs all the packages listed in your packages.yml.

dbt-utils Key Utilities and Their Applications

The dbt-utils package provides a suite of common helper functions that streamline SQL transformations within dbt projects.

These functions help developers avoid redundancies and concentrate on their unique business logic.

Here are some functions in dbt-utils:

1. SQL generators

SQL generators in dbt-utils are useful for creating modular SQL code.

For example, the deduplicate macros are excellent for eliminating duplicate rows while maintaining the order of the data. Duplicates can be removed from models, tables and even Common Table Expressions (CTEs).

2. Generic tests

Generic tests are used to validate data in a table or view. They can be easily customized to fit specific requirements and provide valuable insights into the quality of data being processed.

For instance, the expression_is_true macro checks if a certain expression is true. This can be used flexibly to verify several conditions, such as:

  • Column length
  • Output of a specific basic algebraic operation

3. Jinja helpers

Jinja helpers are useful for creating dynamic SQL queries that can be easily customized based on user input or variable data. They allow for conditional logic and looping within SQL statements.

Jinja templates are used to define and specify these macros.

One example is the pretty_time macro, which returns a string of the current timestamp.

4. Web macros

Web macros are similar to Jinja helpers, but they are specifically designed for web-based projects. They allow for dynamic HTML generation and manipulation, making it easier to create interactive and responsive web pages.

One useful macro is the get_url_path macro, which gets the page path of the URL. Here’s how the syntax looks:

{{ dbt_utils.get_url_path(field='page_url') }}

5. Introspective macros

Introspective macros are macros that can access and manipulate data within the current scope. This allows for dynamic and efficient data processing, reducing the need for additional functions or code.

One example is the get_column_values macro, which returns the query results as an object.

Practical Use Cases of dbt utils

Use case 1: Simplifying complex queries

Example scenario: A company has a large dataset containing customer information, transaction data, and product data. They need to create a report that shows the total sales for each product category in the past quarter.

Instead of writing complex SQL queries to join multiple tables and calculate the sales for each category, they can use dbt utils to easily extract relevant data and manipulate it within their macros or models.

For example, they can use get_filtered_columns_in_relation to filter out only the necessary columns related to product data and then use sum macro to calculate the total sales for each category.

This simplifies the process and makes it more efficient, saving time and reducing errors.

Code implementation:

To implement this solution in a mock database, you can follow these steps:

Use the get_filtered_columns_in_relation macro to filter only the necessary columns from the product data. This macro will help you select relevant columns efficiently.

-- models/product_data_filtered.sql
with product_data as (
select
{{ dbt_utils.get_filtered_columns_in_relation(
relation=ref('products'),
include=['product_id', 'category']
) }}
from {{ ref('products') }}
)
select * from product_data

Next, create a model to join the product, transaction, and customer data. Then use the sum function to calculate the total sales for each product category in the past quarter.

-- models/total_sales_by_category.sql
with product_data as (
select
{{ dbt_utils.get_filtered_columns_in_relation(
relation=ref('products'),
include=['product_id', 'category']
) }}
from {{ ref('products') }}
),
transaction_data as (
select
product_id,
sale_amount,
transaction_date
from {{ ref('transactions') }}
where transaction_date >= date_trunc('quarter', current_date) - interval '1 quarter'
),
joined_data as (
select
p.category,
t.sale_amount
from product_data p
join transaction_data t
on p.product_id = t.product_id
)
select
category,
sum(sale_amount) as total_sales
from joined_data
group by category

After defining the models, run the dbt models to execute the transformations and generate the report.

dbt run

Here’s an explanation of the code above:

The get_filtered_columns_in_relation macro helps in selecting only the necessary columns (product_id and category) from the products table, simplifying the dataset.

Next, the filtered product data is joined with the transaction data using SQL to get the sales amounts for each product. It then calculates the total sales for each product category within the last quarter using the sum aggregation function.

Use case 2: Ensuring data quality with generic tests

Example scenario:

In this use case, we will continue working with the same mock dataset from Use Case 1. However, instead of creating a report, our goal is to ensure data quality by running generic tests on the data.

Code implementation:

Below are examples of how to implement these tests for your dataset:

1. Testing for proportion of nulls

In this generic test, we will assert only values that fit the non-null proportion that we specify. We will do this using the not_null_proportion test.

# models/products.yml
version: 2
models:
- name: products
columns:
- name: product_id
tests:
- dbt_utils.not_null_proportion:
at_least: 0.95

In this example, we included an additional optional argument at_least to set the non-null proportion to have a maximum of 0.95.

2. Testing for empty fields

In this example, we’ll use the not_empty_string test to check for empty strings in the product_id field.

# models/products.yml
version: 2
models:
- name: products
columns:
- name: product_id
tests:
- dbt_utils.not_empty_string

3. Testing referential integrity

Using the relationships_where function will ensure that every product_id in the transactions table exists in the products table.

# models/transactions.yml
version: 2
models:
- name: transactions
columns:
- name: product_id
tests:
- dbt_utils.relationships_where:
to: ref('products')
field: product_id

In this example, we ensure that all the product_id values in the transactions table correspond to valid entries in the   table. This is a basic relationship integrity check.

Do take note that you can also combine multiple tests at the same time so you can cover several integrity checks.

Once all the tests are defined in your yaml config file, use the dbt test command to execute all the defined schema tests.

dbt test

dbt utils Best Practices

Leveraging dbt-utils efficiently

When using dbt, it's important to follow best practices to ensure that your project runs efficiently and effectively. When using dbt-utils, there are a few key points to keep in mind:

  1. Combine multiple tests rather than defining them separately: As mentioned before, you can combine multiple tests into one single test statement. This helps to reduce redundancy and improve the overall performance of your project.
  2. Use the dbt_utils.surrogate_key macro when creating surrogate keys: This macro automatically generates surrogate keys based on other columns in your table, saving you time and effort.

Common pitfalls and how to avoid them

Here are some common pitfalls that users may encounter when using dbt-utils, along with tips on how to avoid them:

  1. Not properly configuring custom macros: If you're creating your own custom macros, make sure they are properly configured and tested before incorporating them into your project.
  2. Not updating to the latest version of dbt-utils: It's important to regularly update to the latest version of dbt-utils in order to take advantage of new features and bug fixes.

Conclusion

dbt-utils is a powerful tool for streamlining, testing, and automating tasks in your data modeling process. It is an essential part of using dbt and can save you time and effort in developing your data models.

You might also find this dbt tutorial or this list of alternative data engineering tools useful.

If you’re interested in getting started with dbt, and would like some further learning, explore our Introduction to dbt Course too!


Photo of Austin Chia
Author
Austin Chia

I'm Austin, a blogger and tech writer with years of experience both as a data scientist and a data analyst in healthcare. Starting my tech journey with a background in biology, I now help others make the same transition through my tech blog. My passion for technology has led me to my writing contributions to dozens of SaaS companies, inspiring others and sharing my experiences.

Topics

Continue Your dbt Learning Journey Today!

Course

Introduction to dbt

4 hr
4.7K
This course introduces dbt for data modeling, transformations, testing, and building documentation.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

blog

14 Essential Data Engineering Tools to Use in 2024

Learn about the top tools for containerization, infrastructure as code (IaC), workflow management, data warehousing, analytical engineering, batch processing, and data streaming.
Abid Ali Awan's photo

Abid Ali Awan

10 min

tutorial

dbt Tutorial: 7 Must-Know Concepts For Data Engineers

Learn the 7 most important concepts around dbt - the favorite tool of modern data engineers.
Bex Tuychiev's photo

Bex Tuychiev

11 min

tutorial

Chroma DB Tutorial: A Step-By-Step Guide

With Chroma DB, you can easily manage text documents, convert text to embeddings, and do similarity searches.
Abid Ali Awan's photo

Abid Ali Awan

10 min

tutorial

Complete Databricks Dolly Tutorial for Building Applications

Learn to use the advanced capabilities of Databricks Dolly LLM to build applications.
Laiba Siddiqui's photo

Laiba Siddiqui

tutorial

Databricks DBRX Tutorial: A Step-by-Step Guide

Learn how Databricks DBRX—an open-source LLM can handle complex tasks and generate intelligent results.
Laiba Siddiqui's photo

Laiba Siddiqui

10 min

tutorial

10 Command-line Utilities in PostgreSQL

In this tutorial, learn about 10 handy command-line utilities in PostgreSQL which can enable you to interact with databases efficiently.
Sayak Paul's photo

Sayak Paul

7 min

See MoreSee More