Skip to content

Pricing Optimizing in SQL Code-Along - April 1, 2025

The Business Problem

Businesses care a lot about the price of their products.

  • When the products are priced too low, they lose tons of revenue.
  • When the products are priced too high, they lose... tons of revenue.

That said, even though business leaders know that finding the correct price of a product is extremely important, they often have no analytical tools to figure out pricing optimization. Sometimes, leaders think it sounds too complex.

The Goal of the Code-along

By the end of this code-along, I guarantee you will be able to take any dataset with two features:

  • price
  • quantity_sold

and perform a powerful optimization technique in order to find the price that maximizes revenue.

Having the analytical skill to solve this optimization problem is going to have real results for your company, and it's sure to make you stand out as a data analyst or a data scientist. What is more, the technique is extremely versatile; it applies to all kinds of companies and industries, whether you are selling watches, tires, or shoes.

This code-along will also be useful if you are interested in creating your own project to demonstrate your understanding of SQL, statistics, algebra, calculus, as well as showing business acumen.

The questions to answer

  1. What is the price that you can sell your product for that maximizes total revenue?
  2. What revenue can you expect after you optimize the price?

The Parts of the Code-along

This project will have three parts:

Part 1: Performing linear regression in SQL

  • The business case: We will build a model to describe quantity sold as a function of price.
  • The SQL learning: We will also practice SQL aggregate functions and common table expressions (CTEs).

Part 2: Understanding price, quantity sold, and revenue

  • The business case: We will then explore the use of a quadratic regression technique to describe revenue in terms of only price.
  • The SQL learning: We will practice using SQL subqueries.

Part 3: Finding the price that maximizes revenue

  • The business case: We will then learn how to find the derivative of our function, which is the theoretical price that maximizes revenue.
  • The SQL learning: We will learn how to report the results of our optimization.

SQL Refresher

Before we dive in, let's start off with a refresher on some common SQL functions that we will be using in this tutorial.

Essential aggregate functions

  • COUNT(*) – Total number of rows (used for N)
  • SUM() – Sum of values or expressions, e.g. SUM(price * quantity_sold)
  • AVG() – Mean of a column, needed for centering and intercept calculation
  • STDDEV() – Standard deviation, used in slope computation via correlation
  • CORR(x, y) – Pearson correlation between two variables

Specialized linear regression functions (DuckDB)

  • REGR_SLOPE(y, x) – Returns slope directly
  • REGR_INTERCEPT(y, x) – Returns intercept directly

Other Tools

  • WITH clause – Common Table Expressions (CTEs), used for organizing and reusing subqueries
  • AS – Aliasing columns and subqueries for readability and reuse

Mathematical and Transform Functions

  • LN() – Natural log (log base e), used for log-transforming quantity_sold
  • EXP() – Exponential function, used to "undo" the log and model predicted quantity

Our Dataset

This dataset is something I created myself. It has two columns:

  • price: this column represents the price of the products sold
  • quantity_sold: the column represents the number of products sold at a given price

The dataset has 200 rows. Each row represents a day of sales showing how many products we sold for a given price.

Getting Started

Now, let's get started!

Part 0: Exploratory data analysis

The very first thing is exploring our dataset. We are interested in two variables: price and quantity_sold. So, let's start by choosing our columns with SELECT and choosing our dataset with FROM.

Looking at our table

As a first step, we can look at our demand_curve_data table.

Instructions
  • Use SELECT to choose price and quantity_sold from the dataset named "demand_curve_data.csv"
Spinner
DataFrameas
demand_curve_data
variable
SELECT ROUND(price, 2), quantity_sold
FROM 'demand_curve_data.csv';

Data visualization

Data visualization is an important part of exploratory data analysis. And DataLab comes equipped with really great visuals that make data exploration easy. Here, I'm using DataLab's built-in UI to create a scatterplot, no code required, and this scatterplot, as you can see, is integrating perfectly into our SQL workflow.

This scatterplot let's us see two things, one obvious and one not obvious:

  • There's an inverse relationship between price and quantity_sold.
  • The relationship is not entirely linear, and the relationship between quantity_sold and price changes for different ranges of price.
Instructions
  • Use SELECT to choose price and quantity_sold from the dataset named "demand_curve_data.csv"
  • Use DataLab to create a nice scatterplot
Spinner
DataFrameas
df
variable
SELECT price, quantity_sold
FROM 'demand_curve_data.csv';

Part 1: Performing linear regression in SQL

Now that we have completed the first step, which is to look at the data and create a first visual, we can move on to creating a linear model. Here are the two parts:

  • Step 1: The first step is to review the formulas for the slope and intercept of a simple linear regression.
  • Step 2: The next step is to use the SQL aggregate functions to find the parts we will need so we can plug values into those equations.

Linear regression equations:

We can describe the slope and intercept in this way:

  • The slope can be expressed at the correlation () multiplied by the quotient of the standard deviation of over the standard deviation of
  • The intercept can then be expressed as the average of minus the product of the slope multiplied by the average of .

Here are those equations. refers to the slope of a linear regression line, and refers to the intercept.

Personally, I rather like this set of equations because it shows you that, in simple linear regression, the slope is the correlation coefficient but back onto the scale of the data, and it also shows you that the regression line goes through the center of mass, which gives the regression its balancing property.

β€Œ
β€Œ
β€Œ