Working with Text in SQL
Text data from your queries often may be messy or in the wrong format. It is therefore crucial to be able to clean up and reformat data. This template runs through the basics of combining, editing, and extracting text in PostgreSQL.
This template connects to a movie database by default. However, you can use any integration you want, so long as you have access to a table with text data. You can learn more about integrations here.
💡 The data we will use is contained in the cinema schema of the Course Databases database. If you wish to edit the queries, you must add the schema name when referencing tables in this database (e.g., cinema.films).
Concatenating text
A standard operation you may want to perform is combining the text from two or more columns.
In the example below, we use CONCAT() to combine the title and release_year columns, enclosing the release year in parentheses.
👇 To run a SQL cell like the one below, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell.
SELECT
-- Add title and release year in parentheses
CONCAT(title, ' (', release_year, ')') AS title_and_year
FROM cinema.films
-- Limit to prevent the query taking too long
LIMIT 5Change case of text
Another standard string operation you may want to perform is changing the text's case. The three PostgreSQL functions you can use to accomplish this are:
UPPER()to convert text to uppercase.LOWER()to convert text to lowercase.INITCAP()to convert text to title case.
In the example below, we convert the title in the films table to three different cases.
💡 Note that INITCAP() will capitalize the first letter of each word and may not be appropriate for text in the form of sentences.
SELECT
-- Convert title to three different cases
UPPER(title) AS uppercase_title,
LOWER(title) AS lowercase_title,
INITCAP(title) AS titlecase_title
FROM cinema.films
-- Limit to prevent the query taking too long
LIMIT 5Truncating text
Another helpful text operation is to select a given number of characters from the left or right of a set of text. Depending on the direction you want to extract text from, you can use LEFT() or RIGHT(). You need to specify the text (either a column or a string) and the number of characters you want to extract.
In the example below, we use LEFT() to extract the first 50 characters of the description column in the descriptions table. We then use CONCAT() to add an ellipsis (...) to indicate that the column has been truncated.
SELECT
description,
-- Extract the first 50 characters of the column
CONCAT(LEFT(description, 50),
-- Concatenate '...' to indicate it has been trimmed
'...') AS short_description
FROM cinema.descriptions
-- Limit to prevent the query taking too long
LIMIT 5Trimming text
You may notice a space between the text and the ellipsis in the third row of the example above. This is because the LEFT() function is cutting at precisely 50 characters, whether or not there is whitespace left.
You can use the TRIM() function to remove whitespace from the left and right sides of a string. To trim text, you use the form:
TRIM(FROM string_you_want_to_trim)
In the example below, we add the TRIM() function to eliminate whitespace before concatenating the ellipsis.
💡 You can also supply leading or trailing in front of FROM to only remove whitespace from the left and right of a text.
SELECT
description,
-- Extract and trim the first 50 characters of the column
CONCAT(TRIM(FROM LEFT(description, 50)),
-- Concatent '...' to indicate it has been trimmed
'...') AS short_description
FROM cinema.descriptions
-- Limit to prevent the query taking too long
LIMIT 5Replacing text
Some text may contain characters or pieces of text that you wish to remove or replace. You can use the REPLACE() function to accomplish this. The function takes three arguments:
- The text that you want to use.
- The character(s) you want to remove.
- The character(s) you want to insert.
In the example below, we replace spaces in the
namecolumn of thepeopletable with underscores and convert the entire text to lowercase.
SELECT
name,
-- Replace spaces in the string with an underscore
-- and convert to lowercase
LOWER(REPLACE(name, ' ', '_')) AS revised_name
FROM cinema.people
-- Limit to prevent the query taking too long
LIMIT 5Extracting substrings
You can use the SUBSTRING() function to extract text elements. It follows the following structure:
SUBSTRING(text_to_extract FROM position_to_start FOR number_of_characters)
In the first example below, we use SUBSTRING() to extract the first 10 characters of the title column in the films table.
💡 You will notice this performs in much the same way as the LEFT() function. However, as will be shown below, SUBSTRING() can be used to perform more powerful string extraction.
SELECT
title,
-- Extract the first 10 characters of the column
-- starting from position 0
SUBSTRING(title FROM 0 FOR 10) AS reduced_title
FROM cinema.films
-- Limit to prevent the query taking too long
LIMIT 5In the more advanced example below, we use SUBSTRING() in combination with a new function POSITION(). This function returns the location of a string within another string (such as the position of a space " " character).
Using POSITION() allows us to split the name before and after the space to extract the first and last names of actors in the people table.
SELECT
name,
-- Extract a string starting from position 0 up until the space
SUBSTRING(name FROM 0 FOR POSITION(' ' IN name)) AS first_name,
-- Extract a string starting from the position of the space plus one
-- up until the end of the string
SUBSTRING(name FROM POSITION(' ' IN name)+1) AS last_name
FROM cinema.people
-- Limit to prevent the query taking too long
LIMIT 5