Skip to content

Using fuzzystrmatch SELECT levenshtein('pat1', 'pat2') - gives levenshtein distance between the two; which is how many edits is necessary to make a match between them

Comparing with pg_trgm (works with trigrams - groups of three consecutive symbols in a string): SELECT similarity('pat1', 'pat2') - returns a value between 0 and 1, where 1 is a perfect match and 0 complete mismatch based on existing trigrams

POSTGRE EXTENSIONS: PostGIS - location queries PostPIC - image processing fuzzystrmatch, pg_trgm - full text search capabilities

To check which extensions are available to be installed and enabled: SELECT name FROM pg_available_extensions

To check which extensions are already installed: SELECT extname FROM pg_extension

Enabling an extension: CREATE EXTENSION IF NOT EXISTS fuzzystrmatch; SELECT extname FROM pg_extension - to confirm that it is included

User-defined data types

  • Enumerated data type - defines a custom range of values that are never going to change CREATE TYPE name AS ENUM ( )

Getting info about the iser-defined data-types:

SELECT typname, typcategory FROM pg_type WHERE typname = 'dayofweek'

or alternatively: SELECT column_name, data_type, udt_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'film'

Defining custom functions: CREATE FUNCTION squared(i integer) RETURNS integer AS

LANGUAGE plpgsql

LIKE _ wildcard - matches exactly one character % wildcard - matches zero or more characters, case sensitive

Example of full-text search: Select title, description From film WHERE to_tsvector(title) @@ to_tsquery('pattern'); case insensitive tsvector itself is a format

TRUNCATING AND PADDING THE STRING DATA

SELECT TRIM([leading|trailing|both], [characters], from string) - removes the first, the last or both characters from the string, similar works the rtrim, ltrim SELECT LPAD('padded', n, 's') - padding from the left the 'padded' with 's' to get the string with the length of n if 's' is ommited, then the space is added; if n is less than the original length, the string is truncated

PARSING STRINGS

SELECT CHAR_LENGTH/LENGTH(COLUMN) - returns the number of chracters in a string SELECT POSITION('symbol' IN COLUMN) - returns the position of the given symbol SELECT STRPOS(COLUMN, 'symbol') - same SELECT LEFT/RIGHT(COLUMN, n) - extracts first/last n symbols from the column SELECT SUBSTRING(COLUMN, n_start, n_end) - extracts a substring starting from n_start ending with n_end) SELECT SUBSTRING(COLUMN, FROM n_start FOR n_end) - same, but can take function output as a number, such as POSITION, CHAR_LENGTH

MANIPULATING STRINGS AND TEXT

SELECT COLUMN1 || ' ' || COLUMN2 - concatenates two columns in a new column (double pipe operator) SELECT CONCAT(COLUMN1, ' ', COLUMN2) SELECT UPPER/LOWER(COLUMN) - transforms column into upper/lower case SELECT INITCAP(COLUMN) - transforms into the title case SELECT REPLACE(COLUMN, 'whattochange', 'howtochange') - replaces a pattern in a column SELECT REVERSE(COLUMN) - reverses the column

EXTRACTING AND TRANSFORMING DATE AND TIME

EXTRACT(field FROM source) DATE_PART('field', source)

DATE_TRUNC('year'/'month'/'sthels', TIMESTAMP/DATE) truncation by year brings a timestamp to a first day of the year, by month - of the month

CASTING AND SELECTING MOMENTS IN TIME

SELECT NOW() (with timezone) SELECT NOW()::timestamp - casting of NOW() output into casting (postgre only) SELECT CAST(NOW() as timestamp) SELECT CAST(NOW() as date) SELECT CURRENT_TIMESTAMP(precision) SELECT CURRENT_DATE SELECT CURRENT_TIME (with timezone)

ARITHMETICS ON DATE/TIME/TIMESTAMPS

AGE - takes two timestamps and calculates the difference

INTERVAL data type can be used for arithmetics with TIMESTAMPs INTERVAL can convert different data types (INTEGERs, TIMESTAMPs) into INTERVALs "SELECT INTERVAL '1' day * timestamp '2019-04-10 12:34:56'" INTERVAL + DATE = DATE shifted by the INTERVAL ubtracting DATE data types yields an INTEGER, subtracting TIMESTAMPs yields an INTERVAL;

Add your notes here

Explore Datasets

Use the different tables to explore the data and practice your skills!

  • Select the title, release_year, and rating of films in the film table.
    • Add a description_shortened column which contains the first 50 characters of the description column, ending with "...".
    • Filter the film table for rows where the special_features column contains "Commentaries".
  • Select the customer_id, amount, and payment_date from the payment table.
    • Extract date information from the payment_date column, creating new columns for the day, month, quarter, and year of transaction.
    • Use the rental table to include a column containing the number of days rented (i.e., time between the rental_date and the return_date).
  • Update the title column so that titles with multiple words are reduced to the first word and the first letter of the second word followed by a period.
    • For example:
      • "BEACH HEARTBREAKERS" becomes "BEACH H."
      • "BEAST HUNCHBACK" becomes "BEAST H."
    • Reformat your shortened title to title case (e.g., "BEACH H." becomes "Beach H.").