Using fuzzystrmatch SELECT levenshtein('pat1', 'pat2') - gives levenshtein distance between the two; which is how many edits is necessary to make a match between them
Comparing with pg_trgm (works with trigrams - groups of three consecutive symbols in a string): SELECT similarity('pat1', 'pat2') - returns a value between 0 and 1, where 1 is a perfect match and 0 complete mismatch based on existing trigrams
POSTGRE EXTENSIONS: PostGIS - location queries PostPIC - image processing fuzzystrmatch, pg_trgm - full text search capabilities
To check which extensions are available to be installed and enabled: SELECT name FROM pg_available_extensions
To check which extensions are already installed: SELECT extname FROM pg_extension
Enabling an extension: CREATE EXTENSION IF NOT EXISTS fuzzystrmatch; SELECT extname FROM pg_extension - to confirm that it is included
User-defined data types
- Enumerated data type - defines a custom range of values that are never going to change CREATE TYPE name AS ENUM ( )
Getting info about the iser-defined data-types:
SELECT typname, typcategory FROM pg_type WHERE typname = 'dayofweek'
or alternatively: SELECT column_name, data_type, udt_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'film'
Defining custom functions: CREATE FUNCTION squared(i integer) RETURNS integer AS
LANGUAGE plpgsql
LIKE _ wildcard - matches exactly one character % wildcard - matches zero or more characters, case sensitive
Example of full-text search: Select title, description From film WHERE to_tsvector(title) @@ to_tsquery('pattern'); case insensitive tsvector itself is a format
TRUNCATING AND PADDING THE STRING DATA
SELECT TRIM([leading|trailing|both], [characters], from string) - removes the first, the last or both characters from the string, similar works the rtrim, ltrim SELECT LPAD('padded', n, 's') - padding from the left the 'padded' with 's' to get the string with the length of n if 's' is ommited, then the space is added; if n is less than the original length, the string is truncated
PARSING STRINGS
SELECT CHAR_LENGTH/LENGTH(COLUMN) - returns the number of chracters in a string SELECT POSITION('symbol' IN COLUMN) - returns the position of the given symbol SELECT STRPOS(COLUMN, 'symbol') - same SELECT LEFT/RIGHT(COLUMN, n) - extracts first/last n symbols from the column SELECT SUBSTRING(COLUMN, n_start, n_end) - extracts a substring starting from n_start ending with n_end) SELECT SUBSTRING(COLUMN, FROM n_start FOR n_end) - same, but can take function output as a number, such as POSITION, CHAR_LENGTH
MANIPULATING STRINGS AND TEXT
SELECT COLUMN1 || ' ' || COLUMN2 - concatenates two columns in a new column (double pipe operator) SELECT CONCAT(COLUMN1, ' ', COLUMN2) SELECT UPPER/LOWER(COLUMN) - transforms column into upper/lower case SELECT INITCAP(COLUMN) - transforms into the title case SELECT REPLACE(COLUMN, 'whattochange', 'howtochange') - replaces a pattern in a column SELECT REVERSE(COLUMN) - reverses the column
EXTRACTING AND TRANSFORMING DATE AND TIME
EXTRACT(field FROM source) DATE_PART('field', source)
DATE_TRUNC('year'/'month'/'sthels', TIMESTAMP/DATE) truncation by year brings a timestamp to a first day of the year, by month - of the month
CASTING AND SELECTING MOMENTS IN TIME
SELECT NOW() (with timezone) SELECT NOW()::timestamp - casting of NOW() output into casting (postgre only) SELECT CAST(NOW() as timestamp) SELECT CAST(NOW() as date) SELECT CURRENT_TIMESTAMP(precision) SELECT CURRENT_DATE SELECT CURRENT_TIME (with timezone)
ARITHMETICS ON DATE/TIME/TIMESTAMPS
AGE - takes two timestamps and calculates the difference
INTERVAL data type can be used for arithmetics with TIMESTAMPs INTERVAL can convert different data types (INTEGERs, TIMESTAMPs) into INTERVALs "SELECT INTERVAL '1' day * timestamp '2019-04-10 12:34:56'" INTERVAL + DATE = DATE shifted by the INTERVAL ubtracting DATE data types yields an INTEGER, subtracting TIMESTAMPs yields an INTERVAL;
Add your notes here
Explore Datasets
Use the different tables to explore the data and practice your skills!
- Select the
title,release_year, andratingof films in thefilmtable.- Add a
description_shortenedcolumn which contains the first 50 characters of thedescriptioncolumn, ending with "...". - Filter the
filmtable for rows where thespecial_featurescolumn contains "Commentaries".
- Add a
- Select the
customer_id,amount, andpayment_datefrom thepaymenttable.- Extract date information from the
payment_datecolumn, creating new columns for theday,month,quarter, andyearof transaction. - Use the
rentaltable to include a column containing the number of days rented (i.e., time between therental_dateand thereturn_date).
- Extract date information from the
- Update the title column so that titles with multiple words are reduced to the first word and the first letter of the second word followed by a period.
- For example:
- "BEACH HEARTBREAKERS" becomes "BEACH H."
- "BEAST HUNCHBACK" becomes "BEAST H."
- Reformat your shortened title to title case (e.g., "BEACH H." becomes "Beach H.").
- For example: