Skip to content
Duplicate of Introduction to SQL
  • AI Chat
  • Code
  • Report
  • Introduction to SQL

    Here you can access the books table used in the course.

    Take Notes

    Add notes about the concepts you've learned and SQL cells with queries you want to keep.

    Add your notes here

    Introducing databases A database stores data. Let's imagine that we are in charge of storing and organizing data for a library. We might set up a database that holds information such as the data pictured here on patrons, books, and checkouts. This information is housed in objects called tables, with data organized into rows and columns. This database contains a patrons table, a books table, and a checkouts table.

    1. Introducing databases A closer look at the patrons table shows that it stores various data about our library's patrons, like library card number, name, the year the patron became a library member, and the total overdue fines the patron owes our library.

    2. Relational databases A relational database defines relationships between tables of data inside the database. For example, each of our library patrons might each be associated with several checkouts. Through these relationships, we can draw conclusions about data housed in separate tables in the same database, and answer questions such as "Which books did James check out during 2022?" or "Which books are checked out most often?"

    3. Database advantages These tables might look similar to the way data is organized in spreadsheet applications such as Excel or Google Sheets, but databases are far more powerful than spreadsheets. Databases can store much more data, and storage is more secure due to encryption.

    4. Database advantages Possibly the biggest advantage of a database is that many users can write queries to gather insights from the data at the same time. When a database is queried, the data stored inside the database does not change: rather, the database information is accessed and presented according to instructions in the query. Which leads us to the star of this show:

    5. SQL SQL! SQL, or S-Q-L, is short for Structured Query Language. It is the most widely used programming language for creating, querying, and updating relational databases. Once we are familiar with the data we have and which table it is stored on, we can use SQL to begin writing queries to answer questions about our library -- more on that in Chapter Two.

    Unknown integration
    DataFrameavailable as
    books
    variable
    -- Add your own queries here
    SELECT * 
    FROM books
    WHERE yrr

    TABLES . Tables Now that we know the basic organization of a database, let's take a closer look at the main building block of databases: tables!

    1. A seat at the table We saw in the previous lesson that databases are organized into tables, which hold related data about a particular subject. As we've seen, tables are organized into rows and columns; in the world of databases, rows are often referred to as records and columns as fields. A table's fields are limited to those set when the database was created, but the number of rows is unlimited.

    2. Good table manners Let's talk a little bit about table naming. Table names should be lowercase and should not include spaces - we use underscores in place of spaces. And ideally, a table name would refer to a collective group (like "inventory") but it's also okay for the table to have a plural name (such as "products").

    3. Laying the table: records A record is a row in a table. It holds data on an individual observation. Taking a look at the patrons table, we see that the table has four records: one for each of the patrons. The record for Jasmin indicates that she became a member in 2022 and owes two dollars and five cents in fines.

    4. Laying the table: fields A field is a column in a table. It holds one piece of information about all observations in the table. The "name" field in the patrons table lists all of the names of our library patrons.

    5. More table manners Because field names must be typed out when querying a database with SQL, field naming is important. Generally, field names should be lowercase and should not involve spaces. A field name should be singular rather than plural because it refers to the information contained in that field for a single record. This is why our table has "card_num" and "name" fields rather than "card_nums" and "names". Similarly, two fields in a table cannot have the same name. Finally, field names should never share a name with the table they are housed in so that it's clear in all cases whether a field or table is being referred to.

    6. Assigned seats A unique identifier, sometimes called a "key," is just what it sounds like: a unique value which identifies a record so that it can be distinguished from other records in the same table. This value is very often a number. In the patrons table, it makes sense to use the card_num field as the unique identifier for each patron, not the name field, because it's possible that as our little library grows, two patrons might have the same name.

    7. The more the merrier Having more tables, each with a clearly marked subject, is generally better than having fewer tables where information about multiple subjects is combined. Take a look at the patrons and checkouts tables. Now, here's what our patrons and checkouts tables would look like if we tried to combine them. It's the same data, but much less clear because it now contains duplicate information. While we can see that Izzy has two checkouts and Maham has none, the card_num column is no longer unique because of Izzy's multiple checkouts. We can always use SQL to gather information from multiple related tables and connect them if a question requires it, but table topics should remain separate.

    Explore Datasets

    Use the books table to explore the data and practice your skills!

    • Select only the title column.
    • Alias the title column as book_title.
    • Select the distinct author names from the author column.
    • Select all records from the table and limit your results to 10.

    Data Welcome to the final part of the databases chapter! This lesson will focus on the data inside a database as well as its storage.

    1. SQL data types When a table is created, a data type must be indicated for each field. The data type is chosen based on the type of data that the field will hold - a number, text, or a date for example. We use data types for several reasons. First, different types of data are stored differently and take up different amounts of storage space. Second, some operations only apply to certain data types. It makes sense to multiply a number by another number, but it does not make sense to multiply text by other text for example.

    2. Strings In programming, a "string" refers to a sequence of characters such as letters or punctuation. On the patrons table, the data in the names field is made up of strings, such as "Maham" and "James". SQL has several different data types that can hold strings. Some string data types can only hold short strings, such as a string up to 250 characters. Storing short strings in a small data type like this saves storage space. SQL's VARCHAR data type is more flexible and can store small or large strings - up to tens of thousands of characters! Because of its flexibility, VARCHAR is very commonly used for storing strings.

    3. Integers Integer data types store whole numbers, such as the years in the member_year column of the patrons table. Just as with strings, SQL offers a few different data types for storing integers, depending on how big the numbers we'd like to store are. INT, a common SQL integer data type, can store numbers from less than negative two billion to more than positive two billion!

    4. Floats Float data types store numbers that include a fractional part, such as the 2-point-05 dollars that one patron, Jasmin, owes in fines. Just as we might expect, SQL also offers several float data types depending on how many digits the numbers in the field are expected to be. The NUMERIC data type can store floats which have up to 38 digits total - including those before and after the decimal point.

    5. Schemas Now that we're familiar with data types, we can look at a database schema. Schemas are often referred to as "blueprints" of databases. A schema shows a database's design, such as what tables are included in the database and any relationships between its tables. A schema also lets the reader know what data type each field can hold. The schema for our library database shows the VARCHAR data type is used for strings like book title, author, and genre. We can also see that the patrons table is related to the checkouts table, but not the books table.

    6. Database storage Finally, let's discuss storage. The information we find in a database table is physically stored on the hard disk of a server. Servers are centralized computers that perform services via requests made over a network. In our case, the service performed is data access, but servers are also used to access websites or files stored on the server. Any computer can be a server if it is set up to provide a service, even a laptop! However, servers are generally very powerful and large machines, because they are best equipped to handle a high volume of requests and data.

    Introducing queries Welcome back. Now that we understand how data is organized in databases, we can begin drawing insights using SQL queries!

    1. What is SQL useful for? Recall from the last chapter that SQL is used to answer questions both within and across relational database tables. In the library database, we might use SQL to find which books James checked out from the library in 2022. In an HR database, we could query salaries for employees in Marketing and Accounting to determine whether pay across departments is comparable.

    2. Best for large datasets In many organizations, SQL is used as a complement to other tools such as spreadsheet applications. If the data we're interested in can fit in a spreadsheet and does not have many relationships to other data of interest, we can analyze it in a spreadsheet. But for sprawling and diverse data such as the data related to a retail platform, organizing the data in a database is best. Then, we use SQL queries to uncover trends in website traffic, customer reviews, and product sales. Which products had the highest sales last week? Which products get the worst review scores from customers? How did website traffic change when a feature was introduced? SQL shines when an organization has lots of data with complex relationships.

    3. Keywords Let's write our first SQL code! To do that, we will need to learn a few keywords. Keywords are reserved words used to indicate what operation we'd like our code to perform. The two most common keywords are SELECT and FROM. Perhaps we'd like a list of every patron our library has. The SELECT keyword indicates which fields should be selected - in this case, the name field. The FROM keyword indicates the table in which these fields are located - in this case, the patrons table.

    4. Our first query Let's put these parts together. Here's how the query should be written. The SELECT statement appears first, followed on the next line by the FROM statement. It's best practice to end the query with a semicolon to indicate that the query is complete. We also capitalize keywords while keeping table and field names all lowercase. Now let's take a look at the results of our query, often called a result set. The result set lists all patron names, just as we had hoped. Note that we have not changed our database by writing this query. The tables, including the patrons table, are exactly the same as before we wrote the query. In order to share our results, we can save the SQL code we have written so that our collaborators can use it to query the database themselves. We'll cover saving queries in a later lesson.

    5. Selecting multiple fields To select multiple fields, we can list multiple field names after the SELECT keyword, separated by commas. For example, to select card number and name, we'd list both field names in the order we'd like them to appear in our result set. Notice that this does not have to match the order the fields are presented in the table: listing name before card_num means that name appears first in the results.

    6. Selecting multiple fields As you might expect, we can select three fields such as name, card_num, and total_fine by listing all three field names after the SELECT keyword and separating them with commas.

    7. Selecting all fields What if we'd like to select all four fields in the patrons table? We could list out the four field names after the SELECT statement, but there's an even easier way: we can tell SQL to select all fields using an asterisk in place of the four field names.

    Writing queries It's time to level up on our SQL queries by learning a few more commonly used keywords. Let's dive in.

    1. Aliasing Sometimes it can be helpful to rename columns in our result set, whether for clarity or brevity. We can do this using aliasing. Perhaps we'd like to select the name and hire year for each record in the employees table. We could alias the name column as first_name in the query by adding the AS keyword to indicate an alias of first_name after selecting the name field. The result set now has first_name rather than name as the column header. The alias only applies to the result of this particular query; in other words, the field name in the employees table itself is still name rather than first_name.

    2. Selecting distinct records Some SQL questions require a way to return a list of unique values. Let's imagine that we are interested in getting a list of years in which we hired our current employees. If we select the year_hired field from the employees table, the result set shows several years listed twice, which isn't what we are looking for. To get a list of years with no repeat values, we can add the DISTINCT keyword before the year_hired field name in the SELECT statement. Now, we can see that all of our employees were hired in just four different years.

    3. DISTINCT with multiple fields It's possible to return the unique combinations of multiple field values by listing multiple fields after the DISTINCT keyword. Take a look at the employees table. Perhaps we'd like to know the years that different departments hired employees. We could use this SQL query to look at this information, selecting the dept_id and year_hired from the employees table. Looking at the results, we see that department three hired two employees in 2021.

    4. DISTINCT with multiple fields To avoid repeating this information, we could add the DISTINCT keyword before the fields to select. Notice that the department id and year_hired fields still have repeat values individually, but none of the records are the same: they are all unique combinations of the two fields.

    5. Views Finally, let's discuss saving SQL result sets. In SQL, a view refers to a table that is the result of a saved SQL SELECT statement. Views are considered virtual tables, which means that the data a view contains is not generally stored in the database. Rather, it is the query code that is stored for future use. A benefit of this is that whenever the view is accessed, it automatically updates the query results to account for any updates to the underlying database. To create a view, we'll add a line of code before the SELECT statement: CREATE VIEW, then the name we'd like for the new view, then the AS keyword to assign the results of the query to the new view name. Here, we create a view called employee_hire_years by assigning the results of a query selecting three fields from the employees table to a new view. There is no result set when creating a view.

    6. Using views Once a view is created, however, we can query it just as we would a normal table by selecting FROM the view.

    . SQL flavors Our last topic in this short course is SQL flavors.

    1. SQL flavors SQL has a few different versions, or flavors. Some are free, while others have customer support and are made to complement major databases such as Microsoft's SQL Server or Oracle Database, which are used by many companies. All SQL flavors are used with table-based relational databases like the ones we've seen, and the vast majority of keywords are shared between them! In fact, all SQL flavors must follow universal standards set by the International Organization for Standards and the American National Standards Institute. Only additional features on top of these standards result in different SQL flavors.

    1 Table flatlay photo created by freepik www.freepik.com 3. Two popular SQL flavors Let's take a look at two of the most popular SQL flavors. PostgreSQL is a free and open-source relational database system which was originally created at the University of California, Berkeley, and was sponsored by America's famous Defense Advanced Research Projects Agency, or DARPA. DARPA also sponsored research leading to creating the internet, the computer mouse, and GPS! The name "PostgreSQL" is used to refer to both the database system itself as well as the SQL flavor used with it. SQL Server is also a relational database system which comes in both free and enterprise versions. It was created by Microsoft, so it pairs well with other Microsoft products. T-SQL is Microsoft's proprietary flavor of SQL, used with SQL Server databases.

    1. Comparing PostgreSQL and SQL Server Think of SQL flavors as dialects of the same language. If Claudia speaks American English, she will have no trouble understanding people on a trip to London, even though most people in London speak British English and there are some small differences. Here's an example of a small difference between SQL Server and PostgreSQL: when we want to limit the number of records returned, we use the LIMIT keyword in PostgreSQL. Here, we limit the number of employee names and ids selected to only the first two records. The exact same results are achieved in SQL Server using the TOP keyword instead of LIMIT. Notice that this keyword is the only difference between the two queries! Limiting results is useful when testing code, since many result sets can have thousands of results! It's best to write and test code using just a few results before removing the LIMIT for the final query.

    2. Choosing a flavor New SQL learners may wonder which flavor they should learn. This may be an easy decision if a learner knows that her employer uses Microsoft's SQL Server, for example. Or it might be a hard one for a job seeker or student who doesn't know what database management system a future employer might use. Don't worry too much about what flavor to learn. As we've seen, the differences are small. A PostgreSQL wizard can become a SQL Server wizard by learning a handful of different keywords!

    3. Let's practice!

    Querying a database Hello, my name is Jasmin Ludolf, and I'll be your instructor for this course on using SQL to turn raw data into actionable insights. We'll build on our foundational knowledge of SQL, learn how to reveal insights, and how to present our results clearly.

    1. Course roadmap While SQL can be used to create and modify databases, the focus of this course will be querying databases. Recall that a query is a request for data from a database. In this course, we'll look at how to execute a query for a database using keywords that will enable us to count and view all or a specified amount of records. We'll go over common SQL errors, style guidelines, and the order in which our code will execute. We'll cover how to filter data using various techniques, how to use aggregate functions, and finally, how to sort and group the results. We'll be using PostgreSQL throughout.

    2. Our films database We will work with a films database containing four tables: films, reviews, people, and roles. Our database schema, pictured here, shows the table names, field names, and data types.

    3. COUNT() Here we go with our first new keyword. Let's say we wanted to count something from our people table. The COUNT function lets us do this by returning the number of records with a value in a field. For example, to count the number of birth dates present in the people table, we will use SELECT COUNT birthdate FROM people. The result is 6152 birthdates. We've used the alias "count birthdates" for the field name in this example to make the results more readable.

    4. COUNT() multiple fields If we want to count more than one field, we need to use COUNT multiple times. Here we are counting both the number of names and birth dates present in the people table.

    5. Using * with COUNT() Using COUNT with a field name tells us how many values are in a field. However, if we want to count the number of records in a table, we can call COUNT with an asterisk. For example, this code gives the total number of records in the people table. The asterisk represents all fields. Passing the asterisk to COUNT is a shortcut for counting the total number of records.

    6. DISTINCT Often, our results will include duplicates. We can use the DISTINCT keyword to select all the unique values from a field. This might be useful if, for example, we're interested in knowing which languages are represented in the films table. Adding DISTINCT to our query will remove all duplicates, as we can see here.

    7. COUNT() with DISTINCT Combining COUNT with DISTINCT is also common to count the number of unique values in a field. This query counts the number of distinct birth dates in the people table. Let's take a moment to consider why this number is different from the birthdate count of 6152 we got before. Some people in our table likely share the same birthday; COUNT would include all the duplicates while DISTINCT counts all of the unique dates, no matter how many times they come up.

    Query execution Fantastic work on using COUNT and DISTINCT! Now that we've flexed our SQL muscle a bit, we'll take a small step back and better understand how SQL code works.

    1. Order of execution Unlike many programming languages, SQL code is not processed in the order it is written. Consider we want to grab a coat from a closet: first, we need to know which closet contains the coats. This is similar to the FROM statement, which is the first line to be processed. Before any data can be selected, the table from which the data will be selected needs to be indicated. Next, our SELECTion is made. Finally, the results are refined. Here we use the LIMIT keyword that limits the results to a specified number of records. In this case, we only want to return the first ten names from the people table. Knowing processing order is especially useful when debugging and aliasing fields and tables. Suppose we need to refer to an alias later on in our code. In that case, that alias will only make sense to a processor when its declaration in the SELECT statement is processed before the alias reference is made elsewhere in the query.

    2. Debugging SQL Before we begin working with more advanced queries, it's useful to know more about debugging SQL code and how to read the error messages. Some messages are extremely helpful, pinpointing and even suggesting a solution for the error, as this message does when we misspell the "name" field we'd like to select. Other common errors may involve incorrect capitalization or punctuation.

    3. Comma errors Other error messages are less helpful and require us to review our code more closely. Forgetting a comma is a very common error. Let's say we've drafted this code to find all titles, country of origin, and duration of films. The error message will alert us to the general location of the error using a caret below the line of code, which in this case points to the "country" field name. We must examine the code a little further, though, to discover the missing comma is between "country" and "duration".

    4. Keyword errors SQL displays a similar error message when a keyword is misspelled, but this time, the caret indicator below the offending line is spot on.

    5. Final note on errors There are a few more SQL errors out there, but the three mentioned in this lesson will be the most common ones we will encounter. Debugging is a major skill, and the best way to master this skill is to make mistakes and learn from them.

    Unknown integration
    DataFrameavailable as
    df
    variable
    Run cancelled