Official Blog
tutorial
+7

How to Become a SQL Expert

Here’s a comprehensive guide on more advanced SQL concepts and syntax.

By now, you know about all the benefits of SQL and are a seasoned data analyst, data engineer, or data scientist. You know about the different SQL dialects, you’re proficient in importing SQL data, and you’re fluent in writing basic SQL queries. But do you want to grow your skills to the next level? Here’s how.

Intermediate SQL concepts

Know the SQL sublanguages

There are four primary sublanguages in SQL—they refer to different tasks, but are all part of the SQL language:

1. DDL (Data Definition Language)

These statements are used to create tables and databases and define field properties or table properties. Examples of commands in this category include CREATE, ALTER, and DROP.

2. DML (Data Manipulation Language)

The statements that fall under this category are used to update data or add or remove data from tables. Commands in this category include UPDATE, DELETE, and INSERT.

3. DCL (Data Control Language)

These SQL statements are used to control who accesses the data. The commands that fall under this category are GRANT and REVOKE.

4. TCL (Transaction Control Language)

This language is used to commit data and restore data. COMMIT and ROLLBACK fall under this category.

5. DQL (Data Query Language)

This is to retrieve data from SQL server. SELECT statement falls into this category.

These may look intimidating, but don’t worry! SQL is a very elegant, straightforward language to learn, especially because you cannot use SQL to create data visualizations—it’s all about creating tables, organizing and cleaning data, and learning to ask the right questions.

Unions and joins

A lot of the time, the data you need will not be stored in one giant table but will be spread out across multiple tables. Using SQL, you can merge these tables together to combine all the data you need in one place. Since UNION and JOIN deal with tables, they go in the FROM statement.

A UNION will basically stack two tables with identical columns on top of each other. This is useful for sales transaction data that may be broken into different tables by month, quarter, or year.

There are several kinds of commonly used JOINS—Inner, Outer, Left, and Exception. To put it simply, these will return different combinations of rows from the tables being joined together.

Additional resources:

Intermediate SQL syntax

By now, you know the basics of SQL syntax, including the functions SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT. You’re proficient in Database Design and you also understand the SQL order of execution—that queries are written differently from how they’re processed.

Now it’s time to learn more advanced syntax.

Aggregations: SELECT, GROUP BY, and HAVING

Similar to Excel, you can use SQL to aggregate your data by sum, average, maximum, minimum, and count. Aggregations can be useful when initially exploring your data, analyzing, or when presenting it to your audience. You can do these in the SELECT statement to return a simple aggregation, or use a GROUP BY statement to create an Excel “pivot table” out of your data.

Using aggregations in the SELECT statement will return just a simple calculation, for instance, if you were using SUM(column_name), your query would return one value that represent all the values in that column added up.

You can also introduce aggregations by categorical data. For instance, if you had a table of fruit sales data across the country, you could group the sum of all sales by product type, or the average price by state. This requires a GROUP BY statement, which comes after the WHERE statement. You will need to specify the categorical column you want your aggregations to be “grouped by”, and then include those columns in your select statement as well.

After your data has been neatly grouped, with an aggregation paired to a categorical variable, you can then filter on those aggregations by using a HAVING statement, which comes after GROUP BY. This is useful if you wanted to narrow your query even further, for example, if you wanted to select only those fruit stores with an average sales total over $100,000.

Additional resources:

Case statements

A SQL CASE statement is similar to an Excel IF() function—if the data in a column matches a set criteria, then return “this”. This can be useful when reclassifying quantitative data as categorical, such as when breaking down a set of values into “High”, “Medium”, or “Low”. CASE statements go in the SELECT statement.

See also:

Subqueries

Subqueries may be used in several different ways, but are usually found in either the FROM or the WHERE statement. They create a small, temporary table from your data that you can then use as a new table to query from in a different way.

See also:

Dates and times

As with many other data languages, dealing with dates and times can be tricky. Sometimes dates act like “strings”, or pieces of information. Sometimes they can be treated as actual dates, with SQL syntax that can break the data down into units such as months or years. Since there are many different ways to do this, these functions are considered more advanced SQL syntax.

See also:

Advance your SQL thinking

SQL guidelines to follow

When you’re working in the real world, business questions won’t likely translate neatly into SQL queries. That’s why you need to carefully assess what’s needed before even writing a single line of code. Here are some good guidelines to follow when practicing SQL.

Rewrite your business question as a comment first

Use code commenting to “brainstorm” your query first—what is the question you’re trying to answer? What pieces and elements do you need to focus on? This will guide your SQL query.

There are two kinds of commenting, inline and multi-line.

--Inline is just two hyphens and it lives on one line of code—good for short, quick notes

/*Multi-line commenting looks like this.

     It’s good for creating banners that explain a query’s purpose, author, etc*/

Sketch out your query

Before you even start typing, you should figure out what pieces you’ll need for your query. You will always need a SELECT and FROM statement—but what else might you need? Do you need to filter the data using a WHERE statement? Will you need to use aggregate functions?

Understand the different ways to achieve your objective

When possible, try to have multiple ways of doing the same thing. That way, you can quality check yourself and validate if you can get the same results. For example, you can find the highest value in a column either by using the MAX() function in the STATEMENT, or by using ORDER BY to sort the data and then LIMIT to limit so only one row is returned.

Test your query

Build your query one line at a time and run it frequently to check that it works. Adding pieces one at a time also helps prevent you from having to go back through your code line by line in case it doesn’t work.

Profile your table

In order to start performing queries, you first must understand the data elements within it. Start out by viewing the entire database; then, a few columns at a time. Find the range of values, both categorical and quantitative, to get an idea of what information is contained in the table.

Keep a list of assumptions

When writing any kind of data code, you should always make a list of assumptions to easily track your limitations and dependencies.

Keep a data dictionary

Compile a central repository for the data elements you’re using, containing descriptions of the different tables and fields used within the analysis. This can include the different data types in each column (character, integer, money, date, etc), and a brief description of the column itself.

Always allocate time to perform data quality checks

The analysis doesn’t end when you’ve finished writing your query. You still have to understand your results and perform quality checks to make sure your analysis is correct. Try different ways of achieving the same result to see if you get something different—this is another time when inline commenting comes in handy!

Practice in a sandbox

One of the best ways to learn any coding language is to find a “sandbox” to play around in. You’ll be able to test out your code, see if it runs, modify it, run it again, over and over until you feel comfortable with the coding principles and syntax. To start out, you may want a little more guidance—try finding a project with clear instructions and solution code to check your work.

Here are a few good practice resources:

That’s all for now. Hope this was useful to you—keep practicing and learning!