If you work in a data-heavy industry, you’ve definitely heard of databases. A very common way of accessing data stored in a database is via SQL. In this article, you will learn more about SQL. But instead of jumping right into it by learning the syntax and terminologies used in SQL, we will visit some key concepts in computer science that will help us answer the question: Is SQL a programming language?
What is SQL?
Let us first start with defining what a database is. Simply put, a database is a storage system to house your structured data. Additionally, a database management system (DBMS) serves as an interface between the database and its end-users or programs. You can learn more about this relationship in DataCamp’s Database Design course.
The most common type of DBMS is a relational database management system (RDBMS) where we store structured data, i.e., data incorporating relations among entities and variables. These entities and variables are organized as a set of tables with columns and rows.
The data in RDBMS is stored in a database object called table. A database often contains several tables. Each table is identified by a unique name and contains a collection of related data entries structured as rows and columns. The columns in a table are known as fields and hold values for a specific property of a table. Each row in a table is known as a record.
SQL pronounced as “S-Q-L” or “sequel” stands for Structured Query Language. SQL is used in programming and is designed for managing data stored in a RDBMS using SQL queries.
A query is nothing but a statement consisting of various SQL commands that together perform a specific task to access, manage, modify, update, control, and organize your data. This definition of a query sounds very close to a ‘computer program’; does this mean SQL is a programming language?
Let's understand better what a programming language means in the next section.
What is a Programming Language?
A programming language consists of a predefined set of vocabulary and grammatical rules. This language can then instruct a computer or computing device to perform specific tasks.
There are many different kinds of programming languages, each with its own distinct features and purpose. However, generally, they can be split into two groups: low-level and high-level programming languages.
The programming languages closer to machine code or binary (1s and 0s) are known as low-level languages.
Low-level languages are often hard for people to read because they are structured to write instructions like a machine thinks. Still, they are easier to understand compared to pure 1s and 0s.
So why use a low-level language? Because they are fast! And they give the programmer more precise control of how computer hardware will function.
On the other hand, high-level languages are closer to how humans communicate. They use words in their instructions close to the languages we use in our daily lives, making them easier to program and maintain. The downside is that they take more time to translate to machine code for the computer to understand.
But as computers get more powerful, the difference in performance between low-level and high-level languages is often only milliseconds. That is why we see most programming jobs looking for people with high-level programming skills.
What is Turing Completeness?
To measure a programming language's power, computer scientists often use the concept of Turing completeness. It is named after Alan Turing, a computer scientist who in 1936 proposed the Turing machine. It became the foundation for theories about computing and computers.
The Turing machine was a device that printed symbols on paper tape in a manner that emulated a person following logical instructions.
Programming languages are formal rule systems that access and modify data. If the rules can simulate Turing's hypothetical computing machine, the rules are said to be "Turing complete." A Turing complete system can be mathematically proven to perform any possible calculation.
We highly recommend watching this video from Computerphile to learn more about Turing Machines and their follow-up videos on the concept of Turing completeness.
General Purpose Programming Language vs. Domain-Specific Language
SQL has a predefined set of keywords and follows a strict syntax. To get started with programming in SQL, check out DataCamp’s Introduction to SQL course, where we start with selecting and filtering records and fields and then move on to aggregate functions, sorting, and grouping.
Given the definition of a programming language as having a certain vocabulary and a specific syntax, SQL definitely qualifies as a programming language. However, it does not qualify as a General Purpose Language (GPL) and is, in fact, a Domain-Specific Language (DSL).
A domain-specific language is a computer language created specifically to solve problems in a particular domain and is not intended to be able to solve problems outside of it.
They are built with particular goals in design and implementation. For example, SQL was built specifically to work with databases, for querying the databases and working with the data stored in them. You cannot build an entire application using only SQL.
Another example of a DSL is Cascading Style Sheets - popularly known as CSS, which is used for styling HTML on websites.
DSL is in contrast to a General Purpose Language (GPL), which is broadly applicable across domains. They are designed to let you write any sort of program with any type of logic you need.
Broadly speaking, DSLs are less powerful than GPLs. However, this comes with a certain benefit. Because of its narrow application domain, SQL is relatively easier to learn than most general-purpose programming languages.
We encourage you to follow DataCamp’s SQL Fundamentals track, where we take you from being a beginner to a pro in SQL with 5 courses in just 21 hours.
So is SQL a Programming Language?
Turing completeness, as we discussed earlier, is an indication of a programming language’s power. The more powerful a language, the more complicated calculations it can perform. With the introduction of common table expressions (CTEs) and Window functions in SQL, we are able to perform recursive and hierarchical calculations with SQL making it even more powerful and qualifying it as Turing complete.
However, it took a long time for the SQL Standard to add the features of hierarchical and recursive queries. Thus some older, traditional RDBMS may not qualify as Turing complete.
We discussed earlier how a general programming language is different from a domain-specific programming language. When people say “programming language,” they often mean “general-purpose programming language,” which can sometimes be confusing.
So when it comes to the proper definition of a programming language, SQL is a domain-specific programming language that is Turing complete. Thus qualifying SQL as a programming language.
For those who already know the basics of SQL and would like to go a step further and get started with window functions, check out DataCamp’s interactive course on PostgreSQl Summary Stats and Window Functions. You will learn how to create queries for analytics and data engineering purposes with window functions, the SQL secret weapon!
The Different Types of SQL
You may have heard about the different types or dialects of SQL: T-SQL, PostgreSQL, MySQL, etc. But you may be wondering what these different types of SQL are and why they exist? Let’s take a look at a brief history of SQL.
Edgar Frank Codd, a British computer scientist working at IBM, published a paper called “A Relational Model of Data for Large Shared Data Banks” in 1970. This is when he invented the relational model for database management, which has served as the theoretical basis for relational databases and relational database management systems ever since.
Following this, efforts were made to create an official SQL standard in the late 1980s.
SQL became a standard of the American National Standards Institute (ANSI) in 1986 and of the International Organization for Standardization (ISO) in 1987. The SQL standard has been revised multiple times since the 1980s, although the core features of the standard SQL have been stable since 1992.
However, new variations kept appearing as database implementers working at different companies needed to solve new problems or work around already existing problems that were not addressed in the SQL standard. This explains why multiple SQL dialects made their appearance and still co-exist today.
Today there exists only one SQL language, but the different Database Management Systems have extended the original SQL standard to add their own functionalities or adapt the syntax to their own way of functioning. By doing so, they gave birth to a variety of SQL dialects.
You can read more about these different dialects in our blog article aptly named ‘SQL Server, PostgreSQL, MySQL... what's the difference? Where do I start?’
Which SQL dialect to start with?
We suggest you start with PostgreSQL.
PostgreSQL is a powerful, open-source, object-relational database system that uses and extends the SQL language. It also has various features that safely store and scale the most complicated data workloads.
This is the closest to standard SQL and is a widely used SQL dialect. Starting with PostgreSQL will give you the most flexible tool to adapt to other SQL dialects later on and translate your skills to other RDBMS.
DataCamp’s Functions for Manipulating Data in PostgreSQL course is a great intermediate-level interactive course to get you moving in the right direction.
SQL and Procedural Languages
So, we know that SQL is not a general-purpose programming language. However, you can combine SQL with procedural languages to mimic some GPL-like behaviors. Doing so makes it technically possible to use SQL in a domain other than the intended one.
Procedural languages are programming languages that specifically involve the use of functions, conditional statements, and variables to write programs recognizable by computers to achieve a certain computational output.
All database management systems (DBMSs) accept one or more dialects of procedural languages. Some examples include:
Procedural Language for SQL (PL/SQL)
This is an extension for SQL created by Oracle Corporation. Similar to general-purpose programming languages, PL/SQL allows conditions and loops. With PL/SQL, you can also declare constants, variables, variable types, triggers, procedures, and functions.
Procedural Language/PostgreSQL (PL/pgSQL)
This is a procedural language supported by the open-source PostgreSQL object-relational database management system. It is very similar to Oracle’s PL/SQL described above and includes elements like loops and conditions. It allows user-defined functions and also operators to be inherited.
This is a programming extension from Sybase and Microsoft that allows declaration of local variables. It has also made changes to the traditional DELETE and UPDATE statements from SQL standard. It includes transaction control, exception and error handling. It is used in Microsoft SQL Server, a common database.
Through the use of such extensions, SQL is able to perform similarly to a GPL, making it an even more powerful and vital skill to learn!
Get started with SQL.
Hopefully, we have convinced you that SQL is not just a programming language. But rather a powerful programming language that helps you get the most out of your data!
Datacamp is an excellent platform for learning data science and analytics online. We have plenty of high-quality online courses that can help you understand SQL and its nuances.
Our Introduction to SQL Server is one such interactive course where you'll have the opportunity to test your knowledge and build your confidence by putting what you learn to practice. You'll work with a digital media database to review the sales of various artists and tracks, Eurovision datasets, and review trends in US power outages to explore a number of different data types and scenarios.
Courses for SQL