Tutorials
data manipulation
+2

Beginners Guide to SQLite

Learn the basics of SQLite databases from SQLite dot commands to an example of their practical applications using the command line interface.

If you are familiar with relational database systems, it is likely that you have heard of heavyweights such as MySQL, SQL Server or PostgreSQL. However, SQLite is another useful RDBMS that is very simple to set up and operate that has many distinct features over other relational databases. These features include:

  • No need for servers: That is, there are no server processes that need to start, stop or be configured. There is no need for database administrators to assign instances or manage user access permissions.
  • Simple Database Files: A SQLite database is a single ordinary disk file that can be stored in any directory. This also makes it simple to share since database files can easily be copied onto a memory stick or sent through email.
  • Manifest Typing: The majority of SQL database engines rely on static typing. That is, you can only store values of the same datatype that is associated with a particular column. However, SQLite uses manifest typing, which allows the storage of any amount of any data type into any column without no matter the column's declared datatype. Note that there are some exceptions to this rule. One such exception is integer primary key columns, which can only store integers.

As with most things though, there are also some limitations of SQLite with respect to other database engines. For instance, SQLite has no option to do Right or Full Outer Joins, and the serverless nature can also be a disadvantage for security and protection against data misuse. Nevertheless, it is a very powerful tool that comes very much in handy for certain situations. For instance, I have used a SQLite database in a recent social media analysis work project to store an $edge \space list^{1}$ of millions of Twitter users in a very simple and efficient manner.

Thus, for this tutorial, I will illustrate how to operate with a SQLite database. We will go over SQLite setup, table creation and operations, SQLite dot commands, and how to use simple SQL queries to manipulate the data inside a SQLite database.

$^{1}$ An edge list or adjacency list is a collection of unordered lists used to represent a graph in network analysis. Each list describes the set neighbors of a given node or vertex contained in the graph.

The first thing that we'll cover is how to get SQLite into your computer. This is extremely simple. If you look at the SQLite with Python Tutorial by Sayak Paul, you will realize that there is a GUI tool called DB Browser for SQLite that can get you started easily. However, for the sake of variety, I will show how to use the command line tool, which I also tend to prefer at times.

The first thing you need to do if you have a Windows device is to go to the following site and download sqlite-tools-win32-x86-3270200.zip. Once you unzip the folder, you should see the following three files:

After that, initiating the SQLite command line is as simple as clicking on the sqlite3 application, which will make the command line pop up like this:

Now you may notice that the default for SQLite is to use a transient in-memory database. The vast majority of the time you will not want to use that. Instead, we will create a new database using the dot command .open followed by the name you want to give to your new database file (See Below):

Just like that, you have created a new database. Also, note that it is vital that you put .db after your desired database name to create a database file. We named the database Tweet_Data.db since I will be using tweets to populate the tables in the database.

With the database now created, we can proceed to create our table of tweets. We can do this in two ways. First, we can simply import a .csv file and make it a table. This is done by setting SQLite to csv mode using the .mode dot command followed by csv, and with the .import dot command that takes a file (in this case a csv located in the same folder, but it can be any csv in your computer if you type the full path) and a table name of your choice. You may also want to display the column headers using the .headers dot command followed by on. (See below)

Once you have done this, you can use the .tables command to display all the available tables in a database file as well as the .schema command followed by a table name to display the table's structure. Furthermore, you can execute a simple query such as SELECT $$ FROM Table_Name* to verify that the data is there.

In addition to creating the tables by importing files such as csv's, you can also create tables using CREATE TABLE and then insert new rows using INSERT INTO. In the following example, you will see how you can create a table from scratch and insert two new rows.

Note: In the SQLite command line, if you want to keep writing the same query on the next line as I do in the example simply press Enter when you want a new line. The query will be evaluated after pressing Enter when there is a semicolon at the end.

If at one point you decide that you want to export the results of a query to an external file such as a csv, you can also do this using the power of SQLite. For a clean export to a csv, you can use a process similar to that of importing. First, you use .header followed by on and .mode followed by csv. Then, you use the .output dot command followed by your desired csv file name and, finally, you execute the SQL query that you want to export and exit SQLite using the .exit command. Let's try to export to a csv the new table we created in the previous step.

Here is the newly made csv with the data of Sample_Tweets_2 as well.

Apart from creating and exporting tables, you can also use the UPDATE/SET and DROP TABLE SQL statements to update rows in a table or to drop a table all together. In the following, we example we will update the second tweet in the toy Sample_Tweets_2 table.

Now let's say goodbye to the toy Sample_Tweets_2 table and delete it from the database file using DROP TABLE:

Tiny Twitter Case Study

Now, you pretty much know the essentials of how to operate with a SQLite database and manipulate tables, all that is left for you to do is to derive useful information using SQL queries. For now, I'll show a couple of relatively simple queries that I performed during my Twitter social media analysis project. You can think of it as a tiny case study to illustrate a real-world application of a SQLite database.

Let's start with something simple; finding all the unique Twitter users' screen names in the Sample_Tweets table. In the project for which I used these tweets, it was essential for me to find all the unique users from whom I collected tweets since I needed to retrieve their followers in order to create the edge list that I mentioned at the start of this tutorial.

26,860 unique users in the database. That is an excellent sample given that the topics that I was listening to were website localization and translation. Let's see how the top 20 users that had the most tweets in the sample using a slightly more complex SQL query (retweets are excluded):

Would you look at that, the largest number of tweets belongs to a bot! As it turned out though, the last member from that list, UweMuegge, is quite the influencer according to the various network statistics that I calculated for the project. However, he was definitely not the most important.

Lastly, let's take a look at some of the tweets that UweMuegge wrote using another SQL Query:

Two main topics it seems, translation job ads and translation conferences. It kind of makes sense why people in the field would follow him on Twitter.

Another thing that stands out from the above screenshot is that you run into one of the limitations of the SQLite command line interface. If you have large text fields, they won't display nicely, and they will be hard to read. That is why it is a good idea to use the DB Browser for SQLite in this instance.

Conclusions

In this tutorial, you have learned how to create SQLite databases and to manipulate tables using dot commands and SQL statements. In addition, you have seen a small real-world case study where I used a SQLite database to handle data from Twitter. That is a lot, so congrats!

It is worth noting that SQLite databases are most useful when combined with R and Python. You can manipulate SQLite databases through Python using the sqlite3 module (if you want to learn more about this see the SQLite with Python tutorial) or with R using RSQLite. In fact, the edge list of the Twitter users' followers that I have mentioned before, is stored in a SQLite database that I created using RSQLIte by appending each users' follower network to a table. In this step of the project, the use of SQLite was vital since things like power outages or Windows automatic updates can forcefully shut down your computer and make you start collecting follower networks from scratch. This can take around 4 weeks due to the Twitter's API rate limits, so starting from 0 if this happens is very undesirable. However, if you save the follower networks in a SQLite database as you go, even if your computer forcefully shuts down, you can start gathering again from the user after the last one you have in the database, which is a great time saver.

That is just one example of how SQLite databases can be a fantastic tool in the arsenal of a data scientist. I encourage you to further research SQLite databases and SQL beyond what is covered in this tutorial. Keep learning; the Sky is the limit!

If you would like to learn more in SQL, take DataCamp's Introduction to Relational Databases in SQL course.

Want to leave a comment?