Using PostgreSQL in Python
Databases such as PostgreSQL require user authentication and have their own particular structure, but once you can connect to one, the techniques shown here carry over to most SQL-based systems.
PostgreSQL with Python Tutorial
This module illustrates some basic techniques for connecting to and using data from a relational database, in this case PostgreSQL, one of several SQL-based databases. If you would like to learn more about the systems-theory side of relational database management systems (RDBMS), the textbook "Database Systems and Concepts" is an excellent resource.
Using a SQL Database
Using a database in Python relies on being able to write database queries in SQL. In fact, a good way to test your Python scripts is to first test the SQL commands in a third-party program such as DBeaver.
A good place to practice these skills interactively is SQLZOO.
The Basics
In SQL, data is stored in tables. Tables are often grouped into a named collection called a "schema," and a database may contain multiple schemas.
$$ \text{Database} \supset \text{Schema} \supset \text{Table} $$
Example: Suppose you have a database that holds the data and results of a machine learning problem such as predicting adverse police incidents. The results are stored in the "results" schema, and specific model metrics are in the "evaluations" table. To view all such results:
```sql
select * from results.evaluations;
```
Decomposing this query, you have several essential elements. First, what task are you performing? `select`, which retrieves data. Next, what do you want returned? If everything, use `*`; if only one column, use `your_column_name`. Thus `select *` or `select your_column_name`. Finally, where are your data stored? The `from` clause specifies a `schema.table`.
Putting this together, you have the general form: `select * from schema.table;`. Note: it is best to end your SQL statements with a `;`.
Now suppose you want only the results for a specific model id, metric, and parameter. You can use a `where` clause to specify these conditions:
```sql
select * from results.evaluations
where model_id=971
and metric='false [email protected]' and parameter='75.0';
```
Connecting to a Database in Python
A simple way to connect to a database is to use Python. Here, you will first need a credentials file, such as `example_psql.py`:

```python
PGHOST = "your_database_host"
PGDATABASE = "your_database_name"
PGUSER = "your_database_username"
PGPASSWORD = "your_database_secret_password"
```
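Keep this file out of version control (for example, by listing it in `.gitignore`), since it contains a plaintext password. As a minimal sketch of an alternative, you could read the same values from environment variables instead of a module; the variable names here simply mirror the credentials file and are an assumption, not a fixed convention:

```python
import os

# Read connection settings from the environment rather than a tracked file.
PGHOST = os.environ.get("PGHOST", "localhost")
PGDATABASE = os.environ.get("PGDATABASE")
PGUSER = os.environ.get("PGUSER")
PGPASSWORD = os.environ.get("PGPASSWORD")
```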
Next, you will need to import several packages:
```python
import psycopg2
import sys, os
import numpy as np
import pandas as pd
import example_psql as creds
import pandas.io.sql as psql
```
Finally, the database connection can be relatively simple:
```python
## ****** LOAD PSQL DATABASE ***** ##

# Set up a connection to the postgres server.
conn_string = "host="+ creds.PGHOST +" port="+ "5432" +" dbname="+ creds.PGDATABASE +" user=" + creds.PGUSER \
              +" password="+ creds.PGPASSWORD

conn = psycopg2.connect(conn_string)
print("Connected!")

# Create a cursor object
cursor = conn.cursor()


def load_data(schema, table):
    sql_command = "SELECT * FROM {}.{};".format(str(schema), str(table))
    print(sql_command)

    # Load the data
    data = pd.read_sql(sql_command, conn)

    print(data.shape)
    return data
```
To underscore what this code is doing: it connects to your database using your credentials and returns the data you queried, i.e., the `select * from schema.table;`, as a pandas DataFrame. You can then visualize or analyze this data just as you would any data loaded into pandas from a CSV.
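For example, continuing the earlier scenario, you might load the evaluations table like this (the schema and table names are the ones used above; substitute your own):

```python
# Load the whole evaluations table into a DataFrame and inspect it.
evaluations = load_data("results", "evaluations")
print(evaluations.head())
print(evaluations.columns.tolist())
```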
A More Complex Example
Rather than this simple connection, you can instead use a series of functions in a separate script to connect to the database. This script is located in `setup/setup_environment.py`:
```python
#!/usr/bin/env python
import os
import yaml
from sqlalchemy import create_engine
import logging

log = logging.getLogger(__name__)


def get_database():
    try:
        engine = get_connection_from_profile()
        log.info("Connected to PostgreSQL database!")
    except IOError:
        log.exception("Failed to get database connection!")
        return None, 'fail'
    return engine


def get_connection_from_profile(config_file_name="default_profile.yaml"):
    """
    Sets up database connection from config file.

    Input:
    config_file_name: File containing PGHOST, PGUSER,
                      PGPASSWORD, PGDATABASE, PGPORT, which are the
                      credentials for the PostgreSQL database
    """
    with open(config_file_name, 'r') as f:
        # safe_load avoids executing arbitrary YAML tags and is required
        # by recent versions of PyYAML, which no longer accept a bare load().
        vals = yaml.safe_load(f)

    if not ('PGHOST' in vals.keys() and
            'PGUSER' in vals.keys() and
            'PGPASSWORD' in vals.keys() and
            'PGDATABASE' in vals.keys() and
            'PGPORT' in vals.keys()):
        raise Exception('Bad config file: ' + config_file_name)

    return get_engine(vals['PGDATABASE'], vals['PGUSER'],
                      vals['PGHOST'], vals['PGPORT'],
                      vals['PGPASSWORD'])


def get_engine(db, user, host, port, passwd):
    """
    Get SQLalchemy engine using credentials.

    Input:
    db: database name
    user: Username
    host: Hostname of the database server
    port: Port number
    passwd: Password for the database
    """
    url = 'postgresql://{user}:{passwd}@{host}:{port}/{db}'.format(
        user=user, passwd=passwd, host=host, port=port, db=db)
    engine = create_engine(url, pool_size=50)
    return engine
```
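The script above expects a `default_profile.yaml` file. That file is not shown in the original code, but based on the keys the function checks for, a minimal version would look something like this (all values are placeholders):

```yaml
PGHOST: your_database_host
PGPORT: 5432
PGDATABASE: your_database_name
PGUSER: your_database_username
PGPASSWORD: your_database_secret_password
```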
Once you have this script, you can connect to the database in a new script:
```python
import sys
import os
import pandas as pd
import subprocess
import argparse
import pdb
import pickle
from setup import setup_environment

# Make PostgreSQL Connection
engine = setup_environment.get_database()
try:
    con = engine.raw_connection()
    con.cursor().execute("SET SCHEMA '{}'".format('your_schema_name'))
except:
    pass
```
Note: in this code example, you would want to replace `'your_schema_name'` with the name of your particular schema, for example, the `models` schema.
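Assuming the connection succeeded, a quick way to confirm everything is wired up (a small sketch, not part of the original script) is to run a trivial query through the connection:

```python
# Simple smoke test: ask the server for its version string.
version = pd.read_sql("SELECT version();", con)
print(version.iloc[0, 0])
```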
Trying Some Complex Queries
Now that you have made the database connection, you can try a complex query, such as returning the pickle files (a Pythonic way of serializing and storing data) for model results built using scikit-learn.
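If pickling is new to you, here is a minimal sketch of the round trip the function below relies on: `pickle.dumps` turns a Python object into bytes (the form in which it could be stored in a binary database column such as `pickle_blob`), and `pickle.loads` reconstructs the object from those bytes. The example object here is purely illustrative:

```python
import pickle

model_like_object = {"model_id": 971, "coefficients": [0.3, 1.2]}

# Serialize to bytes, as they might be stored in a binary column...
blob = pickle.dumps(model_like_object)

# ...and reconstruct the original object from those bytes.
restored = pickle.loads(blob)
assert restored == model_like_object
```

With that in mind, here is the full function: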
```python
def get_pickle_best_models(timestamp, metric, parameter=None, number=25, directory="results"):
    """
    --------------------------------------------------------
    Get the PICKLE FILE of the best models
    by the specified timestamp and given metric

    RETURNS the PICKLE FILE to a DIRECTORY
    --------------------------------------------------------
    ARGUMENTS:
    timestamp:  models run on or after given timestamp
                example: '2018-09-02'
    metric:     metric to be optimized
                example: '[email protected]'
    parameter:  parameter value or threshold if any
                default=None
                example: '10.0'
    number:     maximum number of desired results
                default=25
    --------------------------------------------------------
    """
    if parameter is None:
        query = ("SELECT pickle_blob, run_time FROM \
                 (SELECT evaluations.model_id, run_time \
                     FROM results.evaluations JOIN results.models \
                     ON evaluations.model_id=models.model_id \
                     WHERE run_time >= '{}' \
                     AND value is not null \
                     AND metric = '{}' \
                     ORDER BY value DESC LIMIT {}) \
                 AS top_models \
                 INNER JOIN results.data \
                 ON top_models.model_id=data.model_id;").format(timestamp, metric, number)

    elif parameter is not None:
        query = ("SELECT pickle_blob, run_time FROM \
                 (SELECT evaluations.model_id, run_time \
                     FROM results.evaluations JOIN results.models \
                     ON evaluations.model_id=models.model_id \
                     WHERE run_time >= '{}' \
                     AND value is not null \
                     AND metric = '{}' \
                     AND parameter = '{}' \
                     ORDER BY value DESC LIMIT {}) \
                 AS top_models \
                 INNER JOIN results.data \
                 ON top_models.model_id=data.model_id;").format(timestamp, metric, parameter, number)

    df_models = pd.read_sql(query, con=con)
    N = len(df_models['pickle_blob'])

    for file_number in range(0, N):
        pickle_file = pickle.loads(df_models['pickle_blob'].iloc[file_number])
        file_name = df_models['run_time'].apply(lambda x: str(x).replace(' ', 'T')).iloc[file_number]

        if parameter is None:
            full_file_name = "police_eis_results_" + "top_" + metric + "any" + "_" + file_name + ".pkl"
        elif parameter is not None:
            full_file_name = "police_eis_results_" + "top_" + metric + parameter + "_" + file_name + ".pkl"

        # Join with a proper path separator so "results" + file name
        # does not run together into a single name.
        file_path = os.path.join(directory, full_file_name)
        with open(file_path, "wb") as f:
            pickle.dump(pickle_file, f)

    return None
```
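A call to this function might look like the following; the argument values are the example values from the docstring, and the output directory is assumed to exist before the files are written:

```python
# Save the pickle files of the 10 best models run since 2018-09-02,
# ranked by the given metric at the given parameter value.
get_pickle_best_models('2018-09-02', '[email protected]',
                       parameter='10.0', number=10, directory="results")
```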
Dissecting this Function
Taking out some of the detail, you have these general processes:
```python
def get_pickle_best_models(timestamp, metric, parameter=None, number=25, directory="results"):
    if parameter is None:
        ...  # Do Query WITHOUT Parameter
    elif parameter is not None:
        ...  # Do Query WITH Parameter

    # Run Query and Store Results as Pandas Data Frame
    df_models = pd.read_sql(query, con=con)

    # Loop Over Dataframe to Save Pickle Files for Different Model Run Times
    ...  # (Code Here)
```
Writing a Python Query
As seen in the example, you write the SQL query as a string:
query = ("Long SQL Query as a String")
Here, you can insert variables into the query using the `.format(var)` method. In this way, you can systematically vary the query based on the arguments passed to the function. The full form of one of the above queries is as follows:
query = ("SELECT pickle_blob, run_time FROM \
(SELECT evaluations.model_id, run_time \
FROM results.evaluations JOIN results.models \
ON evaluations.model_id=models.model_id \
WHERE run_time >= '{}' \
AND value is not null \
AND metric = '{}' \
AND parameter = '{}' \
ORDER BY value DESC LIMIT {}) \
AS top_models \
INNER JOIN results.data \
ON top_models.model_id=data.model_id ; " ).format(timestamp, metric, parameter, number)
Note the format of the query. In the Python code, you need to break query lines with `\` and use `'{}'.format(x)` to insert variables into the script. A format bracket needs single quotes around it when you are passing a string value into the query, and no quotes when you are passing a numeric value. Abstracting away the Python manipulation, you are left with a plain SQL query.
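As a minimal illustration of that quoting rule (the column and variable values here are just for demonstration):

```python
metric = 'accuracy'  # a string value: its {} is wrapped in single quotes
number = 10          # a numeric value: its {} is left bare

query = "SELECT * FROM results.evaluations \
         WHERE metric = '{}' LIMIT {};".format(metric, number)
print(query)
```

With example values substituted for the timestamp, metric, parameter, and number of results, the full query reads: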
```sql
SELECT pickle_blob, run_time FROM
    (SELECT evaluations.model_id, run_time
        FROM results.evaluations JOIN results.models
        ON evaluations.model_id=models.model_id
        WHERE run_time >= '2018-09-02'
        AND value is not null
        AND metric = '[email protected]'
        AND parameter = '10.0'
        ORDER BY value DESC LIMIT 10)
    AS top_models
    INNER JOIN results.data
    ON top_models.model_id=data.model_id;
```
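One caveat worth adding: building SQL by string formatting is fine for trusted, internal values, but it is vulnerable to SQL injection if any value comes from user input. A safer sketch uses psycopg2's parameter binding, where the driver substitutes and escapes the values for you (the metric value here is illustrative):

```python
# The %s placeholders are filled in by the driver, which handles quoting.
cursor.execute(
    "SELECT model_id, run_time FROM results.evaluations "
    "WHERE run_time >= %s AND metric = %s LIMIT %s;",
    ('2018-09-02', 'accuracy', 10),
)
rows = cursor.fetchall()
```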
What are you returning?
The above query is a more complex example, but a valuable one to work with. First, what do you want to return?
- `pickle_blob`
- `run_time`

e.g., `SELECT pickle_blob, run_time`
Where does this data come from?
- A join of the `results.data` table with a subquery that itself joins `results.evaluations` and `results.models`.
In the above example, you want both the model's `pickle_blob` and the model's `run_time`. You also want only the best models according to certain criteria. Yet all of these attributes live in separate tables in the `results` schema. As a result, you must join these tables.
Technically, you are doing what is called a subquery. Note the first line of the main query, `SELECT pickle_blob, run_time FROM`, where the `FROM` refers to a subquery in parentheses, `(SELECT ... LIMIT 10)`, which you refer to with the alias `AS top_models`.
Here, `top_models` is an arbitrary name defined and used only within the query. You could replace `top_models` with `x` or `y` or `my_best_model_ever`. The name does not matter as long as it is distinct from the other table names and from reserved SQL keywords such as `select`. It merely refers to the result of the subquery.
In essence, the subquery returns a selective version of the joined `results.models` and `results.evaluations` tables. You then do an `INNER JOIN` of the `top_models` subquery with `results.data` on a key id, using `ON top_models.model_id=data.model_id`.
What is your subquery?
```sql
SELECT evaluations.model_id, run_time
    FROM results.evaluations JOIN results.models
    ON evaluations.model_id=models.model_id
    WHERE run_time >= '2018-09-02'
    AND value is not null
    AND metric = '[email protected]'
    AND parameter = '10.0'
    ORDER BY value DESC LIMIT 10
```
The subquery also conducts a join, of the `results.evaluations` and `results.models` tables on `model_id`, and it returns the `model_id` and `run_time` columns from the joined table.
Criteria:
Here, you only want models that fit certain criteria:

- timestamp: `WHERE run_time >= '2018-09-02'`
- non-null results in the `value` column: `AND value is not null`
- metric: `AND metric = '[email protected]'`
- parameter: `AND parameter = '10.0'`
- the n top results: `ORDER BY value DESC LIMIT 10`
As you can see, although this query has many elements, you can manipulate each of them from Python. In the end, you can harness the power of databases and the functionality of Python to build dynamic code.
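As a final sketch of that idea, here is one hypothetical way to assemble the optional criteria programmatically rather than maintaining two near-identical query strings as the function above does (the helper name and structure are illustrative, not from the original code):

```python
def build_top_models_query(timestamp, metric, parameter=None, number=25):
    # Start from the criteria shared by both variants of the query.
    criteria = [
        "run_time >= '{}'".format(timestamp),
        "value is not null",
        "metric = '{}'".format(metric),
    ]
    # Append the optional parameter filter only when one is given.
    if parameter is not None:
        criteria.append("parameter = '{}'".format(parameter))

    return ("SELECT pickle_blob, run_time FROM "
            "(SELECT evaluations.model_id, run_time "
            "FROM results.evaluations JOIN results.models "
            "ON evaluations.model_id=models.model_id "
            "WHERE {} "
            "ORDER BY value DESC LIMIT {}) AS top_models "
            "INNER JOIN results.data "
            "ON top_models.model_id=data.model_id;").format(
                " AND ".join(criteria), number)
```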
Conclusion
Well done! You now know how to use a database in Python, with a few examples in PostgreSQL, and you have the resources to continue learning on your own. If you are interested in learning more about database topics in general, read DataCamp's tutorial Execute Python/R in SQL.