Pip Python Tutorial for Package Management
If you are considering becoming a data scientist, the sooner you start learning how to code, the better. Data professionals spend a great deal of their time coding. Programming languages are the key tools that allow data professionals to analyze and extract meaningful insights from vast amounts of data.
Probably the most popular programming language for data science is Python. Python is an open-source, general-purpose, and powerful programming language, with applications in many software domains, such as web development, game development, and, of course, data science.
While Python alone is already capable of many cool things, data professionals (and, more broadly, software developers) often use additional packages, also known as libraries, to make their lives easier. A package is a collection of related files, modules, and dependencies that can be used repeatedly in different applications and problems.
One of the key strengths of Python is its wide catalog of well-documented and comprehensive libraries. Where are these libraries hosted? How can you install and manage the packages of your interest?
In this tutorial, you will be introduced to the world of packages in Python and pip, the standard package installer for Python. Pip is a powerful tool that will allow you to leverage and manage the many Python packages that you will come across as a data professional and a programmer.
Understanding Packages in Python
Let’s use a metaphor to understand what pip is. Imagine Python is a nice and balanced toolbox with the essential items you will need to code. When you buy (install) Python on your computer, it comes with a wide collection of additional tools (packages) that you can use anytime.
The so-called Python Standard Library is an extensive set of built-in modules and packages that provides standardized solutions for many problems that occur in everyday programming. Since the Standard Library comes bundled with standard Python distributions, you can use it without any additional installation. You just have to import the module you need into your code.
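For instance, the json and statistics modules are part of the Standard Library, so the short script below runs on any standard Python 3 installation without installing anything with pip. The helper name summarize is just an illustrative choice.

```python
# Standard Library modules: available as soon as Python is installed, no pip needed.
import json
import statistics


def summarize(values):
    """Return a JSON string with the mean and median of a list of numbers.

    summarize is a hypothetical helper used only for illustration.
    """
    summary = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
    return json.dumps(summary)


print(summarize([1, 2, 3, 4, 10]))
```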
However, sometimes you will not find the tool you are looking for in Python or its Standard Library. In these cases, you will need to get new tools elsewhere. Fortunately, the internet is a huge store where you can find hundreds of thousands of packages developed by Python developers for all kinds of purposes.
And the best thing? The vast majority of these packages are free to use. If you want to know more about packages in Python and how to develop your own, check out our Developing Python Packages Course.
While third-party packages can be hosted in different locations, the most popular and comprehensive repository is the Python Package Index (PyPI). With over 300,000 available Python packages, PyPI is a giant online repository where the Python community publishes and shares packages.
Once you have identified the package you are looking for, you will need to download and install it on your computer to use it. How can you do it? Here is where package managers come into play.
Understanding package managers: pip
A package manager (also called a package-management system) is a tool that automates the process of installing, upgrading, configuring, and removing packages for a computer in a consistent manner.
Package managers eliminate the need for manual installs and updates, and they ensure that a package is installed together with all the dependencies it requires to function. In addition, because package managers fetch packages from established repositories, like PyPI and Anaconda, rather than from arbitrary downloads, they make it easier to get packages from a trusted source.
The most popular package manager for Python is pip. Developed in 2008, pip (a recursive acronym for “pip Installs Packages”) is today the standard tool for installing Python packages and their dependencies. Most recent distributions of Python come with pip preinstalled: Python 2.7.9 and later (in the Python 2 series) and Python 3.4 and later include pip by default.
Pip is a powerful and user-friendly tool that allows you to manage Python packages using a handful of commands. Although pip uses PyPI as its default repository for fetching packages, it can also install packages from other sources, including:
- Version control systems like Git (including repositories hosted on GitHub), Mercurial, Subversion, and Bazaar.
- Requirements files. Python projects usually depend on several packages. To install all of them at once, pip can read a requirements file, conventionally named requirements.txt, which lists the necessary packages and, optionally, their versions.
- Distribution files. These are versioned archives containing Python packages, modules, and other resource files necessary for a package to function. They come in two forms:
  - Source distribution (usually shortened to “sdist”), which pip may need to build before installing
  - Wheel distribution (usually shortened to “wheel”), a prebuilt format that pip can install directly
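A wheel’s filename is not arbitrary: by convention it encodes the distribution name, the version, an optional build tag, and the Python, ABI, and platform tags that say where the wheel can be installed, which is how pip picks a compatible file. The sketch below splits a filename into those parts. It is a simplified illustration that assumes a well-formed name and does no validation; the filenames are hypothetical, and the third-party packaging library provides a more complete parser.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its components (simplified sketch).

    Layout: name-version(-build)?-pythontag-abitag-platformtag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, version, build, py, abi, plat)


# Hypothetical pure-Python wheel: any Python 3, no compiled ABI, any platform.
tags = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```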
To use pip and start managing packages, you first need to make sure it’s installed on your computer. To check whether pip is available, run one of the following commands in the command line:
>>pip3 --version
>>pip --version
pip vs pip3 vs pip2
After reading the previous section, you may be wondering what the difference is between pip and pip3. Following the release of Python 3, pip introduced the pip3 command, which always operates on the Python 3 environment of your computer. The same goes for the pip2 command and Python 2. So, if you want to make sure that pip operates on your Python 3 environment or your Python 2 environment, use the pip3 or pip2 command, respectively.
By contrast, the pip command operates on whichever Python installation it belongs to, which depends on your system’s PATH and on any virtual environment you have activated. This is relevant when you have both Python 2 and Python 3 installed on your computer.
For example, older versions of macOS came with Python 2 preinstalled, and some of their system functionality relied on it. If you are working in a Python 2 environment, the pip command will install, uninstall, upgrade, or manage Python packages for Python 2. The same applies if you are working in a Python 3 environment.
In these situations, you should be certain of which Python environment you are working in before using the pip command; otherwise, you may manipulate packages in the wrong environment.
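A common safeguard, recommended in pip’s own documentation, is to run pip as a module of a specific interpreter, for example python3 -m pip install pandas, so packages always go to that interpreter’s environment. The sketch below shows the same idea from inside Python: sys.executable is the path of the interpreter running the script, and the hypothetical helpers pip_command and run_pip tie every pip call to it.

```python
import subprocess
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script.

    pip_command is a hypothetical helper; "-m pip" runs pip as a module of
    sys.executable, so packages are managed for this exact interpreter.
    """
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args: str) -> int:
    """Run pip for this interpreter and return its exit code (hypothetical helper)."""
    return subprocess.run(pip_command(*args)).returncode


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Python version:", sys.version.split()[0])
    run_pip("--version")  # pip prints its version and the Python it belongs to
```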
pip in action
Now that you know the basics of pip and have it installed on your computer, let’s see how you can use it!
Installing Packages with pip
The most common use of pip is installing Python packages. For example, if you want to install pandas, the go-to package for manipulating data frames in Python, the simplest way to do it is by running the following command:
>>pip install pandas
[...]
Successfully installed pandas
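After pip reports success, you can confirm from Python that the package is importable in the current environment. The sketch below uses importlib.util.find_spec, which returns None when a module cannot be found; is_importable is a hypothetical helper name. Keep in mind that the import name can differ from the name you pass to pip (scikit-learn, for instance, is imported as sklearn).

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "pandas"]:
    status = "available" if is_importable(name) else "not installed"
    print(f"{name}: {status}")
```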
Sometimes you will need a specific version of a package. This is easy with pip: you just have to specify the version you want to install. Say you want to install version 1.4.0 of pandas:
>>pip install pandas==1.4.0
If you want to install a package version that meets certain conditions, pip lets you combine comparison operators in a version specifier. For example, to install a pandas version greater than or equal to 1.0.0 and less than 1.5.0 (the quotes prevent the shell from interpreting < and > as redirection operators):
>>pip install "pandas>=1.0.0,<1.5.0"
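To see what a specifier like >=1.0.0,<1.5.0 accepts, here is a simplified sketch that compares plain numeric versions as tuples of integers. It assumes versions made only of digits and dots (no pre-releases or local labels) and treats 1.5 and 1.5.0 as equal; parse_version and in_range are hypothetical helper names, and the third-party packaging library (packaging.specifiers.SpecifierSet) implements the full rules that pip follows.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '1.4.2' into (1, 4, 2).

    Trailing zeros are dropped so that '1.5' and '1.5.0' compare as equal.
    """
    parts = [int(piece) for piece in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version: str, minimum: str, upper: str) -> bool:
    """Mimic the specifier '>=minimum,<upper' for plain numeric versions."""
    v = parse_version(version)
    return parse_version(minimum) <= v < parse_version(upper)


for candidate in ["0.25.3", "1.0.0", "1.4.2", "1.5.0"]:
    print(candidate, in_range(candidate, "1.0.0", "1.5.0"))
```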
Installing the scikit-learn Package with pip
The following example installs the scikit-learn package, along with the other packages it needs:
>>pip install scikit-learn
You may notice from the logs that more than the scikit-learn package is being installed. This is because pip will install any other packages that scikit-learn depends on. These other packages are called dependencies.
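You can also inspect the dependencies a package declares once it is installed. The sketch below uses importlib.metadata.requires from the Standard Library (Python 3.8+); declared_dependencies is a hypothetical helper name. The returned strings may include environment markers, for example for optional extras that pip only installs on request.

```python
from importlib import metadata
from typing import List, Optional


def declared_dependencies(distribution: str) -> Optional[List[str]]:
    """List the requirements an installed distribution declares.

    Returns None if the distribution is not installed in this environment,
    and an empty list if it declares no requirements.
    """
    try:
        requirements = metadata.requires(distribution)
    except metadata.PackageNotFoundError:
        return None
    return requirements or []


deps = declared_dependencies("scikit-learn")
if deps is None:
    print("scikit-learn is not installed in this environment")
else:
    for requirement in deps:
        print(requirement)
```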
Installing a List of Packages Using pip Requirements Files
When you work on collaborative projects, it’s important that all team members use the same packages with the same versions. The best way to ensure this is to install packages from a requirements file: usually a text file that lists all the packages used in the project, along with their respective versions.
Pip allows you to install a list of packages at once using a requirements file. For example, if we need NumPy, pandas, and TensorFlow for our project, we could list them, along with the desired versions, in a requirements.txt file, one package per line.
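A requirements file of this kind might look like the EXAMPLE_REQUIREMENTS text below, where the pinned versions are illustrative assumptions rather than recommendations. The small hypothetical parser only handles exact pins (==), comments, and blank lines, which is enough to show how each line pairs a package name with a version; pip itself supports a much richer syntax.

```python
from typing import List, Tuple

# Hypothetical requirements.txt contents; the pinned versions are illustrative only.
EXAMPLE_REQUIREMENTS = """\
# Packages for the project
numpy==1.22.0
pandas==1.4.0
tensorflow==2.8.0
"""


def parse_requirements(text: str) -> List[Tuple[str, str]]:
    """Split simple 'name==version' lines into (name, version) pairs.

    Simplified sketch: handles only exact pins, comments, and blank lines.
    """
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        name, _, version = line.partition("==")
        pairs.append((name.strip(), version.strip()))
    return pairs


for name, version in parse_requirements(EXAMPLE_REQUIREMENTS):
    print(f"{name} -> {version}")
```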
To install the packages listed in a requirements.txt file, we just need to run:
>>pip install -r requirements.txt
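If you want to create a requirements file to share with the rest of the team, you can use the following command, which writes every package installed in your current environment, with its exact version, to requirements.txt: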
>>pip freeze > requirements.txt
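pip freeze prints one name==version line per installed package. The sketch below approximates that output with importlib.metadata from the Standard Library; freeze_like is a hypothetical helper, and pip freeze applies extra rules (for example, for editable and VCS installs) that this sketch ignores.

```python
from importlib import metadata
from typing import List


def freeze_like() -> List[str]:
    """Approximate `pip freeze`: one 'name==version' line per installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


print("\n".join(freeze_like()))
```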
Installing Packages from GitHub with pip
Besides PyPI, there are other places on the internet where Python packages can be hosted. Code hosting platforms like GitHub, which store projects in Git repositories, are a common place to share packages with the Python community.
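Let’s say you want to install a package hosted on GitHub. Pip can install directly from a Git repository URL, as long as the repository contains an installable Python project (for example, one with a pyproject.toml or setup.py file) and Git is installed on your computer. Prefix the URL with git+ and, optionally, append @ followed by a branch, tag, or commit. For example: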
>>pip install git+https://github.com/pypa/sampleproject.git@main
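If you install from several repositories, a tiny helper can assemble these URLs consistently. The hypothetical github_requirement function below builds the same kind of string used in the command above; the result can be passed to pip install or written as a line in a requirements file.

```python
from typing import Optional


def github_requirement(owner: str, repo: str, ref: Optional[str] = None) -> str:
    """Build a pip-installable Git URL for a GitHub repository.

    github_requirement is a hypothetical helper; ref can be a branch, tag, or commit.
    """
    url = f"git+https://github.com/{owner}/{repo}.git"
    return f"{url}@{ref}" if ref else url


print(github_requirement("pypa", "sampleproject", "main"))
```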
Upgrading Packages with pip
Sometimes you will need to upgrade to a newer version of a package you have already installed on your computer. Pip makes this process extremely easy. For example, if you want to upgrade pandas to the latest version:
>>pip install --upgrade pandas
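In case you want to upgrade all the packages listed in a requirements.txt file, you could use the command below. Note that packages pinned to an exact version (with ==) are still installed at exactly that version.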
>>pip install -r requirements.txt --upgrade
It’s important to note that when pip upgrades a package, it automatically uninstalls the old version before installing the new one.
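Before upgrading, it helps to know which packages actually have newer releases. pip list --outdated reports this, and adding --format=json makes the output easy to process in a script. In the sketch below, list_outdated runs that command for the current interpreter (it needs network access to query the package index), while parse_outdated handles the JSON; both names, and the sample JSON record, are hypothetical.

```python
import json
import subprocess
import sys
from typing import Dict, Tuple


def parse_outdated(json_text: str) -> Dict[str, Tuple[str, str]]:
    """Map package name -> (installed version, latest version)."""
    return {
        entry["name"]: (entry["version"], entry["latest_version"])
        for entry in json.loads(json_text)
    }


def list_outdated() -> Dict[str, Tuple[str, str]]:
    """Ask this interpreter's pip which installed packages have newer releases."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pip fails
    )
    return parse_outdated(result.stdout)


# Hypothetical sample record, shaped like pip's JSON output.
sample = '[{"name": "pandas", "version": "1.4.0", "latest_version": "1.5.3", "latest_filetype": "wheel"}]'
print(parse_outdated(sample))
```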
Removing Packages with pip
Removing a package is very easy with pip. If you want to uninstall pandas, you just have to run:
>>pip uninstall pandas
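pip uninstall asks for confirmation before removing files, so when you script it you will usually add -y (--yes) to answer that prompt automatically. The sketch below builds and runs such a command for the current interpreter; uninstall_command and uninstall are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def uninstall_command(*packages: str) -> List[str]:
    """Build a non-interactive uninstall command for this interpreter's pip.

    -y answers pip's confirmation prompt automatically.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    return [sys.executable, "-m", "pip", "uninstall", "-y", *packages]


def uninstall(*packages: str) -> None:
    """Run the uninstall; raises CalledProcessError if pip fails."""
    subprocess.run(uninstall_command(*packages), check=True)


print(" ".join(uninstall_command("pandas")))
```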
Additional pip Commands
While installing, upgrading, and removing packages are the most common actions you will do with pip, there are also other commands worth mentioning.
If you need extra information about the different pip commands available and how to use them, run:
>>pip help
To list all the packages installed in your environment:
>>pip list
To see a summary of an installed package, including its version, location, and dependencies:
>>pip show [NameOfPackage]
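Much of what pip show displays comes from the package’s metadata, which you can also read from Python with importlib.metadata (Python 3.8+). The hypothetical package_summary helper below returns a few of those fields, or None when the package is not installed.

```python
from importlib import metadata
from typing import Dict, Optional


def package_summary(distribution: str) -> Optional[Dict[str, str]]:
    """Return a few of the fields `pip show` displays, or None if not installed."""
    try:
        meta = metadata.metadata(distribution)
    except metadata.PackageNotFoundError:
        return None
    return {
        "Name": meta.get("Name", ""),
        "Version": meta.get("Version", ""),
        "Summary": meta.get("Summary", ""),
    }


print(package_summary("pip"))
```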
Troubleshooting pip
Although pip is a fairly simple tool, with just a handful of commands, problems can always arise. Here are the most common issues and how to fix them.
Installing pip
Although unusual, it’s possible that pip isn’t installed. In this case, the easiest way to install it is by running the command below. It invokes ensurepip, a module in the Python Standard Library designed to install pip into a Python environment.
>>python3 -m ensurepip --default-pip
In case the problem persists, check out the pip documentation to try alternative solutions.
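You can also check, from Python, which pip version ensurepip would install. The sketch below calls ensurepip.version() and guards against interpreters where ensurepip is missing or disabled, which happens on some Linux distributions; bundled_pip_version is a hypothetical helper name.

```python
from typing import Optional


def bundled_pip_version() -> Optional[str]:
    """Return the pip version ensurepip would install, or None if unavailable.

    Some Linux distributions remove or disable ensurepip, hence the guards.
    """
    try:
        import ensurepip
        return ensurepip.version()
    except (ImportError, LookupError, OSError):
        return None


print(bundled_pip_version())
```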
Pip is not up-to-date
New versions of a package can bring bug fixes, new features, and faster performance. This applies to every Python package, including pip. If you are using an older version of pip, you may experience unexpected behavior. That’s why it’s always recommended to keep pip up to date, as well as the setuptools and wheel packages, which help ensure you can also install packages from source archives.
>>pip install --upgrade pip setuptools wheel
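To confirm the upgrade worked, you can read the installed versions of the three tools from Python. The sketch below uses importlib.metadata.version and reports None for any tool that is not installed; tool_versions is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

TOOLS = ("pip", "setuptools", "wheel")


def tool_versions(names: Iterable[str] = TOOLS) -> Dict[str, Optional[str]]:
    """Report the installed version of each packaging tool (None if missing)."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in tool_versions().items():
    print(f"{name}: {version or 'not installed'}")
```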
Conclusion
We hope you enjoyed this tutorial! As the standard tool for installing packages, pip is vital for Python developers. It’s fairly easy to use, which makes package management straightforward once you get familiar with its commands.
If you are looking for additional resources, check out the following DataCamp materials and get started with Python and pip today!
Python pip FAQs
What is a package in Python?
A package is a collection of related files, modules, and dependencies that can be used repeatedly in different applications and problems.
What is a package manager?
A package manager is a tool that automates the process of installing, upgrading, configuring, and removing packages for a computer in a consistent manner.
What is pip?
pip is the standard package manager for Python.
How can you install a package with pip?
Use pip install [NameOfPackage]
How can you upgrade a package with pip?
Use pip install --upgrade [NameOfPackage]
How can you remove a package with pip?
Use pip uninstall [NameOfPackage]
Python Courses
- Introduction to Data Science in Python (course)
- Intermediate Python (course)
- Python Tutorial for Beginners (tutorial)
- Python Setup: The Definitive Guide (tutorial, J. Andrés Pizarro)
- Installing Anaconda on Windows Tutorial (tutorial, DataCamp Team)
- How to Upgrade Python and Pip in Windows, MacOS, and Linux (tutorial, Samuel Shaibu)
- Installing Anaconda on Mac OS X (tutorial, DataCamp Team)