Python Tutorial for Beginners

Get a step-by-step guide on how to install Python and use it for basic data science functions.

Python is one of the most popular programming languages today and is widely used across verticals, from software and web development to game development, IoT, and more. More importantly, it has become the de facto programming language for data science and machine learning. This blog post outlines why and when to use Python, how to install it, basic Python syntax and functionality, an introduction to simple data analysis, and more.

When and why to use Python

Python is a highly versatile language with many use cases across verticals. Like many programming languages, Python is open source and free to use, and it has a large ecosystem of tools and packages (more on packages later!) that simplify a wide variety of tasks. Here are some verticals where Python is useful:

Desktop and Mobile Software Development

Whether you are building simple applications like calculators or larger ones like document editors and social networks, Python offers a plethora of tools to prototype, develop, and run desktop or mobile applications.

Web Development

Did you know that companies like Uber, Netflix, and Spotify use Python in their web applications and backend services? A variety of Python web frameworks simplify website development.

Game Development

This may come as a surprise, but games such as The Sims 4 and Civilization IV use Python for scripting and game logic. Many Python libraries exist that streamline prototyping and development of games.

Internet of Things (IoT)

The rise of embedded systems and the internet of things (IoT) has catalyzed innovation and business process improvement across domains. Python offers a set of packages that make it easy to design, create, and deploy an IoT device on Arduino, Raspberry Pi, and other IoT devices such as Alexa and Google Home.

Data Science and Machine Learning

Python’s rise in popularity can be attributed to its rich set of packages and tools for data science and machine learning. Python can be used across the data science workflow, from exploratory data analysis and data pre-processing to model building and deployment, and finally to results interpretation and communication. Because it spans the whole workflow, it is used by a variety of data roles, including data analysts, data scientists, data engineers, machine learning engineers, and more.

How to install Python

Python can be installed in several ways, depending on whether you use a Mac or a Windows machine, or whether you want to install it with Anaconda. Since the installation process is similar on Windows and Mac, the following instructions apply to both operating systems. If you want a deeper look into installing Python for data science, check out this article.

Mac and Windows

Go to the Python website, which will send you to the downloads page as seen in the screenshot below. Choose Windows or Mac and click on the download button to access the installer package.


Anaconda

One of the more convenient ways to install Python for data science is by installing an Anaconda distribution, which also installs many commonly used data science packages. Follow the link above to install it on Mac, Windows, or Linux. For a deep dive on installing Anaconda, check out these blog posts for macOS and Windows.


How to get started with Python for data science

In this section, we will cover how to open Python on Mac and Windows, how to launch the terminal, the basics of Python like print statements and operators, variables and assigning values, data types, packages and how to install them, as well as the popular data science library pandas and how to install it.

1. Launching Python

Launching Terminal

To launch Python, you will first need to launch your terminal. On both Mac and Windows, you can simply search for it on your computer. On Mac, the easiest way is to do a Spotlight Search, using the magnifying glass icon in the upper right corner of your desktop.


Similarly, for Windows, you will search for the command prompt through the start menu, as described in a DataCamp article How to Run Python Scripts. You can also launch the new Windows Terminal after installing and searching for it as mentioned on the Windows blog post Introducing Windows Terminal. After you launch it, here's what you will see:


How to open Python on Mac + Windows

After the Python installation is complete, you can start Python from your terminal by simply typing the word ‘python’ and pressing Enter. This will start up Python and show you the version you have installed. The three right-pointing arrows (>>>) mean you are ready to code in Python!
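Once the >>> prompt appears, you can also ask Python which version is running from inside the session. The lines below are a minimal sketch using the standard library's sys module; the variable names are illustrative only.

```python
import sys

# sys.version_info holds the running interpreter's version as numbers.
major, minor = sys.version_info[:2]
version_label = f"{major}.{minor}"

# True when the interpreter is Python 3 or newer.
is_python3 = major >= 3

print("Running Python", version_label)
```

If the major version printed here is 2, the ‘python’ command may be pointing at an older installation; on some systems the Python 3 interpreter is started with ‘python3’ instead.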


2. Basic Python Syntax

For each of the commands below, type the command and then hit Enter or Return on your keyboard to see the result.

Now you can run your first command in Python. We will start with the print() function: type the word print, followed by parentheses. The text you want to display goes inside those parentheses, enclosed in quotes.

print("Hello, World")
Hello, World
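As a small extension of the example above, print() can also display values stored in variables and combine several pieces of text. This is a sketch only; the names name, greeting, and message are made up for illustration.

```python
# Hypothetical variable names, used only for illustration.
name = "World"

# Join two strings with + and print the result.
greeting = "Hello, " + name
print(greeting)

# print() accepts several values and separates them with a space by default.
print("The answer is", 42)

# An f-string embeds a variable's value directly inside the text.
message = f"Hello, {name}"
print(message)
```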

Operators

Then, you can perform operations like +, -, and *. As you can see, + adds the two numbers together, - subtracts the second number from the first, and * multiplies the two numbers.

5 + 5
10
5 - 5
0
5 * 5
25
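Beyond the three operators above, Python has a few more arithmetic operators that are worth trying at the prompt. The sketch below stores each result in a variable (names chosen for illustration) so the values are easy to compare.

```python
quotient = 7 / 2         # true division always returns a float
floor_quotient = 7 // 2  # floor division drops the fractional part
remainder = 7 % 2        # modulo returns the remainder of the division
power = 2 ** 3           # exponentiation: 2 to the power of 3

# Parentheses control the order of operations, as in ordinary arithmetic.
grouped = (5 + 5) * 5
ungrouped = 5 + 5 * 5

print(quotient, floor_quotient, remainder, power, grouped, ungrouped)
```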

Booleans

The next command compares two values. In the case below, we compare the same two numbers, five and five. The command asks whether 5 is greater than 5, which of course it is not, so the returned output is False. A comparison like this returns a boolean, whose value is always either True or False.

5 > 5
False
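The greater-than check is one of several comparison operators. As a rough sketch, the lines below try a few others and combine the results with and / or; the variable names are illustrative.

```python
is_equal = 5 == 5      # == tests equality (a single = assigns a value)
is_not_equal = 5 != 5  # != tests inequality
is_at_least = 5 >= 5   # greater than or equal to
is_less = 3 < 5

# Booleans can be combined with the keywords and / or.
both = is_equal and is_less
either = is_not_equal or is_less

# Every comparison produces a value of type bool.
result_type = type(is_equal)

print(is_equal, is_not_equal, is_at_least, is_less, both, either)
```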

Creating variables and assigning them

In this section, we see that you can assign a number to the name variable_1, and then return it by calling that same variable_1 to get the value that you had assigned it:

variable_1 = 100
variable_1
100
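Variables become more useful once they appear in expressions. The following sketch, with made-up names and numbers, shows a calculation built from two variables and what happens when one of them is reassigned.

```python
# Hypothetical values, used only for illustration.
price = 100
quantity = 3

# Variables can be used in expressions just like the numbers they hold.
total = price * quantity

# Reassigning a variable replaces its value; total keeps the result
# that was computed earlier until it is calculated again.
price = 120
total_after_change = price * quantity

print(total, total_after_change)
```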

Lists

A list is a data structure in which values can be stored and accessed. The neat thing about lists is that the items do not all need to be the same data type. Below, the list holds a str, an int, and a float, respectively. To access an item in the list, type the list name followed by square brackets containing the item’s position (its index). Python starts counting at 0, so appending [0] to the list name returns the first element, as seen below.

my_list = ['potato', 3, 4.02]
my_list[0]
'potato'

For a further exploration into Python lists, see the article Python List Index().
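Indexing with [0] is only one of the things a list supports. As a short sketch using the same my_list values, the lines below show negative indexing, slicing, appending, and replacing an item; the added value 'carrot' is made up for illustration.

```python
my_list = ['potato', 3, 4.02]

first_item = my_list[0]   # indexing starts at 0
last_item = my_list[-1]   # negative indexes count from the end
first_two = my_list[0:2]  # a slice includes index 0 and 1, but not 2

my_list.append('carrot')  # add an item to the end of the list
my_list[1] = 30           # replace the item at index 1

length = len(my_list)     # number of items currently in the list

print(my_list, length)
```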

Data types

Python values can have a variety of forms, including ones like int, float, and str. For example, int represents integers, float represents decimal numbers, and str represents “strings”, or more generally text.

Using the type() function, we can check the type of our values. In this case, we’re looking at three values: 10, 10.06, and the word ‘tutorial’. The first is an integer, the second is a float, and the last is a string. Below, we can see each command and its respective data type output.

type(10)
<class 'int'>
type(10.06)
<class 'float'>
type('tutorial')
<class 'str'>

For a more in-depth look at data types and how to convert them, you can read the following blog, Python Data Type Conversion Tutorial.
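To give a flavor of what converting between types looks like, here is a minimal sketch that uses the built-in int(), float(), and str() functions on the same kinds of values as above. The variable names are illustrative.

```python
number_from_text = int("10")      # text containing digits becomes an int
decimal_from_text = float("10.06")
text_from_number = str(10)        # a number becomes text

# Converting a float to an int drops the decimal part (it does not round).
truncated = int(10.06)

# isinstance() checks whether a value has a given type.
is_text = isinstance(text_from_number, str)

print(number_from_text, decimal_from_text, text_from_number, truncated, is_text)
```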

Packages and installation

Packages are powerful tools that bundle prebuilt functionality. For example, pandas is a package that lets you work with tabular data in Python (more on that below). Think of packages as apps you download for Python, each providing a specific functionality. To install one, first exit Python in your terminal by pressing Ctrl + D (on Windows, press Ctrl + Z and then Enter, or type exit()), and then run the pip install command followed by the package name. Here we’re installing a sample package called SomePackage just to illustrate the installation: pip install SomePackage.


Installing pandas and Jupyter Notebook

pandas is perhaps the most used data science library in Python. It allows you to work with tabular data in Python and perform a variety of operations. In some sense, you can consider it as the Excel or spreadsheets of Python. The developers of pandas describe it as “a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.”

Jupyter notebook is a web application that interprets code, saves text, and displays visualizations. Both Jupyter and pandas are available to launch if you downloaded Anaconda, or you can install them by running the following commands in terminal:

pip install pandas
pip install notebook
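After the installation finishes, one way to confirm that the packages are visible to Python is to look them up by their import names. The sketch below uses the standard library's importlib.util.find_spec; the helper is_installed is a hypothetical name introduced here for illustration, and the check assumes the import names pandas and notebook.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the two packages from the install commands above.
for package_name in ["pandas", "notebook"]:
    status = "installed" if is_installed(package_name) else "missing"
    print(package_name, "->", status)
```

If a package is reported as missing, the pip command may have installed it for a different Python installation than the one you are running.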

3. Basic data science with Python

In this section, we’ll cover how to import a popular Python package while assigning it an alias, how to read in a CSV file, how to show your data in something called a DataFrame, and how to visualize a column by using a certain plotting function. We have already learned how to install pandas above, and because Jupyter Notebook is easy to use and works across several computer types, we will use that web application to display the following data science functions with pandas.

You can launch a Jupyter notebook by simply typing jupyter notebook in your terminal.


Importing packages with an alias

Before we begin working with pandas, we need to import it in our environment. Similar to launching an app after installing it, importing a package is simple. We can import pandas and assign an alias to it that is shorter than the original name so that it is easier to work with this package. We will assign pandas as the letters pd instead for shorthand notation—this change is purely optional.

import pandas as pd

Importing your data by reading in a CSV file

After you have imported pandas as pd, you can use pd to read in your data, which can be a CSV file. To do this, pass the CSV file path, enclosed in quotes, inside the parentheses of pd.read_csv(). You can then store the result in a DataFrame variable with the = operator. You can now print your DataFrame, but you will usually want to see only the first few rows rather than all of it. The data in this case is the past year’s stock performance, with the fields date, open, high, low, close, volume, and name, which describe the daily performance of popular stocks. However, you can use any CSV file here to illustrate importing data.

Dataframe head function

Showing the first few rows of your data is an important check that you have imported the correct CSV and that the data looks good in general. To do this, call your DataFrame and attach the .head() method to the end of it. This method returns the first 5 rows of your data by default. If you want to show a different number of rows, put that number in the parentheses of .head().

import pandas as pd
stocks_df = pd.read_csv('/documents/data/stock_data.csv')
stocks_df.head()
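If you do not have a stock CSV file at hand, you can try the same steps on a few rows of text held in memory. The sketch below is only an illustration: the rows are invented sample values (not real stock data), and io.StringIO stands in for a file path so that pd.read_csv() has something to read.

```python
import io

import pandas as pd

# Made-up sample rows with the same fields as the stock data described above.
csv_text = """date,open,high,low,close,volume,name
2021-01-04,10.0,10.5,9.8,10.2,1000,AAA
2021-01-05,10.2,10.9,10.1,10.8,1200,AAA
2021-01-06,10.8,11.0,10.4,10.5,900,AAA
2021-01-07,10.5,10.7,10.0,10.1,1100,AAA
2021-01-08,10.1,10.6,10.0,10.4,950,AAA
2021-01-11,10.4,11.2,10.3,11.0,1300,AAA
"""

# io.StringIO lets read_csv treat the text as if it were a file.
stocks_df = pd.read_csv(io.StringIO(csv_text))

first_rows = stocks_df.head()       # first 5 rows by default
first_two_rows = stocks_df.head(2)  # pass a number to change how many

print(first_rows)
```

With a real file, you would pass its path in quotes to pd.read_csv() instead of the io.StringIO object.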


Visualize column with the plot function

The .plot() method is useful for visualizing your DataFrame’s columns. In this example, the “date” column is on the x-axis and the “close” column is on the y-axis, drawn as a line chart.

stocks_df.plot(x="date", y="close")
<AxesSubplot:xlabel='date'>
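The object that .plot() returns can be used to adjust the chart. The following is a rough sketch under a few assumptions: matplotlib is installed (pandas relies on it for plotting), the rows are invented sample values, and the Agg backend line is only there so the code also runs as a plain script without a display. In a Jupyter notebook you would leave that line out so the chart appears inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up sample rows, used only for illustration.
csv_text = """date,close
2021-01-04,10.2
2021-01-05,10.8
2021-01-06,10.5
2021-01-07,10.1
"""

# parse_dates turns the date column into real dates instead of plain text.
stocks_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# .plot() returns a matplotlib Axes object that can be customized further.
ax = stocks_df.plot(x="date", y="close", title="Closing price")
ax.set_ylabel("close")

# Save the chart as PNG data in memory (a file name would also work here).
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
```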


4. Suggestions for practical applications

This overview of Python for data science has only scratched the surface. There are plenty of practical applications of Python in data science across several industries, including customer churn prediction, sales forecasting, image classification, product classification, sentiment analysis, credit card fraud detection, and recommendation systems. DataCamp’s Beginner Tutorial: Recommender Systems in Python demonstrates how to recommend movies to people based on a variety of factors with the use of machine learning. Another popular data science use case in the financial industry is credit card fraud detection, which is explained in Credit Card Fraud Detection Using Machine Learning Algorithm. As you can see, there are plenty of general data science use cases as well as industry-specific ones.

For a more in-depth introduction to Python, check out DataCamp’s Introduction to Python course, or find out more about learning Python on our blog.