Introduction to Python

Run the hidden code cell below to import the data used in this course.


list = compound data type

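list slicing: fam[n1:n2], where the start index n1 is inclusive and the end index n2 is exclusive

fam[:n] (from the start up to, but not including, index n) and fam[n:] (from index n to the end) also work

fam[:] selects all list elements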
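
A quick sketch of the slicing rules above. The contents of fam are made up for illustration; only the name fam and the "mum" entry come from these notes.

```python
# Hypothetical family list: names followed by heights in metres
fam = ["liz", 1.73, "emma", 1.68, "mum", 1.71, "dad", 1.89]

middle = fam[2:4]     # indices 2 and 3; index 4 is excluded
first_two = fam[:2]   # from the start up to, but not including, index 2
from_six = fam[6:]    # from index 6 to the end
everything = fam[:]   # a new list holding all elements

print(middle)                                # ['emma', 1.68]
print(first_two)                             # ['liz', 1.73]
print(from_six)                              # ['dad', 1.89]
print(everything == fam, everything is fam)  # True False
```

Note that fam[:] has the same contents as fam but is a separate list object.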

Python variables point to a location in memory, so y = x makes both names refer to the same list. If you run the code below, x will be ['a', 'z', 'c']

x = ["a","b","c"]
y = x 
y[1] = "z"
x

However, if you run the code below, x stays ['a', 'b', 'c'], because list(x) creates a new list (a copy) for y to point to. y = x[:] does the same.

x = ["a","b","c"]
y = list(x)
y[1] = "z"
x
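
The two cases can also be compared side by side with the is operator, which checks whether two names refer to the same object. A minimal sketch; the names alias and x_copy are just for illustration.

```python
x = ["a", "b", "c"]
alias = x           # second name for the same list object
x_copy = list(x)    # new list with the same elements (x[:] also works)

alias[1] = "z"      # changes the list that x points to
x_copy[2] = "q"     # changes only the copy

print(x)            # ['a', 'z', 'c']
print(x_copy)       # ['a', 'b', 'q']
print(alias is x)   # True
print(x_copy is x)  # False
```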

add to a list using +, which returns a new list

delete an element using del, e.g. del(fam[2]) or del fam[2]
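
A small sketch of both operations, using a made-up fam list and a hypothetical fam_ext name for the extended list.

```python
fam = ["liz", 1.73, "emma", 1.68]   # illustrative values

fam_ext = fam + ["me", 1.79]        # + builds a new list; fam itself is unchanged
print(fam_ext)                      # ['liz', 1.73, 'emma', 1.68, 'me', 1.79]
print(len(fam))                     # 4

del(fam_ext[2])                     # removes "emma"; later elements shift one place left
print(fam_ext)                      # ['liz', 1.73, 1.68, 'me', 1.79]
```

Because the indexes shift after each del, deleting several elements one after another needs care.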

Functions

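round(n) with only one argument rounds to the nearest integer and returns an int; round(n, ndigits) rounds to ndigits decimal places

help(round) opens up the documentation; ?round does the same in IPython/Jupyter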
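
A short check of the round behaviour described above. The value 1.68 is just an example.

```python
whole = round(1.68)         # one argument: result is an int
one_dp = round(1.68, 1)     # second argument: number of decimal places

print(whole, type(whole))   # 2 <class 'int'>
print(one_dp, type(one_dp)) # 1.7 <class 'float'>
```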

objects = lists, strings, booleans, integers, floats etc.

methods = functions that belong to objects

call a method using dot notation, e.g. fam.index("mum")

some method calls change the object they're called on while others don't
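
To illustrate the difference, a sketch with a made-up fam list: index() and upper() leave their object alone, while append() changes the list in place.

```python
fam = ["liz", 1.73, "mum", 1.71]   # illustrative values

position = fam.index("mum")        # looks something up; fam is unchanged
print(position)                    # 2

fam.append("me")                   # changes fam itself
print(fam)                         # ['liz', 1.73, 'mum', 1.71, 'me']

name = "mum"
shout = name.upper()               # returns a new string; name is unchanged
print(name, shout)                 # mum MUM
```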

Packages

packages = directories of Python scripts

each script is a 'module' - modules specify new functions, methods and data types

NumPy - matplotlib - scikit-learn

http://pip.readthedocs.org/en/stable/installing/

download get-pip.py

Do this in the terminal:

python3 get-pip.py
pip3 install numpy

Then, in Python:

import numpy as np
np.array([1, 2, 3])

import specific parts of a package

from numpy import array

but import numpy as np is preferred, because np.array() makes it clear that the function comes from NumPy

can import subpackage of a package like so:

from scipy.linalg import inv as my_inv
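
The import forms above, sketched with standard-library modules so the example runs without installing anything. The modules and the alias names (st, parse_url) are stand-ins chosen for illustration; the syntax is the same for NumPy and SciPy.

```python
import math                                     # whole module; call as math.sqrt()
import statistics as st                         # whole module under a short alias
from math import sqrt                           # one name only; call as sqrt()
from urllib.parse import urlparse as parse_url  # one name from a submodule, renamed

print(math.sqrt(16))             # 4.0
print(st.mean([1.0, 2.0, 3.0]))  # 2.0
print(sqrt(16))                  # 4.0

host = parse_url("http://pip.readthedocs.org/en/stable/installing/").netloc
print(host)                      # pip.readthedocs.org
```

With the from ... import form the origin of sqrt is no longer visible where it is called, which is the reason for preferring the aliased whole-package import.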

NumPy

NumPy = Numerical Python

NumPy array -

  • similar to a Python list
  • can also perform calculations over entire arrays
  • assumes that an array contains only one data type -> type coercion converts the elements to a common type, so the array ends up homogeneous
  • if numbers are mixed with strings, all elements are returned as strings
  • comes with its own methods
  • can subset using booleans (e.g. > 5)
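
A sketch of the points above, assuming NumPy is installed. All values and the variable names are made up for illustration.

```python
import numpy as np

height = np.array([1.73, 1.68, 1.71])   # illustrative values
weight = np.array([65.4, 59.2, 63.6])
bmi = weight / height ** 2              # calculated element by element
print(bmi.shape)                        # (3,)

# + means different things for lists and arrays
print([1, 2, 3] + [1, 2, 3])                              # [1, 2, 3, 1, 2, 3]
print((np.array([1, 2, 3]) + np.array([1, 2, 3])).tolist())  # [2, 4, 6]

# Type coercion: mixed inputs are converted to one common type
mixed = np.array([1.0, "is", True])
print(mixed.tolist())                   # ['1.0', 'is', 'True']
numbers = np.array([1, 2.5, True])
print(numbers.tolist())                 # [1.0, 2.5, 1.0]
```

Only the array containing a string ends up as strings; numbers mixed with booleans become floats.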

Subset using a boolean

e.g. have an existing array called bmi

create a boolean array called light - is each bmi less than 21?

light = bmi < 21

print(light) -----> [False False True False ... ]

use the boolean array to select a subset of the bmi array

bmi[light]
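
The same steps as a runnable sketch, assuming NumPy is installed. The bmi values are made up for illustration.

```python
import numpy as np

bmi = np.array([21.85, 20.97, 21.75, 24.74, 21.44])   # illustrative values

light = bmi < 21               # one True/False per element
print(light.tolist())          # [False, True, False, False, False]

print(bmi[light].tolist())     # [20.97]
print(bmi[bmi < 21].tolist())  # [20.97]  (same thing in one step)
```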

Attributes

e.g. np_2d.shape <--- shape is the attribute

access it with dot notation like a method, but without parentheses
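
A small sketch of attribute access, assuming NumPy is installed. The contents of np_2d are made up; ndim and dtype are two further array attributes shown for comparison.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71],
                  [65.4, 59.2, 63.6]])   # illustrative values

print(np_2d.shape)   # (2, 3)  -> 2 rows, 3 columns
print(np_2d.ndim)    # 2
print(np_2d.dtype)   # float64
```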

2D NumPy arrays

  • improved list of lists
  • np_2d[0][2] or np_2d[0, 2] to select the third item in the first row
  • np_2d[:, 0] --- the : selects all rows, the 0 selects the first column
  • can do summary statistics, e.g. np.mean(np_city[:, 0]), np.corrcoef, np.std
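
A sketch of the indexing and summary statistics above, assuming NumPy is installed. Both arrays hold made-up values; np_city here is a tiny stand-in with heights in column 0 and weights in column 1.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71],
                  [65.4, 59.2, 63.6]])   # illustrative values

print(np_2d[0][2])            # 1.71
print(np_2d[0, 2])            # 1.71
print(np_2d[:, 0].tolist())   # [1.73, 65.4]        (all rows, first column)
print(np_2d[1, :].tolist())   # [65.4, 59.2, 63.6]  (second row, all columns)

np_city = np.array([[1.60, 50.0],
                    [1.70, 60.0],
                    [1.80, 70.0]])       # illustrative values

mean_height = np.mean(np_city[:, 0])
std_height = np.std(np_city[:, 0])
corr = np.corrcoef(np_city[:, 0], np_city[:, 1])   # 2x2 correlation matrix
print(mean_height, std_height)
print(corr[0, 1])             # correlation between height and weight
```

np.corrcoef returns a matrix, so the height/weight correlation sits at position [0, 1].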

gk_heights = np_heights[np_positions == 'GK']
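
The line above filters one array with a condition on another array of the same length. A sketch with made-up positions and heights (assuming NumPy is installed); other_heights is a hypothetical name for the complementary selection.

```python
import numpy as np

np_positions = np.array(['GK', 'M', 'A', 'D', 'GK'])   # illustrative values
np_heights = np.array([191, 184, 185, 180, 188])       # illustrative values

gk_heights = np_heights[np_positions == 'GK']      # heights of goalkeepers
other_heights = np_heights[np_positions != 'GK']   # heights of everyone else

print(gk_heights.tolist())      # [191, 188]
print(other_heights.tolist())   # [184, 185, 180]
```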

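import numpy as np

# Simulate 5000 heights (mean 1.75, sd 0.20) and weights (mean 60.32, sd 15), rounded to 2 decimals
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)

# Combine into a 2D array with one row per person: column 0 = height, column 1 = weight
np_city = np.column_stack((height, weight))

print(np_city)

# Add your code snippets here
x = 11
print(x < 5 or x < 10)  # False, because 11 is less than neither 5 nor 10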

Explore Datasets

Use the arrays imported in the first cell to explore the data and practice your skills!

  • Print out the weight of the first ten baseball players.
  • What is the median weight of all baseball players in the data?
  • Print out the names of all players with a height greater than 80 (heights are in inches).
  • Who is taller on average? Baseball players or soccer players? Keep in mind that baseball heights are stored in inches!
  • The values in soccer_shooting are decimals. Convert them to whole numbers (e.g., 0.98 becomes 98).
  • Do taller players get higher ratings? Calculate the correlation between soccer_ratings and soccer_heights to find out!
  • What is the average rating for attacking players ('A')?
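
A possible starting point for some of the tasks above, assuming NumPy is installed. The arrays are made-up stand-ins, not the course data from the hidden cell; baseball_heights_in is a hypothetical name, and the soccer heights are assumed to be in centimetres.

```python
import numpy as np

# Made-up stand-ins for the arrays from the hidden cell (not the course data)
soccer_shooting = np.array([0.98, 0.57, 0.75])
soccer_ratings = np.array([88, 79, 83])
soccer_heights = np.array([191, 170, 185])     # assumed to be in centimetres
baseball_heights_in = np.array([74, 72, 75])   # hypothetical name; inches

# Decimals to whole numbers: round before converting, because
# 0.57 * 100 is stored as 56.99999999999999 and astype(int) truncates
shooting_pct = np.round(soccer_shooting * 100).astype(int)
print(shooting_pct.tolist())   # [98, 57, 75]

# Convert inches to centimetres before comparing averages
baseball_heights_cm = baseball_heights_in * 2.54
print(np.mean(baseball_heights_cm), np.mean(soccer_heights))

# Correlation between rating and height
corr = np.corrcoef(soccer_ratings, soccer_heights)[0, 1]
print(corr)
```

With the real arrays the same lines apply; only the variable names need to match those defined in the hidden cell.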

Practice exercises

import numpy as np
x = np.array([0, 4, 4])
for j in x:
    print(str(j) + ' km')