Introduction to Python
Run the hidden code cell below to import the data used in this course.
list = compound data type
list slicing: fam[n1:n2], where index n1 is inclusive and index n2 is exclusive
fam[:n] and fam[n:] also work
fam[:] all list elements
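A minimal sketch of the slicing rules above. The contents of fam are made up for illustration; only the name comes from these notes.

```python
# Hypothetical example list; the values are invented for illustration.
fam = ["liz", 1.73, "emma", 1.68, "mum", 1.71, "dad", 1.89]

middle = fam[2:4]   # indices 2 and 3; index 4 is excluded
first_two = fam[:2] # from the start up to, but not including, index 2
last_two = fam[6:]  # from index 6 to the end
everything = fam[:] # all elements, as a new list

print(middle)
print(first_two)
print(last_two)
print(everything)
```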
Python variables point to a location in memory, so y = x makes both names refer to the same list. If you run the code below, x will be ['a', 'z', 'c']
x = ["a","b","c"]
y = x
y[1] = "z"
x
However, if you run the code below you will notice that it behaves differently: list(x) creates a copy, so x stays ['a', 'b', 'c'].
x = ["a","b","c"]
y = list(x)
y[1] = "z"
x
add to list using +
delete elements using del, e.g. del(fam[2])
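A short sketch of adding and deleting elements. The list contents and the name fam_ext are assumptions made for illustration.

```python
# Hypothetical example list; the values are invented for illustration.
fam = ["liz", 1.73, "emma", 1.68]

# + builds a new list; fam itself is left unchanged
fam_ext = fam + ["me", 1.79]

# del removes the element at index 2 ("emma"); later elements shift left
del fam_ext[2]

print(fam)
print(fam_ext)
```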
Functions
round(n) with only one input gives an integer
help(round) opens up the documentation; ?round does the same in IPython/Jupyter
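As a quick illustration of the note on round(), here is a sketch with an arbitrary example value; the variable names are placeholders.

```python
value = 1.68  # arbitrary example value

one_decimal = round(value, 1)  # second argument = number of decimals to keep
whole = round(value)           # no second argument: result is an int

print(one_decimal)
print(whole, type(whole))
```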
objects = lists, strings, booleans, integers, floats etc.
methods = functions that belong to objects
call method using . notation e.g. fam.index("mum")
some method calls change the object they're called on while others don't
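The difference between methods that change their object and methods that don't can be sketched as follows. The list and string values are made up for illustration.

```python
# Hypothetical example data.
fam = ["liz", "emma", "mum", "dad"]
sister = "liz"

position = fam.index("mum")  # looks up the position; fam is not changed

fam.append("me")             # list.append changes fam in place

shout = sister.upper()       # str.upper returns a new string; sister is unchanged

print(position)
print(fam)
print(sister, shout)
```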
Packages
packages = directories of Python scripts
each script is a 'module' - modules specify new functions, methods and data types
NumPy - matplotlib - scikit-learn
http://pip.readthedocs.org/en/stable/installing/
download get-pip.py
Do this in the terminal:
python3 get-pip.py
pip3 install numpy
Then:
import numpy as np
np.array([1, 2, 3])
import specific parts of a package
from numpy import array
but import numpy as np is preferred
can import subpackage of a package like so:
from scipy.linalg import inv as my_inv
NumPy
NumPy = numeric python
NumPy array -
- similar to Python list
- also ability to perform calculations over entire arrays
- assumes that arrays contain only one data type -> type coercion converts mixed types to end up with a homogeneous array
- if any element is a string, all elements are converted to strings
- comes with own methods
- can subset using booleans (e.g. > 5 )
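The points above can be sketched as follows, assuming NumPy is installed. The height and weight values are made up for illustration.

```python
import numpy as np

# Hypothetical measurements, invented for illustration.
height = np.array([1.73, 1.68, 1.71])
weight = np.array([65.4, 59.2, 63.6])

# Calculations apply element by element across the whole array
bmi = weight / height ** 2
print(bmi)

# A plain Python list behaves differently: + concatenates
print([1, 2, 3] + [1, 2, 3])
print(np.array([1, 2, 3]) + np.array([1, 2, 3]))

# Type coercion: mixed numeric types become floats ...
mixed_num = np.array([1, 2.5, True])
print(mixed_num.dtype)

# ... and if any element is a string, everything becomes a string
mixed_str = np.array([1.0, "is", True])
print(mixed_str)
```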
Subset using a boolean
e.g. have an existing array called bmi
create a boolean array called light - is each bmi less than 21?
light = bmi < 21
print(light) -----> [False False True False ... ]
use the boolean subset to choose a subset from the bmi array
bmi[light]
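Put together as a runnable sketch, assuming NumPy is installed. The bmi values are invented for illustration.

```python
import numpy as np

# Hypothetical BMI values, invented for illustration.
bmi = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

# The comparison produces one boolean per element
light = bmi < 21
print(light)

# Square brackets (not parentheses) keep only the elements where light is True
print(bmi[light])
```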
Attributes
e.g. np_2d.shape <--- this is the attribute
call similar to methods but no brackets
2D NumPy arrays
- improved list of lists
- np_2d[0][2] or np_2d[0, 2] to select the third item in the first row
- np_2d[:, 0] --- : indicates to select all rows, 0 selects the first column
- can do summary statistics e.g. np.mean(np_city[:, 0]), np.corrcoef, np.std
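A small sketch of the 2D indexing and summary statistics above, assuming NumPy is installed. The array values are invented; only the name np_2d comes from these notes.

```python
import numpy as np

# Hypothetical 2D array: row 0 holds heights, row 1 holds weights.
np_2d = np.array([[1.73, 1.68, 1.71],
                  [65.4, 59.2, 63.6]])

print(np_2d.shape)               # attribute: no brackets needed

# Two equivalent ways to get the third item in the first row
print(np_2d[0][2], np_2d[0, 2])

# : selects all rows, 0 selects the first column
first_col = np_2d[:, 0]
print(first_col)

# Summary statistics on a row
print(np.mean(np_2d[0, :]))
print(np.std(np_2d[0, :]))

# Correlation matrix between the two rows
corr = np.corrcoef(np_2d[0, :], np_2d[1, :])
print(corr)
```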
gk_heights = np_heights[np_positions == 'GK']
import numpy as np
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))
print(np_city)
# Add your code snippets here
x = 11
print(x < 5 or x < 10)
Explore Datasets
Use the arrays imported in the first cell to explore the data and practice your skills!
- Print out the weight of the first ten baseball players.
- What is the median weight of all baseball players in the data?
- Print out the names of all players with a height greater than 80 (heights are in inches).
- Who is taller on average? Baseball players or soccer players? Keep in mind that baseball heights are stored in inches!
- The values in soccer_shooting are decimals. Convert them to whole numbers (e.g., 0.98 becomes 98).
- Do taller players get higher ratings? Calculate the correlation between soccer_ratings and soccer_heights to find out!
- What is the average rating for attacking players ('A')?
Practice exercises
import numpy as np
x = np.array([0, 4, 4])
for j in x:
    print(str(j) + ' km')