Intermediate Python

Matplotlib

  • import matplotlib.pyplot as plt (to import the module)

- plt.plot(x, y) builds a line plot from the x-axis and y-axis data; a good choice when time is on the horizontal axis

  • plt.show() displays the plot

- plt.scatter(x,y) used to assess if there's a correlation between two variables

- plt.hist(values) explores the distribution of a single variable within the dataset

  • bins argument of plt.hist() to control the number of bins of the histogram
  • plt.clf() clears the current figure so you can start afresh
  • plt.xlabel() to give a label to the x axis
  • plt.ylabel() to give a label to the y axis
  • plt.title() to give a title to the plot
  • plt.yticks([0,1,2], ["one","two","three"]) / plt.xticks([0,1,2], ["one","two","three"]) to set the tick positions (first list) and their labels (second list) on the y / x axis
  • alpha= argument (a number between 0 and 1) to change the opacity of the plotted elements: 0 is fully transparent, 1 is fully opaque
  • plt.grid(True) adds a grid to the diagram
  • plt.text(x-position, y-position, 'text to be added') to add a text/comment within the diagram to highlight something
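
The calls listed above can be combined roughly as in the following sketch. The data values are illustrative, not taken from the course, and the "Agg" backend is an assumption made only so that the code runs without a display; in a notebook you would drop that line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Illustrative data (not from the course).
years = [1990, 2000, 2010, 2020]
population = [5.3, 6.1, 6.9, 7.8]
heights = [1.60, 1.62, 1.65, 1.70, 1.71, 1.73, 1.75, 1.80, 1.82, 1.90]

# Line plot with time on the horizontal axis, plus customizations.
plt.plot(years, population)
plt.xlabel("Year")
plt.ylabel("Population (billions)")
plt.title("Line plot")
plt.yticks([0, 2, 4, 6, 8], ["0", "2B", "4B", "6B", "8B"])  # positions, labels
plt.grid(True)
plt.text(2000, 6.3, "year 2000")  # annotation at data coordinates (2000, 6.3)
# plt.show() would display the figure here in an interactive session.
plt.clf()  # clear the figure before drawing the next plot

# Scatter plot to look for a relationship between two variables.
plt.scatter(years, population, alpha=0.8)  # slightly transparent markers
plt.clf()

# Histogram of a single variable; bins controls the number of bins.
counts, bin_edges, _ = plt.hist(heights, bins=5)
plt.clf()
```

plt.hist() returns the count per bin and the bin edges, which is handy when you want to check the binning without looking at the picture.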

Dictionaries

my_dict = { "key1":"value1", "key2":"value2", }

  • indexed by unique keys, so you can look up values by key
  • my_list.index(item): a list method (not a dictionary method) that returns the index of the first occurrence of an item within a list
  • my_dict.keys(): returns all the unique keys of the dictionary my_dict
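
A minimal sketch of the lookups described above. The variable names (europe, countries) are made up for illustration.

```python
# Dictionary: look up values by their unique keys.
europe = {"spain": "madrid", "france": "paris"}

print(europe["france"])      # value stored under the key "france"
europe["italy"] = "rome"     # add a new key-value pair
print(list(europe.keys()))   # all keys, in insertion order
print("italy" in europe)     # membership test on the keys

# List: find the position of an item with index().
countries = ["spain", "france", "italy"]
idx = countries.index("france")
print(idx)
```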

Pandas

  • Dictionary to DataFrame: pd.DataFrame(my_dict); each key becomes a column label and each value holds that column's data
  • df.index = list_of_labels: use to specify the label of each row in the DataFrame df
  • var = pd.read_csv("path/to/file.csv") reads a CSV file into a DataFrame; do not forget to pass the path as a string! (the path shown is a placeholder)
  • index_col: an argument of read_csv() that specifies which column to use as row labels; index_col=0 uses the first column
  • cars['cars_per_cap']
  • cars[['cars_per_cap']]
  • The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
  • loc is label based: you need to specify the label of each row and/or column
  • iloc is integer-position based: you specify the integer position of each row and/or column
  • we use [] to extract the data in the form of a pandas Series
  • we use [[]] to extract the data in the form of a DataFrame
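
The sketch below walks through the points above on a small DataFrame. The values and row labels are illustrative, and the CSV is read from an in-memory string (via io.StringIO) only so the example is self-contained; with a real file you would pass its path as a string.

```python
import io
import pandas as pd

# Dictionary to DataFrame (illustrative values).
data = {
    "country": ["United States", "Japan", "India"],
    "cars_per_cap": [809, 588, 18],
    "drives_right": [True, False, False],
}
cars = pd.DataFrame(data)
cars.index = ["US", "JPN", "IN"]   # set the row labels

# Single vs. double brackets.
series = cars["cars_per_cap"]      # pandas Series
frame = cars[["cars_per_cap"]]     # pandas DataFrame with one column

# loc uses labels, iloc uses integer positions.
by_label = cars.loc["JPN", "country"]
by_position = cars.iloc[1, 0]      # second row, first column
subset = cars.loc[["US", "IN"], ["country", "cars_per_cap"]]

# read_csv with index_col=0: the first column becomes the row labels.
csv_text = ",country,cars_per_cap\nUS,United States,809\nJPN,Japan,588\n"
from_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(from_csv)
```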

Random notes

  • When you write these comparisons in a script, you will need to wrap a print() function around them to see the output.

Comparison with NumPy arrays

  • np.logical_and(): element-wise "and" comparison of two boolean NumPy arrays
  • np.logical_or(): element-wise "or" comparison of two boolean NumPy arrays
  • np.logical_not(): element-wise "not" of a single boolean NumPy array
  • If you also want to access the index information, so where the list element you're iterating over is located, you can use enumerate().
  • items(): a dictionary method to loop over key-value pairs; unpack the key and the value in the for statement
  • np.nditer(array_name): a function to loop over every element of a NumPy array
  • iterrows(): a method to iterate over the rows of a DataFrame; each iteration yields the row label and the row as a Series, so unpack both in the for statement
  • apply(): applies a function to every element of a column, replacing a for loop with iterrows(); much easier when adding or changing columns in the DataFrame
  • np.transpose(): a function that returns an array with axes transposed.
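
The functions in this list can be tried out with a short sketch like the one below. All data and names (heights, grid, cars, name_length) are made up for illustration.

```python
import numpy as np
import pandas as pd

heights = np.array([1.60, 1.75, 1.82, 1.95])

# Element-wise logical operations on boolean arrays.
mid_range = np.logical_and(heights > 1.7, heights < 1.9)
extremes = np.logical_or(heights < 1.65, heights > 1.9)
not_tall = np.logical_not(heights > 1.7)
print(heights[mid_range])  # boolean arrays can be used to select elements

# enumerate(): index and value of each list element.
for index, h in enumerate([1.60, 1.75]):
    print(index, h)

# items(): key and value of each dictionary entry.
for key, value in {"spain": "madrid"}.items():
    print(key, value)

# np.nditer(): visit every element of a 2D array.
grid = np.array([[1, 2], [3, 4]])
flat = [int(x) for x in np.nditer(grid)]

# np.transpose(): swap rows and columns.
transposed = np.transpose(grid)

# iterrows(): row label and row (a Series) on each iteration.
cars = pd.DataFrame({"country": ["Japan", "Australia"]}, index=["JPN", "AUS"])
for label, row in cars.iterrows():
    print(label, row["country"])

# apply(): call a function on every element of a column.
cars["name_length"] = cars["country"].apply(len)
```

Using apply() here avoids an explicit loop that assigns one cell at a time, which is usually both shorter and faster.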

ONCE YOU KNOW THE DISTRIBUTION, YOU CAN CALCULATE CHANCES.
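
As a hedged illustration of that idea, the sketch below simulates rolling two dice many times and estimates the chance that the total is at least 10. The dice scenario, the seed, and the number of trials are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=123)  # arbitrary seed, for reproducibility
n_trials = 100_000

# Each row is one trial of two dice; the upper bound 7 is exclusive.
rolls = rng.integers(1, 7, size=(n_trials, 2))
totals = rolls.sum(axis=1)

# Fraction of simulated trials with a total of 10 or more.
chance = np.mean(totals >= 10)
print(chance)  # should land near the exact value 6/36
```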