Intermediate Python Notes
Intermediate Python
Matplotlib
- import matplotlib.pyplot as plt (to import the module)
- plt.plot(x, y) builds a line plot of y against x; a good choice when time is on the x axis
- plt.show() displays the plot
- plt.scatter(x, y) builds a scatter plot, used to assess whether two variables are correlated
- plt.hist(values) builds a histogram to explore the distribution of a single variable
- bins argument of plt.hist() controls the number of bins in the histogram
- plt.clf() clears the current figure so you can start afresh
- plt.xlabel() gives a label to the x axis
- plt.ylabel() gives a label to the y axis
- plt.title() gives a title to the plot
- plt.yticks([0,1,2], ["one","two","three"]) / plt.xticks([0,1,2], ["one","two","three"]) set the tick positions (first list) and the labels shown at them (second list)
- alpha= argument (a number between 0 and 1) changes the opacity of the plotted elements: 0 is fully transparent, 1 is fully opaque
- plt.grid(True) adds a grid to the plot
- plt.text(x_position, y_position, 'text to be added') adds a text comment at the given data coordinates to highlight something
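A minimal sketch tying the calls above together. The year and population values are made up for illustration, and the Agg backend line is an assumption so the snippet runs without a display; neither comes from the course data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative data, not taken from the course
year = [1950, 1960, 1970, 1980, 1990]
pop = [2.5, 3.0, 3.7, 4.4, 5.3]

# Line plot with time on the x axis
plt.plot(year, pop)
plt.xlabel("Year")
plt.ylabel("Population (billions)")
plt.title("Line plot example")
plt.yticks([0, 2, 4, 6], ["0", "2B", "4B", "6B"])  # positions, then labels
plt.grid(True)
plt.text(1960, 3.2, "example note")  # placed at data coordinates (1960, 3.2)
plt.show()
plt.clf()  # clear the figure before the next plot

# Scatter plot with slightly transparent points
plt.scatter(year, pop, alpha=0.8)
plt.show()
plt.clf()

# Histogram of one variable, split into 5 bins
counts, edges, _ = plt.hist(pop, bins=5)
plt.show()
plt.clf()
```

plt.hist() also returns the bin counts and bin edges, which is handy when you want the numbers behind the picture.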
Dictionaries
my_dict = { "key1":"value1", "key2":"value2", }
- indexed by unique keys, so you can look up a value by its key: my_dict["key1"]
- my_list.index(item): finds the index of an item within a list (a list method; dictionaries are looked up by key instead)
- my_dict.keys(): returns all the unique keys of the dictionary my_dict
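A short sketch of the difference between a dictionary lookup and the two-list approach that index() supports. The country and capital names are hypothetical example data.

```python
# Hypothetical example data
europe = {"spain": "madrid", "france": "paris", "germany": "berlin"}

# Look up a value directly by its key
capital = europe["france"]
print(capital)

# keys() gives a view of all keys; wrap it in list() to get a plain list
key_list = list(europe.keys())
print(key_list)

# Keys are unique: assigning to an existing key overwrites its value
europe["germany"] = "bonn"

# The same lookup with two parallel lists needs index() first
countries = ["spain", "france", "germany"]
capitals = ["madrid", "paris", "berlin"]
pos = countries.index("germany")
capital_from_lists = capitals[pos]
print(capital_from_lists)
```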
Pandas
- Dictionary to DataFrame: pd.DataFrame(my_dict), after import pandas as pd
- df.index = list_of_labels: sets the row labels of the DataFrame df
- df = pd.read_csv("path/to/file.csv"): reads a CSV file into a DataFrame; do not forget to pass the path as a string!
- index_col: an argument of read_csv() that specifies which column to use as row labels; index_col=0 uses the first column
- cars['cars_per_cap']
- cars[['cars_per_cap']]
- The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
- loc is label-based: you select rows and/or columns by their labels
- iloc is integer-position-based: you select rows and/or columns by their integer positions, starting at 0
- we use [] to extract a column as a Series
- we use [[]] to extract one or more columns as a DataFrame
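The selection rules above can be sketched on a small DataFrame. The values below are made up for illustration and only borrow the cars_per_cap column name from these notes; the file name in the commented read_csv() line is hypothetical.

```python
import pandas as pd

# Made-up values for illustration only
data = {
    "country": ["United States", "Japan", "Egypt"],
    "cars_per_cap": [800, 600, 50],
    "drives_right": [True, False, True],
}
cars = pd.DataFrame(data)          # dictionary to DataFrame
cars.index = ["US", "JPN", "EG"]   # set the row labels

# Reading from a file instead would look like this (hypothetical path):
# cars = pd.read_csv("cars.csv", index_col=0)

series = cars["cars_per_cap"]      # single brackets: pandas Series
frame = cars[["cars_per_cap"]]     # double brackets: pandas DataFrame

by_label = cars.loc["JPN", "cars_per_cap"]   # row and column by label
by_position = cars.iloc[1, 1]                # same cell by integer position

# loc also accepts lists of labels and returns a smaller DataFrame
subset = cars.loc[["US", "EG"], ["country", "drives_right"]]
print(subset)
```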
Random notes
- When you write these comparisons in a script, you will need to wrap a print() function around them to see the output.
Comparisons with NumPy arrays
- np.logical_and(): element-wise "and" of two boolean arrays
- np.logical_or(): element-wise "or" of two boolean arrays
- np.logical_not(): element-wise "not" of a single boolean array
- If you also want the index of the list element you're iterating over, use enumerate(): for index, item in enumerate(my_list)
- items(): a dictionary method for looping over key-value pairs: for key, value in my_dict.items()
- np.nditer(array_name): a function for looping over every element of a NumPy array
- iterrows(): a DataFrame method that yields (row label, row as a Series) pairs: for label, row in df.iterrows()
- apply(): applies a function to every element of a column without an explicit loop; simpler than for + iterrows() when adding or changing columns in the DataFrame
- np.transpose(): a function that returns an array with axes transposed.
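A compact sketch of the NumPy comparisons and looping tools listed above. All data is made up for illustration, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Element-wise boolean logic on NumPy arrays (made-up heights in metres)
heights = np.array([1.60, 1.75, 1.82, 1.95])
mid = np.logical_and(heights > 1.7, heights < 1.9)
extreme = np.logical_or(heights < 1.7, heights > 1.9)
not_mid = np.logical_not(mid)
print(heights[mid])  # a boolean array can be used to select elements

# enumerate() gives the index alongside each list element
areas = [11.25, 18.0, 20.0]
for index, area in enumerate(areas):
    print(f"room {index}: {area}")

# items() loops over the key-value pairs of a dictionary
europe = {"spain": "madrid", "france": "paris"}
for country, capital in europe.items():
    print(f"the capital of {country} is {capital}")

# np.nditer() visits every element of a multi-dimensional array
grid = np.array([[1, 2], [3, 4]])
flat = [int(value) for value in np.nditer(grid)]

# np.transpose() swaps the axes: rows become columns
swapped = np.transpose(grid)

# iterrows() yields (row label, row Series) pairs
cars = pd.DataFrame({"country": ["Japan", "Egypt"]}, index=["JPN", "EG"])
for label, row in cars.iterrows():
    print(label, row["country"])

# apply() does the per-row work without an explicit loop
cars["COUNTRY"] = cars["country"].apply(str.upper)
```

For adding a computed column, apply() is usually the better fit: iterrows() builds a new Series for every row, which gets slow on large DataFrames.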
Once you know the distribution, you can calculate chances.