Ecological Impact Focus
As a student of Environmental Science & Management at Portland State University, I need solid data management and modeling skills, and I find it incredibly helpful to work through what I'm learning in code. Jupyter Notebooks are an awesome tool for that!
After coding through a problem, my comprehension is more thorough and my logic more firmly founded; my coding skills also grow and stay sharp in the process.
This workspace is the notebook repo for my ESM projects. Because of the DataCamp workspace server design, only one 'notebook.ipynb' is publicly visible per workspace, and organizing projects is easiest if I group them by workspace; so new projects stay hidden while presentable projects are featured here in rotation.
Applied Environmental Studies
ESM 221 with Professor Arick Rouhe
TA Christian Heisler
Linear Regression Model
Line of Best Fit (Ordinary Least Squares)
import pandas as pd
import statsmodels.api as sm
import plotly.express as px
import plotly.graph_objects as go
def calculate_linear_regression(dataframe, test_type=2):
    # Drop rows with NaN values
    dataframe = dataframe.dropna()
    if test_type == 1:
        # Define the independent variable for the one-way test
        X = dataframe[['Temperature (°C)']]
    elif test_type == 2:
        # Define the independent variables for the two-way test
        # (currently only Temperature (°C); add further predictor columns here)
        X = dataframe[['Temperature (°C)']]
    else:
        raise ValueError("test_type must be either 1 (one-way) or 2 (two-way)")
    X = sm.add_constant(X)  # Add a constant (intercept) term to the independent variables
    # Define the dependent variable
    y = dataframe['Corvus Factor']
    # Fit the ordinary least squares regression
    model = sm.OLS(y, X).fit()
    return model
outlier = input("Show the outlier Y or N: ").upper()
if outlier == 'N':
    data = {
        'Temperature (°C)': [11, 11.1, 6.2, 8.87, 11, 12.3, 14.85, 2.35],
        'Corvus Factor': [0.33, 4.29, 3, 0, 2, 2.17, 0, 0]
    }
elif outlier == 'Y':
    data = {
        'Temperature (°C)': [7, 11, 11.1, 6.2, 8.87, 11, 12.3, 14.85, 2.35],
        'Corvus Factor': [26.67, 0.33, 4.29, 3, 0, 2, 2.17, 0, 0]
    }
else:
    # Stop here rather than crashing later on an undefined `data`
    raise SystemExit("There was a misentry. Rerun the program and enter Y or N.")
# Example usage:
df = pd.DataFrame(data).dropna() # Drop rows with NaN values in either column
model = calculate_linear_regression(df, test_type=2) # Change test_type to 1 for one-way test
# Plotting with Plotly
fig = px.scatter(df, x='Temperature (°C)', y='Corvus Factor', trendline="ols")
fig.update_layout(title='Corvus Factor ~ Temperature (°C)')
# Extracting regression line equation, r-squared, f-statistic, and p-value
params = model.params
r_squared = model.rsquared
f_statistic = model.fvalue
p_value = model.f_pvalue
line_eq = f"y = {params['Temperature (°C)']:.3f}x + {params['const']:.3f}"  # label-based lookup avoids positional-indexing warnings
r2_text = f"R² = {r_squared:.3f}"
f_text = f"F-statistic = {f_statistic:.3f}"
p_text = f"p-value = {p_value:.3g}"
# Adding annotations for the linear equation and stats
fig.add_annotation(xref="paper", yref="paper", x=0.05, y=0.96, showarrow=False, text=line_eq, font=dict(size=14, color="black"))
fig.add_annotation(xref="paper", yref="paper", x=0.05, y=0.90, showarrow=False, text=r2_text, font=dict(size=14, color="black"))
fig.add_annotation(xref="paper", yref="paper", x=0.05, y=0.84, showarrow=False, text=f_text, font=dict(size=14, color="black"))
fig.add_annotation(xref="paper", yref="paper", x=0.05, y=0.78, showarrow=False, text=p_text, font=dict(size=14, color="black"))
fig.show()
model.summary()
Research Methods for Environmental Science
ESM 340 with Professor Amy Larson
TA Christian Heisler
Binomial Distributions
Here is a binomial table class constructor in Python with a method to check test statistics and an example setup for running a list of test values against a list of sample sizes. It uses combinations from the Python math module to calculate associated probability.
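Since the notebook cell itself isn't reproduced here, below is a minimal sketch of what such a class could look like; the class name BinomialTable and its check method are illustrative assumptions, not the original code. It builds the probability table with math.comb and reports a two-tailed probability for a test statistic.

```python
from math import comb

class BinomialTable:
    """Binomial probability table, assuming a fair (p = 0.5) two-outcome test.

    NOTE: illustrative sketch; the original notebook's class may differ.
    """
    def __init__(self, n, p=0.5):
        self.n = n
        self.p = p
        # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) for every possible count k
        self.pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

    def check(self, k):
        """Two-tailed probability of a count at least as far from the mean as k."""
        mean = self.n * self.p
        return sum(prob for kk, prob in self.pmf.items()
                   if abs(kk - mean) >= abs(k - mean))

# Example setup: run a list of test values against a list of sample sizes
for n in [10, 20]:
    table = BinomialTable(n)
    for k in [8, 9]:
        print(f"n={n}, k={k}: two-tailed p = {table.check(k):.4f}")
```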
Statistical functions are available in many modules
Here the binom methods of the scipy.stats module are given the same inputs.
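Since that cell isn't shown either, here is a sketch of the equivalent scipy.stats.binom calls; the particular n and k values are illustrative, chosen to mirror a fair-coin test with n = 10.

```python
from scipy.stats import binom

n, p = 10, 0.5
# pmf gives P(X = k); cdf gives P(X <= k); sf gives P(X > k)
print(binom.pmf(8, n, p))   # exact probability of observing exactly 8 successes
# Two-tailed probability of a count at least 3 away from the mean of 5:
print(binom.cdf(2, n, p) + binom.sf(7, n, p))  # P(X <= 2) + P(X >= 8) ≈ 0.1094
```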
The critical values are the values closest to the mean of the expected relative frequencies that still carry enough statistical significance to reject the null hypothesis at an alpha of 0.05, meaning just a 5% chance of getting a value at least that far from the mean if the distribution being observed is actually no different from the expected distribution (50% probability in a binomial test).
Obtaining critical values (or more extreme values) indicates that the variability under investigation significantly defies the odds of random chance.
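The search for critical values described above can be sketched in code; the function name critical_values is my own, and it assumes a symmetric two-tailed test at p = 0.5 where each tail gets at most alpha/2.

```python
from scipy.stats import binom

def critical_values(n, p=0.5, alpha=0.05):
    """Widest symmetric rejection region whose tails each hold <= alpha/2.

    Illustrative helper, assuming a fair two-tailed binomial test.
    """
    if binom.cdf(0, n, p) > alpha / 2:
        return None  # n is too small for any rejection region at this alpha
    lower = 0
    # Grow the lower tail while its total probability stays within alpha/2
    while binom.cdf(lower + 1, n, p) <= alpha / 2:
        lower += 1
    return lower, n - lower  # upper tail mirrors the lower since p = 0.5

print(critical_values(10))  # (1, 9): reject H0 if the count is <= 1 or >= 9
```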
If the null hypothesis is rejected, an alternative hypothesis (a possible cause of the non-chance occurrence) is advanced; hypotheses can be disproven but never proven.
Scientific consensus advances by disproving hypotheses rather than proving them; after rigorous testing, the process leaves the best guesses standing (theories) and discards the lesser ones.
The Random Module in Python
This generates a specified number of random numbers within a range.
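The cell itself isn't shown; a minimal sketch using random.randint might look like this, with the count of 5 and the 1–100 range chosen as illustrative values.

```python
import random

random.seed(42)  # fix the seed so the draw is reproducible
# Draw 5 random integers between 1 and 100 inclusive
draws = [random.randint(1, 100) for _ in range(5)]
print(draws)
```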
This demonstrates random sampling of a pandas DataFrame.
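A sketch of DataFrame sampling with pandas' built-in sample method; the toy data below reuses the column names from the regression example above and is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Temperature (°C)': [11, 11.1, 6.2, 8.87, 12.3],
    'Corvus Factor': [0.33, 4.29, 3, 0, 2.17],
})
# Sample 3 rows without replacement; random_state makes the draw reproducible
sample = df.sample(n=3, random_state=1)
print(sample)
```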