this is the nav!
Price Analysis
• AI Chat
• Code
• Report

#### In this exercise, we will conduct an analysis of a database encompassing the prices of two assets. Our objective is to extract valuable and insightful information from this dataset.

```.mfe-app-workspace-11z5vno{font-family:JetBrainsMonoNL,Menlo,Monaco,'Courier New',monospace;font-size:13px;line-height:20px;}```# We import the CSV file
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

#Observing the initial records of the table

#### We will change columns names

``````# Getting the columns names
data.columns``````
``````#Changing columns names

columns_names={'fecha':"Date", 'hora': 'Hour', 'pml1':'PML1', 'fecha.1':'Date_2', 'hora.1':'Hour_2', 'pml2':'PML2'}
data.rename(columns=columns_names, inplace=True)
data.columns``````

#### We will conduct observations to identify anomalies or defects in the data.

``````# We utilize the 'describe' function to obtain key statistical metrics.

data.describe()``````

#### In the last table, it is observed that the maximum values of the hour column are 25, so I will discard the values with hour greater than 24.

``````data_not_25hrs = data.drop(data[(data['Hour'] > 24) | (data['Hour_2'] > 24)].index)

# It is observed that some values in the price table have negative values, I am unsure if this is a normal behavior, but I have decided to discard those data points since they are price values.

data_clean = data_not_25hrs.drop(data[(data['PML1'] < 0) | (data['PML2'] < 0)].index)

data_clean[['PML1','PML2']].describe()``````

#### We will conduct an analysis to gain further insights into the distribution of asset prices.

``````# Plot a boxplot of "PML1" and "PML2" price
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 5))

ax1.boxplot(data_clean["PML1"])
ax1.set_title("Boxplot for PML1")
ax1.set_xlabel("PML1")
ax1.set_ylabel("Price")

ax2.boxplot(data_clean["PML2"])
ax2.set_title("Boxplot for PML2")
ax2.set_xlabel("PML2")
ax2.set_ylabel("Price")

# Set common labels and title
fig.suptitle("Price distribution", fontsize=15)

# Show the Figure
plt.show()``````

#### As observed in the preceding graph, there are several values for PML1 and PML2 that can be considered outliers. We will proceed to count how many there are and determine the percentage they represent.

``````# Calculate the interquartile range (IQR)
Q1_pml1 = data_clean['PML1'].quantile(0.25)
Q3_pml1 = data_clean['PML1'].quantile(0.75)
IQR_pml1 = Q3_pml1 - Q1_pml1

Q1_pml2 = data_clean['PML2'].quantile(0.25)
Q3_pml2 = data_clean['PML2'].quantile(0.75)
IQR_pml2 = Q3_pml2 - Q1_pml2

# Define thresholds to consider outliers
lower_threshold_pml1 = Q1_pml1 - 1.5 * IQR_pml1
upper_threshold_pml1 = Q3_pml1 + 1.5 * IQR_pml1

lower_threshold_pml2 = Q1_pml2 - 1.5 * IQR_pml2
upper_threshold_pml2 = Q3_pml2 + 1.5 * IQR_pml2

# Identify outliers
outliers_PML1 = data_clean[(data_clean['PML1'] < lower_threshold_pml1) | (data_clean['PML1'] > upper_threshold_pml1)]
outliers_PML2 = data_clean[(data_clean['PML2'] < lower_threshold_pml2) | (data_clean['PML2'] > upper_threshold_pml2)]

# Count outliers and calculate percentage
count_outliers_PML1 = len(outliers_PML1)
count_outliers_PML2 = len(outliers_PML2)

total_values = len(data_clean)
percentage_outliers_PML1 = (count_outliers_PML1 / total_values) * 100
percentage_outliers_PML2 = (count_outliers_PML2 / total_values) * 100
print(f"For PML1 the interquartile range is: {IQR_pml1}")
print(f"Number of outliers in PML1: {count_outliers_PML1}")
print(f"Percentage of outliers in PML1: {percentage_outliers_PML1:.2f}%")

print(f"For PML2 the interquartile range is: {IQR_pml2}")
print(f"Number of outliers in PML2: {count_outliers_PML2}")
print(f"Percentage of outliers in PML2: {percentage_outliers_PML2:.2f}%")``````