You are a data analyst at a tech company that closely monitors the AI tools market. You want to understand the evolving popularity of three major AI tools (ChatGPT, Gemini, and Microsoft Copilot), identify which one is gaining the most traction, and see how they compare over time.
You'll work with real-world time series data that captures the global interest in these tools over several weeks. Your goal is to analyze this data, uncover patterns, and provide actionable insights to help your company make informed decisions. This might include determining where to focus marketing efforts, predicting future trends, or identifying potential areas for improvement.
Are you ready to help your company stay ahead of the curve in the competitive AI tools market? Let's get started!
The Data
The Google Trends data is available as a CSV file ai_tools_comparison.csv.
The data contains the worldwide search interest for ChatGPT, Gemini, and Microsoft Copilot over the past 12 months as of September 2024.
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
trends = pd.read_csv('ai_tools_comparison.csv')
# Inspect the data
print(trends.head())
print(trends.info())
# Convert the 'week' column to datetime, store it as 'date', and drop 'week'
trends['date'] = pd.to_datetime(trends['week'], format='%Y-%m-%d')
trends = trends.drop('week', axis=1)
# Set date as the index
trends.set_index('date', inplace=True)
# Plot search interest over time for the three AI tools
plt.plot(trends['chatgpt'], label='ChatGPT')
plt.plot(trends['gemini'], label='Gemini')
plt.plot(trends['microsoft_copilot'], label='Microsoft Copilot')
plt.xlabel('Date')
plt.ylabel('Search interest')
plt.title('ChatGPT, Gemini, and Copilot Search Interest Over Time')
plt.legend()
plt.show()
# Week-over-week percent change for all three numeric columns
growth_rate = trends.pct_change().fillna(0).mul(100)
print(growth_rate.head())
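As a quick sanity check, `pct_change` on a tiny hypothetical series reproduces the hand-computed week-over-week formula (new - old) / old * 100:

```python
import pandas as pd

# Hypothetical three-value series to verify the pct_change arithmetic
toy = pd.Series([100, 150, 75])
toy_growth = toy.pct_change().mul(100)
# (150 - 100) / 100 * 100 = 50.0 and (75 - 150) / 150 * 100 = -50.0
print(toy_growth.tolist())  # [nan, 50.0, -50.0]
```

The first element is NaN because there is no prior week to compare against, which is why the main analysis fills it with 0.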
# Calculate the standard deviation of the growth rates for each tool
std_dev = growth_rate.std()
print(f'\nStandard deviation of growth rates:\n{std_dev}')
# The tool with the lowest standard deviation has the most consistent growth
min_volatility_tool = std_dev.idxmin()
print(f'\nAI tool with most consistent performance: {min_volatility_tool}\n')
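A single standard deviation summarizes the whole period; a rolling standard deviation shows how volatility evolves week to week. A small sketch on synthetic growth rates (hypothetical values, not the Trends data):

```python
import numpy as np
import pandas as pd

# Synthetic weekly growth rates, seeded for reproducibility
rng = np.random.default_rng(42)
demo_growth = pd.Series(rng.normal(0, 5, size=12))

# 4-week rolling standard deviation; the first 3 values are NaN
rolling_vol = demo_growth.rolling(window=4).std()
print(rolling_vol.round(2))
```

Applied to `growth_rate`, the same pattern would reveal whether a tool's volatility is concentrated in a few turbulent weeks or spread across the whole year.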
# Find the worst week-over-week performance for each tool
min_growth = growth_rate.min().astype('int')
# Print the worst performances
print('\nWorst performances %:')
print(min_growth)
# Convert to integers for easy equality comparison
growth_rate_int = growth_rate.astype('int')
# Scan each growth-rate column ('chatgpt', 'gemini', 'microsoft_copilot')
# for its worst week, then print the date and the downturn as a percent
print('\nWorst downturn per company:')
for col in growth_rate.columns:
    dip = growth_rate_int[growth_rate_int[col] == min_growth[col]]
    print(f'\n{col}: dip of {min_growth[col]}% on {dip.index[0].date()}')
# Now find the best performance and its date for all three companies
max_growth = growth_rate.max().astype('int')
print('\nBest performances %:')
print(max_growth)
print('\nBest surge per company:')
for col in growth_rate.columns:
    surge = growth_rate_int[growth_rate_int[col] == max_growth[col]]
    print(f'\n{col}: surge of {max_growth[col]}% on {surge.index[0].date()}')
# Compute the monthly average search interest for each tool
monthly_data = trends.resample('M').mean()
# Find the month with the best overall performance across all three tools
best_month = monthly_data.mean(axis='columns').idxmax().strftime('%B')
print(f'\nBest month for overall performance: {best_month}')
# Plot monthly averages
monthly_data.plot(subplots=True, ylabel='Search interest')
plt.suptitle('Monthly averages')
plt.show()
print('Comment: Search interest surges in Q1 and Q2, then slows or declines in Q3 and Q4; the decline is most pronounced in Q4 across all three tools.')
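The quarterly pattern noted above can be checked directly with `resample('Q')`. The sketch below uses synthetic weekly data (hypothetical values) since the idea depends only on having a datetime index:

```python
import numpy as np
import pandas as pd

# Synthetic weekly index covering roughly the same 12-month window
idx = pd.date_range('2023-10-01', periods=52, freq='W')
rng = np.random.default_rng(0)
demo = pd.DataFrame({'chatgpt': rng.integers(40, 100, size=52)}, index=idx)

# Average the weekly values within each calendar quarter
quarterly = demo.resample('Q').mean()
print(quarterly.round(1))
```

On the real `trends` DataFrame, comparing the four quarterly rows per column would quantify how steep the Q4 decline actually is.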
import pandas as pd
# Load the data
trends = pd.read_csv('ai_tools_comparison.csv')
print(trends.info())
print(trends.head())
# Convert 'week' column to datetime
trends['date'] = pd.to_datetime(trends['week'])
print(trends.info())
# Extract year from the date
trends['year'] = trends['date'].dt.year
# Localize the naive dates to fixed UTC offsets
trends['datex'] = trends['date'].dt.tz_localize('UTC+01:00')
print(trends['datex'].head())
trends['datey'] = trends['date'].dt.tz_localize('UTC-01:00')
print(trends['datey'].head())
# Localize instead to a named timezone, which also tracks daylight saving time
trends['datez'] = trends['date'].dt.tz_localize('America/New_York')
print(trends['datez'].head())
# Convert the New York-localized dates to Los Angeles time (same instants)
trends['datew'] = trends['datez'].dt.tz_convert('America/Los_Angeles')
print(trends['datew'].head())
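To make the difference between localizing and converting concrete, here is a standalone sketch (independent of the Trends data): `tz_localize` attaches a zone to a naive timestamp, while `tz_convert` shifts an aware one to another zone without changing the underlying instant:

```python
import pandas as pd

# A naive timestamp: no timezone information yet
ts = pd.Timestamp('2024-07-01 12:00')

# Attach a zone (July in New York is EDT, UTC-04:00)
ny = ts.tz_localize('America/New_York')
# Convert to Los Angeles (PDT, UTC-07:00); same instant, different wall clock
la = ny.tz_convert('America/Los_Angeles')

print(ny)  # 2024-07-01 12:00:00-04:00
print(la)  # 2024-07-01 09:00:00-07:00
print(ny == la)  # True: both represent the same moment in time
```

This is why `datew` above shows times three hours earlier than `datez` while comparing equal to it row by row.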