Autism vs Autism Spectrum Disorder RSV Analysis Part 1
This notebook is Part 1 of the analysis of Google searches in different languages for "Autism" and "Autism spectrum disorder". Part 1 focuses on:
- data cleaning
- data frame merging
Reading data in
I downloaded a series of Google Trends data files to compare search interest in "autism spectrum disorder" and "autism" across different languages.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
To make handling multiple data files more convenient, I created a function that helps me read in the data more easily.
def scrape_filenames():
    '''
    lists the names of uploaded files
    cleans them and returns a list of file names
    '''
    string_list = !ls
    file_list = []
    for s in string_list:
        s = s.replace(' ', ',').replace(' ', ',').replace('\t', ',')
        file_list.extend(s.split(','))
    file_list = [s for s in file_list if len(s)>0]
    file_list = [s.strip() for s in file_list]
    return file_list
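The `!ls` line above is IPython shell syntax, so the function only runs inside a notebook. As a hedged alternative, here is a minimal sketch that lists the CSV files with the standard library instead. The name `list_csv_files` is hypothetical and not part of the original notebook, and the sketch assumes the data files sit directly in the given folder.

```python
from pathlib import Path


def list_csv_files(folder="."):
    """Return the sorted names of all .csv files directly inside `folder`.

    Hypothetical helper: a plain-Python stand-in for the `!ls` approach.
    """
    return sorted(p.name for p in Path(folder).glob("*.csv"))


# Usage: list the CSV files in the current working directory.
print(list_csv_files("."))
```

Because only `.csv` names are returned, notebook files never reach the reading step and do not need to be filtered out later.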
file_list = scrape_filenames()
- _short.csv files contain RSV data for "Autism" in a given language
- _long.csv files contain RSV data for "Autism Spectrum Disorder" in a given language
I used Wikipedia to find the translations of both terms.
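The naming convention can also be read back from a file name. The sketch below is an illustration only: `parse_trend_filename` is a hypothetical helper that does not appear in the notebook, and it assumes every data file is named `<language>_short.csv` or `<language>_long.csv`.

```python
def parse_trend_filename(filename):
    """Split a name like 'arabic_long.csv' into (language, search term).

    Hypothetical helper; assumes the '<language>_short.csv' and
    '<language>_long.csv' naming convention. The term is None when the
    suffix is neither 'short' nor 'long'.
    """
    stem = filename.rsplit(".", 1)[0]               # drop the extension
    language, _, variant = stem.rpartition("_")     # split on the last underscore
    term = {"short": "Autism", "long": "Autism Spectrum Disorder"}.get(variant)
    return language, term


print(parse_trend_filename("arabic_long.csv"))
print(parse_trend_filename("english_short.csv"))
```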
print(file_list)
Cleaning and merging individual languages
def read_batch(file_list):
    '''
    takes a list of file names
    reads each CSV file
    and returns one data frame merged on the date column
    '''
    # seed frame that supplies the date column ('Miesiąc' is the month column of the export)
    all_languages = pd.read_csv('arabic_long.csv', skiprows = 2, parse_dates=['Miesiąc'])
    language_names = []
    for name in file_list:
        path = name
        # skip the notebooks stored next to the data files
        if (path == 'autism_language.ipynb')| (path == 'notebook.ipynb'):
            continue
        # use the file name without its extension as the column name
        col_name = name.split('.')[0]
        df = pd.read_csv(path, skiprows = 2, parse_dates=['Miesiąc'])
        df = df.rename(columns = {df.columns[1]:col_name})
        all_languages = all_languages.merge(df, how='outer', on='Miesiąc')
    all_languages = all_languages.rename(columns = {'Miesiąc': 'date'})
    # drop the seed frame's original column; the loop re-added it as 'arabic_long'
    all_languages.drop(columns = 'طيف التوحد: (Cały świat)', inplace = True)
    return all_languages
df_lang = read_batch(file_list)
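To show what the outer merge on the date column does, here is a minimal sketch on two toy frames. The values are made up for illustration and are not Google Trends data, and `merge_on_date` is a hypothetical helper rather than part of the notebook. It folds the merge over a list with `functools.reduce`, so no seed frame has to be read first.

```python
from functools import reduce

import pandas as pd


def merge_on_date(frames, key="date"):
    """Outer-merge a list of data frames on a shared key column."""
    merged = reduce(
        lambda left, right: left.merge(right, how="outer", on=key), frames
    )
    return merged.sort_values(key).reset_index(drop=True)


# Toy frames with made-up values; the dates overlap only in February.
english_short_toy = pd.DataFrame({
    "date": pd.to_datetime(["2004-01-01", "2004-02-01"]),
    "english_short": [80, 75],
})
english_long_toy = pd.DataFrame({
    "date": pd.to_datetime(["2004-02-01", "2004-03-01"]),
    "english_long": [10, 12],
})

merged_toy = merge_on_date([english_short_toy, english_long_toy])
print(merged_toy)
```

With `how="outer"`, every date that appears in any frame is kept, and a column is filled with NaN for the dates its own file lacks.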
df_lang.head()
df_lang.set_index('date', inplace=True)
# visuals and statistics for english short
english_short = df_lang['english_short']
english_long = df_lang['english_long']