Visualize Religious Traditions in the U.S.

Do religious traditions in the United States go hand in hand with a political and social ideology? In other words, do fervently religious communities tend to be more conservative? To answer this research question, I compare survey results from a moderate-to-liberal state, Michigan in the U.S. Midwest, with those from a conservative one, Georgia in the U.S. South.

Summary

In this analysis I explore survey results from the "U.S. Religious Landscape Study" by the Pew Research Center.

Based on their culture and voting preferences, communities in the state of Georgia have a more conservative political and social ideology than communities in Michigan. At the same time, conservative religious traditions are more typical in the South.

The visualization in this notebook compares ideology in the states of Georgia and Michigan and how it changed between 2007 and 2014. It then compares religious traditions in Georgia vs. Michigan and shows that Georgia has a much larger proportion of people who identify with the Evangelical Protestant tradition, which promotes very conservative values and views. Finally, I compute a chi-square test of independence between ideology and religious tradition.

The conclusion of this analysis is that religious traditions have a significant influence on political and social views, i.e., people's ideology.

Data files:

2007 survey dataset

2014 survey dataset

# Religious Traditions in the US (Michigan vs. Georgia)
# Source: 'U.S. Religious Landscape Study', Pew Research Center, data collected for 2007 and 2014

import numpy as np
import pandas as pd
import json
# import pyreadstat   # to read SPSS files from Pew Research Center
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from scipy.stats import chi2_contingency
from scipy.stats import chi2

# Import data for religion surveys in 2007 and 2014
# Module to read SPSS files is not available, use csv and json files instead
df07 = pd.read_csv('rel_survey_2007.csv')
df14 = pd.read_csv('rel_survey_2014.csv')

# Load survey question metadata (files are closed automatically)
with open('meta07_dict.json') as f07:
    meta07_dict = json.load(f07)
with open('meta14_dict.json') as f14:
    meta14_dict = json.load(f14)

# Select survey questions of interest
# List of religious traditions
print('2007: ' + meta07_dict['reltrad'])
print('2014: ' + meta14_dict['RELTRAD'])
print('------------------')

# Importance of religion
print('2007: ' + meta07_dict['q21']) 
print('2014: ' + meta14_dict['qf2']) 
print('------------------')

# Political party
print('2007: ' + meta07_dict['party']) 
print('2014: ' + meta14_dict['party']) 
print('------------------')

# Ideology
print('2007: ' + meta07_dict['ideo'])  
print('2014: ' + meta14_dict['ideo'])  
print('------------------')

# Subset dataframes with questions of interest and additional fields (state and weight)
col_2007 = ['state', 'weight', 'reltrad', 'q21', 'party', 'ideo']
col_2014 = ['state', 'WEIGHT', 'RELTRAD', 'qf2', 'party', 'ideo']
df07c = df07[col_2007].copy()
df14c = df14[col_2014].copy()

# Rename columns
df14c.rename(columns = {'WEIGHT' : 'weight'
                       ,'RELTRAD' : 'reltrad'
                       ,'qf2' : 'q21'
                     }, inplace = True)

# Add year
df07c['year'] = 2007
df14c['year'] = 2014

# Combine data
religion = pd.concat([df07c, df14c])
religion.head()

Mapping survey response codes to labels using the survey codebook (data dictionary)

Because survey responses are coded, we need a dictionary to map code numbers to text values.
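
As a minimal sketch of the idea, assume a made-up three-entry codebook (the codes and labels below are illustrative, not taken from the Pew files). JSON object keys are always strings, so numeric codes are cast to strings before the lookup, and any code missing from the dictionary comes back as NaN.

```python
import json
import pandas as pd

# Hypothetical codebook, stored as JSON the same way the real dictionaries are
toy_json = '{"1": "Republican", "2": "Democrat", "3": "Independent"}'
toy_dict = json.loads(toy_json)      # keys are the strings "1", "2", "3"

# Hypothetical coded responses; 9 has no entry in the toy codebook
codes = pd.Series([1, 3, 2, 9])

# Cast integer codes to strings so they match the JSON keys, then look them up
labels = codes.astype('str').map(toy_dict)
print(labels.tolist())
```

One thing to watch for: if a code column is stored as floats, the cast produces strings such as '1.0', which would not match a key like '1'.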

# Load data dictionary (file is closed automatically)
with open('s14_dict.json') as fs:
    s14_dict = json.load(fs)

# Look at code:value pairs for political parties:
s14_dict['s14_party_dict']

# Look at code:value pairs for religious traditions:
s14_dict['s14_reltrad_dict']

# Map response codes to labels
religion['state'] = religion['state'].astype('str').map(s14_dict['s14_state_dict'])
religion['religion'] = religion['reltrad'].astype('str').map(s14_dict['s14_reltrad_dict'])
religion['fervor'] = religion['q21'].astype('str').map(s14_dict['s14_fervor_dict'])
religion['pol_party'] = religion['party'].astype('str').map(s14_dict['s14_party_dict'])
religion['ideology'] = religion['ideo'].astype('str').map(s14_dict['s14_ideo_dict'])

# Adjust ideology
religion['ideology'] = religion['ideo'].astype('int').astype('str')+'-'+religion['ideology']

# Columns to keep
cols = ['year', 'state', 'religion', 'fervor', 'pol_party', 'ideology', 'weight']

# Subset to the labeled columns plus year and weight
religion = religion[cols].copy()

# Capitalize column names
religion.columns = [x.capitalize() for x in religion.columns]

religion.head()

Using weights in survey data

In surveys, the distribution of sociodemographic characteristics among respondents often does not match the distribution in the target population; weights are therefore needed to adjust frequency counts.

According to the Pew Research Center codebook: Analysts interested in national-level and state-level data – including those interested in subgroups within the nation or states should weight the data using the variable WEIGHT.

These weights come from the Census Bureau’s 2012 American Community Survey (ACS) one-year estimates.
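
To make the adjustment concrete, here is a small sketch with made-up numbers (the five respondents and their weights are hypothetical). A weighted percentage is the sum of the weights in a category divided by the sum of all weights, rather than a simple head count.

```python
import pandas as pd

# Hypothetical respondents: the two liberal respondents carry larger weights
toy = pd.DataFrame({
    'Ideology': ['Conservative', 'Conservative', 'Conservative', 'Liberal', 'Liberal'],
    'Weight':   [0.5, 0.5, 1.0, 1.5, 1.5],
})

# Unweighted: every respondent counts once
unweighted = toy['Ideology'].value_counts(normalize=True).mul(100)

# Weighted: each respondent counts in proportion to their weight
weighted = toy.groupby('Ideology')['Weight'].sum().mul(100) / toy['Weight'].sum()

print(unweighted.round(1).to_dict())
print(weighted.round(1).to_dict())
```

In this toy case the unweighted split is 60/40 in favor of Conservative, while the weighted split reverses to 40/60, which is why the choice matters when a group is under-represented among respondents.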

To show the effect on the survey data, the cell below compares unweighted and weighted percentages by ideology in Michigan in 2014.

# Remove "Don't know/Refused" from ideology because it is a small group.
idx = religion['Ideology'] != "9-(VOL) Don't know/Refused"
religion = religion[idx]

# Group by year and state
state_grp = religion.groupby(['Year', 'State'])

# Get ideology in Michigan in 2014 with unweighted value counts
mi14 = (state_grp['Ideology'].value_counts(normalize = True)
        .loc[(2014, 'Michigan')]
        .mul(100)
        .rename('Percentage')
        .reset_index())
mi14.sort_values(by = 'Ideology', inplace = True)

# Use ideology as the index so both tables line up
mi14 = mi14.set_index('Ideology')

# Get ideology in Michigan in 2014 with WEIGHTED value counts:
# sum of weights per ideology divided by the total weight in the group
mi14_grp = state_grp.get_group((2014, 'Michigan'))
mi14_weighted = mi14_grp[['Weight', 'Ideology']].groupby('Ideology').sum().mul(100) / mi14_grp['Weight'].sum()

mi14_weighted.rename(columns = {'Weight' : 'Percentage'}, inplace = True)

print(mi14)
print(mi14_weighted)

# Add second dataframe that groups Very Conservative + Conservative and Liberal + Very Liberal for simplicity
rsimple = religion.copy()
rsimple.loc[(rsimple['Ideology']=='1-Very conservative') | (rsimple['Ideology']=='2-Conservative'),'Ideology'] = '1-Conservative'
rsimple.loc[(rsimple['Ideology']=='4-Liberal, OR') | (rsimple['Ideology']=='5-Very liberal?'),'Ideology'] = '3-Liberal'
rsimple.loc[rsimple['Ideology']=='3-Moderate','Ideology'] = '2-Moderate'

# Group by year and state
state_grp_simple = rsimple.groupby(['Year', 'State'])

Ideology Trends in Georgia and Michigan

The data is now ready for analysis. Let's explore how ideology changed in each state between 2007 and 2014, the only two years for which we have data. After looking at the trend in each state, let's compare the two states using the most recent survey to check which one is more liberal or more conservative.

# Create dataframe with ideology survey results as % for each state  -- Grouped ideology
frames = []
for (year, state), frame in state_grp_simple:
    # Weighted share of each ideology within this year and state
    df = frame[['Weight', 'Ideology']].groupby('Ideology').sum().mul(100) / frame['Weight'].sum()
    df.rename(columns = {'Weight' : 'Percentage'}, inplace = True)
    df['Ideology'] = df.index
    df['Year'] = year
    df['State'] = state
    frames.append(df)
iddfx = pd.concat(frames)

# Convert Year to string so the two survey years are plotted as discrete points
iddfx['Year'] = iddfx['Year'].apply(str)

# Switch index to Year
iddfx.set_index('Year', inplace = True)

# Plot change in ideology between 2007 and 2014
color_map={
    "1-Conservative": "red",
    "2-Moderate": "gray",
    "3-Liberal": "lightblue"}

p = sns.relplot(data = iddfx
             ,kind = 'line'
             ,x = 'Year'
             ,y = 'Percentage'
             ,hue = 'Ideology'
             ,col = 'State'
             ,marker = 'o'
             ,markersize = 12
             ,height=4, aspect=1.5, palette = color_map, linewidth = 4
            ).set(xlabel = None
                 )
p.set_axis_labels('Year', 'Percentage', fontsize = 14)


ticks = p.axes[0][0].get_yticks()
ylabels = ['{:,.0f}%'.format(x) for x in ticks]
p.set_yticklabels(ylabels)
plt.show()
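
The chart shows the direction of each trend; a small table of percentage-point changes can complement it. The sketch below uses a made-up frame with the same columns as `iddfx` after `reset_index()` (the numbers are placeholders, not survey results) and pivots it so that each survey year becomes a column.

```python
import pandas as pd

# Placeholder data shaped like iddfx.reset_index(); values are NOT survey results
toy = pd.DataFrame({
    'Year':       ['2007', '2014', '2007', '2014'],
    'State':      ['Georgia', 'Georgia', 'Michigan', 'Michigan'],
    'Ideology':   ['1-Conservative'] * 4,
    'Percentage': [40.0, 38.0, 35.0, 36.5],
})

# One row per state and ideology, one column per survey year
wide = toy.pivot_table(index=['State', 'Ideology'], columns='Year', values='Percentage')

# Change in percentage points between the two surveys
wide['Change'] = wide['2014'] - wide['2007']
print(wide)
```

Applied to the real data, the same pivot would run on `iddfx.reset_index()` and give one row per state and ideology group.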