Data analysis for Travel Assured

A travel insurance company

What is the objective? Due to COVID, the marketing campaign has been cut by 50%, so it is more important than ever that the company advertises to the right audience in the right place.

For this reason, the company's stakeholders want answers to the following two business questions, based on the data provided.

  1. Are there differences in the travel habits between customers and non-customers?
  2. What is the typical profile of customers and non-customers?

Data Validation

The data provided must contain 9 columns with the corresponding data types, as described below:

Let's validate!

# Importing python modules for the analysis
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# loading dataset
data = pd.read_csv('travel_insurance.csv')

# Showing first 5 records
data.head(5)
# Showing last 5 records
data.tail(5)
# Showing dataframe info
data.info()

Findings:

  • No null values in any column
  • 9 columns, as expected
  • Correct data type for each column: numeric or character

Now that we are sure the data is in good shape for analysis, let's explore the characteristics of each feature/column in order to find patterns and relationships.
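
The checks behind these findings can also be collected in one place rather than read off the `data.info()` output. The following is only a minimal sketch: the helper name `validate_frame` and its `expected_columns` parameter are assumptions made for illustration and are not part of the original analysis.

```python
import pandas as pd


def validate_frame(df: pd.DataFrame, expected_columns: int = 9) -> dict:
    """Collect basic validation checks (hypothetical helper, not from the original notebook)."""
    # Count missing values per column
    null_counts = df.isna().sum()
    return {
        # Number of columns actually present
        "n_columns": df.shape[1],
        # True when the column count matches the expected value
        "has_expected_columns": df.shape[1] == expected_columns,
        # Only the columns that contain at least one null, with their counts
        "columns_with_nulls": null_counts[null_counts > 0].to_dict(),
        # Data type of each column, as a readable string
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
```

Calling `validate_frame(data)` on the loaded dataset should return an empty `columns_with_nulls` dictionary and `has_expected_columns` set to `True` if the findings above hold.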

Exploratory Analysis

In this part of the analysis process I will get to know the data better by looking for

# Showing statistics about numeric columns
data.describe()

Findings

Each finding below is given as a bullet point, followed by a graph that makes it easier to understand.

NUMERICAL FEATURES

palette = sns.color_palette(['#004F5F', '#F9F871','#38C7A6' ,'#36E9FE' ,'#766AAF', '#9b6973', '#ce7ea2','#ffe28a'])
sns.set_palette(palette)

datalabel_color = 'white'
  • Median age is 29 and the range goes from 25 to 35 years old
sns.boxplot(data=data, x='Age')
plt.title('Age Distribution')
plt.xticks(np.arange(np.min(data.Age), np.max(data.Age) + 1))
plt.savefig('Age Distribution of Population.png')
plt.show()
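
The median and range reported for a numeric column can also be computed directly instead of being read from the boxplot. This is a small sketch; the helper name `numeric_summary` is an assumption introduced here for illustration.

```python
import pandas as pd


def numeric_summary(series: pd.Series) -> dict:
    """Return the minimum, median and maximum of a numeric column (hypothetical helper)."""
    return {
        "min": series.min(),
        "median": series.median(),
        "max": series.max(),
    }
```

For example, `numeric_summary(data['Age'])` returns the three values for the Age column, and the same call works for any other numeric column.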
  • Minimum annual income is 300,000 (the currency is not defined), with a median of 900,000