Competition - Loan Data


Loan Data

Ready to put your coding skills to the test? Join us for our Workspace Competition.
For more information, visit datacamp.com/workspacecompetition.

Context

This dataset (source) contains data on almost 10,000 borrowers who took loans, some already paid back and others still in progress. It was extracted from lendingclub.com, an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.

# Load packages
import numpy as np 
import pandas as pd 
import seaborn as sb
import matplotlib.pyplot as plt
%matplotlib inline

Load your data

# Load data from the csv file
df = pd.read_csv('loan_data.csv', index_col=None)

# Change the dots in the column names to underscores
df.columns = [c.replace(".", "_") for c in df.columns]
print(f"Number of rows/records: {df.shape[0]}")
print(f"Number of columns/variables: {df.shape[1]}")
df.head()

Understand your variables

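# Understand your variables: one row per column with its number of
# unique values and the list of those values
variables = pd.DataFrame({
    'Variable': df.columns,
    'Number of unique values': [df[var].nunique() for var in df.columns],
    'Values': [df[var].unique().tolist() for var in df.columns],
})

# Join with the variable explanations; the CSV index may still use the
# original dotted names, so apply the same dot-to-underscore renaming
var_dict = pd.read_csv('variable_explanation.csv', index_col=0)
var_dict.index = var_dict.index.str.replace(".", "_", regex=False)
variables.set_index('Variable').join(var_dict)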
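The variables table lists unique values but not data types or missing values. A small helper can add both; this is a sketch, and `summarize_columns` is a hypothetical name, not part of the template.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: its dtype, number of missing values and number
    of unique (non-missing) values, most-missing columns first."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })
    return summary.sort_values("missing", ascending=False)


# In the notebook: summarize_columns(df)
```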

Now you can start exploring this dataset for a chance to win incredible prizes! Can't think of where to start? Try your hand at these suggestions:

Extract useful insights and visualize them in the most interesting way possible.
Find out how long it takes for users to pay back their loan.
Build a model that can predict the probability a user will be able to pay back their loan within a certain period.
Find out what kinds of people take loans, and for what purposes.
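For the last suggestion, one starting point is to group the loans by purpose and compare a few borrower and loan attributes per group. This sketch assumes the underscore-renamed columns `purpose`, `fico` and `int_rate` exist (they are not shown in this notebook; only `installment` and `not_fully_paid` are used here), and `profile_by_purpose` is a hypothetical helper. The column names are parameters so they can be swapped for whatever `df.columns` actually contains.

```python
import pandas as pd


def profile_by_purpose(df, group_col="purpose",
                       numeric_cols=("fico", "int_rate", "installment"),
                       target_col="not_fully_paid"):
    """Summarise each loan purpose: number of loans, the mean of a few
    numeric attributes, and the share of loans not fully paid.
    Default column names are assumptions about the renamed dataset."""
    grouped = df.groupby(group_col)
    profile = grouped[list(numeric_cols)].mean()
    # size() is aligned on the group keys, same index as `profile`
    profile.insert(0, "n_loans", grouped.size())
    # mean of a 0/1 flag is the share of 1s within each purpose
    profile["share_not_fully_paid"] = grouped[target_col].mean()
    return profile.sort_values("n_loans", ascending=False)


# In the notebook: profile_by_purpose(df)
```

Each row is one purpose, with the most common purposes first.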

# Start coding

df.info()

df.describe()

# Sum of the installment column across all loans
print("Total of all installments:", df['installment'].sum())

print("Shape of the data frame:", df.shape)
print("Number of loans taken:", df.shape[0])

print("Number of loans not fully paid:", df['not_fully_paid'].sum())
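The raw count of loans not fully paid is easier to interpret as a share, and the class balance matters for the modelling suggestion. A minimal sketch, assuming `not_fully_paid` is a 0/1 flag; `repayment_balance` is a hypothetical helper.

```python
import pandas as pd


def repayment_balance(df, target_col="not_fully_paid"):
    """Count and share of each value of the target flag
    (assumed: 0 = fully paid, 1 = not fully paid)."""
    counts = df[target_col].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# In the notebook: repayment_balance(df)
```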

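# Correlation matrix of the numeric columns; numeric_only=True is needed in
# pandas 2.0+, where df.corr() raises an error on text columns
corr = df.corr(numeric_only=True)
corr

sb.heatmap(corr, cmap='coolwarm', center=0)
plt.show()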
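The heatmap only shows pairwise correlations. For the suggestion of predicting whether a borrower will pay back their loan, a logistic regression on a few numeric columns is a common baseline. The sketch below fits one with plain NumPy batch gradient descent instead of a library such as scikit-learn; the learning rate, iteration count and feature list are assumptions, and `fit_logistic` and `predict_proba` are hypothetical names. The target is the `not_fully_paid` flag; predicting repayment within a given period would additionally need timing information.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent on the mean
    log-loss. Features are standardised first; returns weights, bias and
    the (mean, std) needed to scale new data the same way."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0                  # avoid dividing by zero for constant columns
    Xs = (X - mean) / std
    w = np.zeros(Xs.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))  # predicted probabilities
        err = p - y                               # d(log-loss)/d(logit) per row
        w -= lr * (Xs.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b, mean, std


def predict_proba(X, w, b, mean, std):
    """Probability of the positive class (here: loan not fully paid)."""
    Xs = (np.asarray(X, dtype=float) - mean) / std
    return 1.0 / (1.0 + np.exp(-(Xs @ w + b)))


# Hypothetical usage (feature names are assumed, not confirmed by this notebook):
# features = ["fico", "int_rate", "installment"]
# w, b, mean, std = fit_logistic(df[features], df["not_fully_paid"])
# probs = predict_proba(df[features], w, b, mean, std)
```

Before trusting the probabilities, hold out part of `df` as a test set and evaluate on it rather than on the training rows.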
