Premium project

Which Debts Are Worth the Bank's Effort?

Play bank data scientist and use regression discontinuity to see which debts are worth collecting.

9 Tasks, 1,500 XP



Project Description

After a debt has been legally declared "uncollectable" by a bank, the account is considered to be "charged-off." But that doesn't mean the bank simply **walks away** from the debt. The bank still wants to collect some of the money it is owed. In this project, you will look at a situation where a bank assigned delinquent customers to different recovery strategies based on the expected amount the bank believed it would recover from each customer. The goal for the data scientist is to determine, given this non-random assignment, whether the incremental amount the bank earned exceeded the additional cost of assigning customers to a higher recovery strategy.

Threshold assignments like this also occur in medicine (above a certain temperature you get medicine), education (above a certain test score students get admitted to a special class), other areas of finance (above a certain wealth customers get different levels of service), and the public sector (below a certain income someone is eligible for housing benefits). Regression discontinuity is an intuitive and useful analysis method in any situation involving a threshold assignment.
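To make the idea concrete, here is a minimal sketch of a regression discontinuity estimate in plain Python. It is not the project's own code or data: the threshold of 1000, the simulated customers, the true jump of 300, and the helper names `fit_line`, `estimate_jump`, and `simulate` are all assumptions made for illustration. The sketch fits one straight line to customers just below the threshold and another to customers just above it, then measures the gap between the two lines at the threshold.

```python
import random

# Hypothetical cutoff on the expected recovery amount (an assumption,
# not a value taken from the project description).
THRESHOLD = 1000.0


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_jump(expected, actual, threshold, window):
    """Estimate the discontinuity in `actual` at `threshold`.

    Only customers within `window` of the threshold are used. x is
    centered at the threshold, so each fitted intercept is that side's
    predicted recovery exactly at the cutoff.
    """
    left = [(x - threshold, y) for x, y in zip(expected, actual)
            if threshold - window <= x < threshold]
    right = [(x - threshold, y) for x, y in zip(expected, actual)
             if threshold <= x <= threshold + window]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


def simulate(n=2000, true_jump=300.0, noise=50.0, seed=0):
    """Simulated customers: recovery rises smoothly with the expected
    amount, plus a jump for customers at or above the threshold."""
    rng = random.Random(seed)
    expected = [rng.uniform(0.0, 2000.0) for _ in range(n)]
    actual = [0.5 * x
              + (true_jump if x >= THRESHOLD else 0.0)
              + rng.uniform(-noise, noise)
              for x in expected]
    return expected, actual


expected, actual = simulate()
jump = estimate_jump(expected, actual, THRESHOLD, window=200.0)
print(f"Estimated jump at the threshold: {jump:.1f}")
```

In an analysis like the one described, the estimated jump would then be compared with the additional cost per customer of the higher recovery strategy, a figure the description above does not state. Narrowing or widening `window` is one way to check how sensitive the estimate is to which customers count as "near" the threshold.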

Project Tasks

  1. Regression discontinuity: banking recovery
  2. Graphical exploratory data analysis
  3. Statistical test: age vs. expected recovery amount
  4. Statistical test: sex vs. expected recovery amount
  5. Exploratory graphical analysis: recovery amount
  6. Statistical analysis: recovery amount
  7. Regression modeling: no threshold
  8. Regression modeling: adding true threshold
  9. Regression modeling: adjusting the window

Technologies
Python
Topics
Data Manipulation, Data Visualization, Probability & Statistics, Importing & Cleaning Data

Howard Friedman

Adjunct Professor at Columbia University
Howard has a Master's in Statistics and a Ph.D. in Biomedical Engineering. He served as a Director leading data modeling teams at Capital One and, as an entrepreneur, has started numerous companies in data-related areas. He has nearly 20 years of experience in data-driven value creation in the public sector, for private equity firms, Fortune 500 companies, and smaller firms. He teaches data analytics classes at Columbia University and is the Chief Data Scientist at DataMed Solutions. His book Measure of a Nation, a data-driven approach to policy recommendations, was identified in the NY Times as one of the year's best books.
