Premium Project

Which Debts Are Worth the Bank's Effort?

Play bank data scientist and use regression discontinuity to see which debts are worth collecting.

  • 9 tasks
  • 2,158 participants
  • 1,500 XP

Project Description

After a debt has been legally declared "uncollectable" by a bank, the account is considered to be "charged-off." But that doesn't mean the bank simply walks away from the debt. The bank still wants to collect some of the money it is owed. In this project, you will look at a situation where a bank assigned delinquent customers to different recovery strategies based on the amount the bank expected to recover from each customer. Because this assignment was not random, the data scientist's goal is to determine whether the incremental amount the bank recovered exceeded the additional cost of assigning customers to a higher recovery strategy.

Threshold assignments like this one also occur in medicine (above a certain temperature you get medicine), education (above a certain test score students get admitted to a special class), other areas of finance (above a certain level of wealth customers get different levels of service), and the public sector (below a certain income someone is eligible for housing benefits). Regression discontinuity is an intuitive and useful analysis method in any situation with a threshold assignment.
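To make the idea concrete, here is a minimal sketch of a regression discontinuity estimate on synthetic data. Everything in it is an assumption for illustration: the variable names (expected, actual), the threshold of 1000, the window of 200, and the simulated jump of 300 are hypothetical and are not taken from the bank's data. The sketch fits a straight line plus an above-threshold indicator to the points near the cutoff and reads the jump off the indicator's coefficient.

```python
import numpy as np

def rd_estimate(x, y, threshold, window):
    """Estimate the jump in y at `threshold` from points within +/- `window`.

    Fits y = b0 + b1 * (x - threshold) + b2 * above by ordinary least squares,
    where `above` is 1 at or above the threshold and 0 below it.
    Returns b2, the estimated discontinuity.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.abs(x - threshold) <= window       # keep only points near the cutoff
    xc = x[mask] - threshold                     # center the running variable
    above = (xc >= 0).astype(float)              # threshold indicator
    X = np.column_stack([np.ones_like(xc), xc, above])
    coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
    return coef[2]

# Synthetic data (hypothetical): a smooth trend plus a known jump at the cutoff.
rng = np.random.default_rng(0)
expected = rng.uniform(0, 2000, size=5000)       # running variable
true_jump = 300.0                                # simulated effect of the higher strategy
actual = (0.5 * expected
          + true_jump * (expected >= 1000)
          + rng.normal(0, 50, size=5000))

jump = rd_estimate(expected, actual, threshold=1000, window=200)
print(f"estimated jump at the threshold: {jump:.1f}")
```

Because customers just below and just above the cutoff should be similar in every other respect, the estimated jump can be read as the effect of the threshold assignment itself, and it is this quantity that would be compared against the extra cost of the higher strategy.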

This project lets you apply the skills from Data Manipulation with pandas, including reading, exploring, filtering, and grouping data. It also uses basic statistics, so an introductory statistics course such as Statistical Thinking in Python is useful background.

Project Tasks

  • 1. Regression discontinuity: banking recovery
  • 2. Graphical exploratory data analysis
  • 3. Statistical test: age vs. expected recovery amount
  • 4. Statistical test: sex vs. expected recovery amount
  • 5. Exploratory graphical analysis: recovery amount
  • 6. Statistical analysis: recovery amount
  • 7. Regression modeling: no threshold
  • 8. Regression modeling: adding true threshold
  • 9. Regression modeling: adjusting the window
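
The last three tasks (a model with no threshold, a model with the threshold added, and a narrower or wider window) might look roughly like the sketch below. It is not the project's solution: the data are simulated, and the threshold of 1000, the window widths, and the simulated jump of 250 are hypothetical values chosen for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; returns the coefficients and R-squared."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coef, r2

# Simulated data (hypothetical): linear trend plus a jump at the threshold.
rng = np.random.default_rng(1)
threshold, sim_jump = 1000.0, 250.0
x = rng.uniform(0, 2000, size=4000)
y = 0.6 * x + sim_jump * (x >= threshold) + rng.normal(0, 80, size=4000)

results = {}
for window in (100, 200, 400):
    near = np.abs(x - threshold) <= window       # restrict to the window
    xw, yw = x[near], y[near]
    ones = np.ones_like(xw)
    above = (xw >= threshold).astype(float)

    # Model without the threshold indicator: y ~ 1 + x
    _, r2_plain = fit_ols(np.column_stack([ones, xw]), yw)
    # Model with the threshold indicator: y ~ 1 + x + above
    coef, r2_thresh = fit_ols(np.column_stack([ones, xw, above]), yw)

    results[window] = {"r2_plain": r2_plain, "r2_thresh": r2_thresh, "jump": coef[2]}
    print(f"window={window}: R2 {r2_plain:.3f} -> {r2_thresh:.3f}, jump={coef[2]:.1f}")
```

If the estimated jump stays roughly the same as the window changes, that is some evidence the discontinuity is not an artifact of one particular window choice.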
Howard Friedman

Adjunct Professor at Columbia University

Howard has a Master's in Statistics and a Ph.D. in Biomedical Engineering. He served as a Director leading data modeling teams at Capital One and, as an entrepreneur, has started numerous companies in data-related areas. He has nearly 20 years of experience in data-driven value creation in the public sector, for private equity firms, Fortune 500 companies, and smaller firms. He teaches data analytics classes at Columbia University and is the Chief Data Scientist at DataMed Solutions. His book Measure of a Nation, a data-driven approach to policy recommendations, was identified in the NY Times as one of the year's best books.



  • Python
  • Topics

    Data Manipulation, Data Visualization, Probability & Statistics, Importing & Cleaning Data