This hands-on course uses real-life credit data to teach you how to model credit risk with logistic regression and decision trees in R. Modeling credit risk for both personal and company loans is of major importance for banks. The probability that a debtor will default is a key component in arriving at a measure of credit risk. While other models are introduced along the way, this course focuses on two model types that are often used in the credit scoring context: logistic regression and decision trees. You will learn how to use them in this particular context, and how these models are evaluated by banks.
Introduction and data preprocessing
This chapter begins with a general introduction to credit risk models. We'll explore a real-life data set, then preprocess the data set so that it's in the appropriate format before applying the credit risk models.

- Introduction and data structure
- Exploring the credit data
- Interpreting a CrossTable()
- Histograms and outliers
- Histograms
- Outliers
- Missing data and coarse classification
- Deleting missing data
- Replacing missing data
- Keeping missing data
- Data splitting and confusion matrices
- Splitting the data set
- Creating a confusion matrix
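As a preview of the data-splitting and confusion-matrix steps, here is a minimal sketch in R. The data frame name `loan_data`, the 0/1 column `loan_status`, and the prediction vector `pred_status` are all hypothetical placeholders, not names confirmed by the course description:

```r
# Assumed: loan_data is a data frame with a 0/1 column loan_status
set.seed(567)  # make the random split reproducible

# Put two thirds of the rows in a training set, the rest in a test set
index_train <- sample(1:nrow(loan_data), 2 / 3 * nrow(loan_data))
training_set <- loan_data[index_train, ]
test_set <- loan_data[-index_train, ]

# Given a vector of 0/1 predictions for the test set (pred_status),
# a confusion matrix is a cross-tabulation of actual vs. predicted
conf_matrix <- table(test_set$loan_status, pred_status)
accuracy <- sum(diag(conf_matrix)) / nrow(test_set)
```

Splitting before modeling ensures the confusion matrix and accuracy are computed on data the model has never seen.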
Logistic regression

Logistic regression is still a widely used method in credit risk modeling. In this chapter, you will learn how to apply logistic regression models to credit data in R.

- Logistic regression: introduction
- Basic logistic regression
- Interpreting the odds for a categorical variable
- Multiple variables in a logistic regression model
- Interpreting significance levels
- Logistic regression: predicting the probability of default
- Predicting the probability of default
- Making more discriminative models
- Evaluating the logistic regression model result
- Specifying a cut-off
- Comparing two cut-offs
- Wrap-up and remarks
- Comparing link functions for a given cut-off
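A minimal sketch of the kind of logistic regression workflow this chapter covers, using base R's `glm()`. The variable names (`loan_status`, `age`, `annual_inc`) and the 0.15 cut-off are illustrative assumptions, not values prescribed by the course:

```r
# Assumed: training_set / test_set with loan_status (0 = no default,
# 1 = default) and borrower characteristics such as age and annual_inc
log_model <- glm(loan_status ~ age + annual_inc,
                 family = binomial(link = "logit"),
                 data = training_set)
summary(log_model)  # coefficients and significance levels

# Predicted probabilities of default for the test set
pred_pd <- predict(log_model, newdata = test_set, type = "response")

# Turn probabilities into a 0/1 prediction with a chosen cut-off
pred_cutoff <- ifelse(pred_pd > 0.15, 1, 0)
```

Swapping `link = "logit"` for `"probit"` or `"cloglog"` is how alternative link functions can be compared for a given cut-off.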
Decision trees

Classification trees are another popular method in the world of credit risk modeling. In this chapter, you will learn how to build classification trees using credit data in R.

- What is a decision tree?
- Computing the gain for a tree
- Changing one Gini...
- Building decision trees using the rpart package
- Undersampling the training set
- Changing the prior probabilities
- Including a loss matrix
- Pruning the decision tree
- Pruning the tree with changed prior probabilities
- Pruning the tree with the loss matrix
- Other tree options and the construction of confusion matrices
- One final tree using more options
- Confusion matrices and accuracy of our final trees
- Optimizing the accuracy
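A hedged sketch of building and pruning a classification tree with the rpart package. The data frame name, the prior probabilities `c(0.7, 0.3)`, and the `cp` value are illustrative assumptions:

```r
library(rpart)

# Assumed: training_set with a 0/1 loan_status column. Because defaults
# are rare, a plain tree often won't split; lowering cp and adjusting the
# prior probabilities (or supplying a loss matrix) are common workarounds.
tree_model <- rpart(loan_status ~ ., method = "class",
                    data = training_set,
                    parms = list(prior = c(0.7, 0.3)),
                    control = rpart.control(cp = 0.001))

# Prune back to the complexity parameter with the lowest
# cross-validated error, as reported in the cp table
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(tree_model, cp = best_cp)
plot(pruned_tree, uniform = TRUE)
text(pruned_tree)
```

Pruning on cross-validated error keeps the tree from overfitting the training set while preserving its most informative splits.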
Evaluating a credit risk model
In this chapter, you'll learn how to evaluate and compare the results obtained through several credit risk models.

- Finding the right cut-off: the strategy curve
- Computing a bad rate given a fixed acceptance rate
- The strategy table and strategy curve
- To tree or not to tree?
- The ROC curve
- ROC curves for comparison of logistic regression models
- ROC curves for comparison of tree-based models
- Input selection based on the AUC
- Another round of pruning based on AUC
- Best of four
- Further model reduction?
- Course wrap-up
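ROC curves and AUC are commonly computed in R with the pROC package; a minimal sketch, assuming `test_set$loan_status` holds the actual outcomes and `pred_pd` holds predicted default probabilities from one of the fitted models (both names are hypothetical here):

```r
library(pROC)

# Build the ROC object from actual outcomes and predicted probabilities
roc_obj <- roc(test_set$loan_status, pred_pd)
plot(roc_obj)   # ROC curve: sensitivity vs. specificity
auc(roc_obj)    # area under the curve, for comparing models
```

Because the AUC summarizes performance across all cut-offs, it gives a single number for ranking competing models before a specific cut-off is chosen.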
Prerequisites

Intermediate R for Finance
Lore Dirick
Director of Data Science Education at Flatiron School
Lore is a data scientist with expertise in applied finance. She obtained her PhD in Business Economics and Statistics at KU Leuven, Belgium. During her PhD, she collaborated with several banks, working on advanced methods for the analysis of credit risk data. Lore formerly worked as a Data Science Curriculum Lead at DataCamp and is now Director of Data Science Education at Flatiron School, a coding school with branches in 8 cities and online programs.