
Fraud Detection in R

Learn to detect fraud with analytics in R.

4 Hours, 16 Videos, 49 Exercises, 5,677 Learners, 3900 XP


Course Description

The Association of Certified Fraud Examiners estimates that fraud costs organizations worldwide $3.7 trillion a year and that a typical company loses five percent of its annual revenue to fraud. Fraud attempts are expected to increase further in the future, making fraud detection essential in most industries. This course shows how fraud patterns learned from historical data can be used to fight fraud. Techniques from robust statistics and digit analysis are presented to detect unusual observations that are likely associated with fraud. Two main challenges when building a supervised fraud detection tool are the imbalance, or skewness, of the data and the differing costs of different types of misclassification. We present techniques to address these issues and apply them to artificial and real datasets from a wide variety of fraud applications.
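
To make the point about misclassification costs concrete, here is a minimal sketch in Python (not taken from the course, which works in R). The cost scheme is an assumption chosen for illustration: every flagged transaction incurs a fixed, hypothetical `investigation_cost`, and every missed fraud costs the full transferred amount. All data values are made up.

```python
def total_misclassification_cost(actual, predicted, amounts, investigation_cost):
    """Sum the cost of investigated alerts and of missed fraud (assumed cost scheme)."""
    cost = 0.0
    for is_fraud, flagged, amount in zip(actual, predicted, amounts):
        if flagged:
            # Assumption: every alert, true or false, triggers a fixed-cost investigation.
            cost += investigation_cost
        elif is_fraud:
            # Assumption: a missed fraud loses the full transferred amount.
            cost += amount
    return cost

# Hypothetical labels (1 = fraud), model flags, and transferred amounts.
actual = [0, 0, 1, 1, 0]
predicted = [0, 1, 1, 0, 0]
amounts = [50.0, 20.0, 500.0, 300.0, 10.0]

total = total_misclassification_cost(actual, predicted, amounts, investigation_cost=10.0)
print(total)
```

Under these assumptions the single missed fraud dominates the total, which is why a model judged only by accuracy can look good while still being expensive.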

  1. Introduction & Motivation


    This chapter first gives a formal definition of fraud. You will then learn how to flag suspicious transactions by detecting anomalies in the types of payment methods used or in the times at which payments are made.

    Introduction & Motivation (50 xp)
    Imbalanced class distribution (100 xp)
    Cost of not detecting fraud (100 xp)
    Time features (50 xp)
    Circular histogram (100 xp)
    Suspicious timestamps (100 xp)
    Frequency features (50 xp)
    Frequency feature for one account (100 xp)
    Frequency feature for multiple accounts (100 xp)
    Recency features (50 xp)
    Recency feature (100 xp)
    Comparing frequency & recency (100 xp)
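
The time-feature exercises above treat the time of day as a point on a 24-hour dial. As a rough illustration of why that matters, the following Python sketch (an assumption-laden example, not the course's R code or necessarily its method) compares the ordinary mean of some made-up timestamps with their circular mean and measures how far a new payment lies from the usual time. The function names and data are hypothetical.

```python
import math

def circular_mean_hour(hours):
    """Return the circular mean of clock times given in hours on a 24-hour dial."""
    # Map each time to an angle on the unit circle (24 h = 2*pi radians).
    angles = [2 * math.pi * h / 24 for h in hours]
    # Average the unit vectors, then convert the mean direction back to hours.
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)
    return (mean_angle * 24 / (2 * math.pi)) % 24

def circular_distance_hours(h1, h2):
    """Shortest distance between two clock times around the dial, in hours."""
    diff = abs(h1 - h2) % 24
    return min(diff, 24 - diff)

# Hypothetical payment times for one account, clustered around midnight.
usual_hours = [23.0, 23.5, 0.5, 1.0]

center = circular_mean_hour(usual_hours)
arithmetic_mean = sum(usual_hours) / len(usual_hours)

# A new payment at noon is far from the account's usual time of day.
noon_distance = circular_distance_hours(12.0, center)
print(arithmetic_mean, center, noon_distance)
```

The ordinary mean of these timestamps lands at noon, the one time of day at which this account never pays, while the circular mean stays next to midnight. A threshold on `circular_distance_hours` would be one simple, hypothetical way to flag unusual timestamps.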
  3. Imbalanced class distributions

    Fortunately, fraud occurrences are rare. However, this means that you are working with imbalanced data, which, if left as is, will bias your detection models. In this chapter, you will tackle imbalance using over- and under-sampling methods.

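The over- and under-sampling idea mentioned in this chapter can be sketched in a few lines of Python. This is a minimal illustration of plain random resampling under stated assumptions, not the course's R implementation: the helper names and the 990-to-10 toy dataset are hypothetical, and the under-sampler assumes there are at least as many legitimate cases as fraud cases.

```python
import random

def random_undersample(legit, fraud, rng):
    """Keep all fraud cases and an equally sized random subset of legitimate cases."""
    kept_legit = rng.sample(legit, k=len(fraud))  # drawn without replacement
    return kept_legit, list(fraud)

def random_oversample(legit, fraud, rng):
    """Keep all legitimate cases and redraw fraud cases with replacement to match."""
    boosted_fraud = rng.choices(fraud, k=len(legit))  # duplicates are expected
    return list(legit), boosted_fraud

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical imbalanced data: 990 legitimate and 10 fraudulent transactions.
legit = [("legit", i) for i in range(990)]
fraud = [("fraud", i) for i in range(10)]

under_legit, under_fraud = random_undersample(legit, fraud, rng)
over_legit, over_fraud = random_oversample(legit, fraud, rng)

print(len(under_legit), len(under_fraud))
print(len(over_legit), len(over_fraud))
```

Under-sampling discards most legitimate cases, while over-sampling only repeats the few fraud cases it already has, so in practice resampling is usually applied to the training data only and the test data keeps its original class ratio.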




Chester Ismay, Hadrien Lacroix, Sara Billen

Bart Baesens

Professor in Analytics and Data Science at KU Leuven

Bart Baesens is a professor in Analytics and Data Science at the Faculty of Economics and Business of KU Leuven, and a lecturer at the University of Southampton (UK). He has done extensive research on big data and analytics, credit risk analytics, and fraud analytics. He regularly tutors, advises, and provides consulting support to international firms on their big data, analytics, and fraud and credit risk management strategies.

Sebastiaan Höppner

PhD researcher in Data Science at KU Leuven

Sebastiaan Höppner is a PhD researcher at the Section of Statistics and Data Science of the Department of Mathematics at KU Leuven (Belgium). His research focuses mainly on developing new statistical tools and machine learning models capable of detecting credit transfer fraud.

Tim Verdonck

Professor at KU Leuven

Tim Verdonck is a professor in Statistics and Data Science at the Department of Mathematics of KU Leuven (Belgium). He is also a visiting professor at the School of Economics, Management and Statistics at the University of Bologna (Italy), where he teaches a course in the Master in Quantitative Finance. He is the chairholder of the BNP Paribas Fortis Chair in Fraud Analytics, which investigates the use of predictive analytics in the context of payment fraud. Tim Verdonck is also the chairholder of the Allianz Chair Prescriptive Business Analytics in Insurance. His research interests are in the development and application of robust statistical methods for financial, actuarial, and economic data sets.
