Interactive Course

Machine Learning with the Experts: School Budgets

Learn how to build a model to automatically classify items in a school budget.

  • 4 hours
  • 15 Videos
  • 51 Exercises
  • 28,554 Participants
  • 3,800 XP

Course Description

Data science isn't just for predicting ad clicks; it's also useful for social impact! This course is a case study from a machine learning competition on DrivenData. You'll explore a problem related to school district budgeting. Building a model that automatically classifies items in a school's budget makes it easier and faster for schools to compare their spending with that of other schools. In this course, you'll begin by building a baseline model, a simple first-pass approach. In particular, you'll do some natural language processing to prepare the budgets for modeling. Next, you'll have the opportunity to try your own techniques and see how they compare with those of participants in the competition. Finally, you'll see how the winner combined a number of expert techniques to build the most accurate model.

  1. Exploring the raw data (free)

    In this chapter, you'll be introduced to the problem you'll be solving in this course. How do you accurately classify line items in a school budget based on what that money is being used for? You will explore the raw text and numeric values in the dataset, both quantitatively and visually. You'll also learn how to measure success when trying to predict class labels for each row of the dataset.

  2. Creating a simple first model

    In this chapter, you'll build a first-pass model. You'll use numeric data only to train the model. Spoiler alert: throwing out all of the text data is bad for performance! But you'll learn how to format your predictions. Then, you'll be introduced to natural language processing (NLP) so you can start working with the large amounts of text in the data.

  3. Improving your model

    Here, you'll improve on your benchmark model using pipelines. Because the budget consists of both text and numeric data, you'll learn how to build pipelines that process multiple types of data. You'll also explore how the flexibility of the pipeline workflow makes testing different approaches efficient, even in complicated problems like this one!

  4. Learning from the experts

    In this chapter, you will learn the tricks used by the competition winner, and implement them yourself using scikit-learn. Enjoy!
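
To give a flavor of the pipeline idea from chapter 3, here is a minimal sketch of a scikit-learn pipeline that processes a text column and a numeric column together. It is not the course's or the competition winner's code: the toy data, the column names (description, amount, label), and the choice of logistic regression are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy budget lines (made-up data, NOT the competition dataset).
budget = pd.DataFrame({
    "description": [
        "teacher salary elementary",
        "school bus fuel",
        "substitute teacher wages",
        "bus driver overtime",
        "textbooks for classroom teacher",
        "bus maintenance and repair",
    ],
    "amount": [52000.0, 1800.0, None, 950.0, 3200.0, 2400.0],
    "label": [
        "Instruction", "Transportation", "Instruction",
        "Transportation", "Instruction", "Transportation",
    ],
})

# Numeric branch: fill missing amounts, then put them on a common scale.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer([
    # CountVectorizer needs a 1-D column of strings, so the column is
    # selected by name (a string), not by a list of names.
    ("text", CountVectorizer(), "description"),
    # Numeric columns are selected as a list so they arrive as a 2-D array.
    ("numeric", numeric_steps, ["amount"]),
])

# One object that preprocesses both data types and then fits a classifier.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(budget[["description", "amount"]], budget["label"])

# Predicted class probabilities for unseen budget lines (one row per line,
# one column per class).
new_items = pd.DataFrame({
    "description": ["teacher stipend", "bus tires"],
    "amount": [1500.0, None],
})
probabilities = model.predict_proba(new_items)
```

Because preprocessing and the classifier live in a single object, swapping the vectorizer or the classifier is a one-line change, which is the kind of flexibility the chapter description refers to.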

What do other learners have to say?

“I've used other sites, but DataCamp's been the one that I've stuck with.”

Devon Edwards Joseph

Lloyds Banking Group

“DataCamp is the top resource I recommend for learning data science.”

Louis Maiden

Harvard Business School

“DataCamp is by far my favorite website to learn from.”

Ronald Bowers

Decision Science Analytics @ USAA

Peter Bull

Co-founder of DrivenData

Peter is a co-founder of DrivenData. He earned his master's in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences. His work lies at the intersection of statistics and computer science, and he wants to help bring powerful new modeling techniques to the organizations that need them most. He previously worked as a software engineer at Microsoft and earned a BA in philosophy from Yale University.

Collaborators
  • Hugo Bowne-Anderson
  • Yashas Roy
  • Casey Fitzpatrick