
Building Your Data Science Portfolio with DataCamp Workspace (Part 3): Add Machine Learning Workspace

Learn how to leverage DataCamp Workspace to produce a machine-learning project to add to your data science portfolio. We cover how to get started, how to structure your work, and common mistakes to avoid.
May 2023  · 8 min read

This is the third and final article in our series on how to use DataCamp Workspace to create a data science portfolio. In Part 1, we covered the basics of a portfolio with Workspace. In Part 2, we dove into creating an analytics project. Finally, in this article, we will develop a machine learning project.

Why? A machine learning project can be a great way to show off your ability to process data, select and fit appropriate models, and solve practical problems.

What is a Machine Learning Project and How Do I Get Started?

A portfolio machine learning project demonstrates you can build an end-to-end machine learning solution. This covers conceptualizing the problem, evaluating a model’s performance, and interpreting the results.

The first decision you must make is what type of project you will create. This should be based on several factors, including:

What types of skills will I be expected to perform?

If you are applying for a marketing position, perhaps demonstrating knowledge of segmentation techniques is a good choice. On the other hand, if you aim for a position in the financial sector, predicting a company's or portfolio's future earnings might be more appropriate. Try to anticipate the types of tasks your desired role would perform regularly.

What kind of data will I be working with?

You should also tailor the data to the types of roles you are applying to. For example, don’t use iris flower data to demonstrate your classification skills when applying for a sales analyst role. Try to get your hands on relevant data!

If you want inspiration, be sure to check out this article on machine learning projects for all levels. You can also check out our curated datasets on Workspace which include prompts that can get you started!

Key Sections of a Machine Learning Project

As with an analytics project, sticking to a structure makes your work easier to follow while maintaining focus.

While the structure of a machine learning project for a portfolio might not be identical to the one you would use in production, we’ve outlined a good format for a portfolio project in Workspace below. See our Developing Machine Learning Models for Production course for more on machine learning models in production.

Start with a problem statement that motivates the purpose of your project. What problem are you trying to solve? How might stakeholders be able to use your results? Readers should come away from this section with a clear understanding of how your project provides value.

The data you select for a machine learning project is rarely ready for modeling. Therefore, you must be transparent about how you acquired your data and the steps you took to clean it.

It is important to perform a thorough exploratory analysis of your data prior to modeling. This includes (but is not limited to):

  • Performing data validation
  • Reviewing missing data
  • Visualizing distributions and frequencies
  • Identifying outliers
  • Analyzing relationships in the data

A strong exploratory analysis will inform decisions such as pre-processing steps and the models that you select. Be sure to check out Exploratory Data Analysis in Python or Exploratory Data Analysis in R for a thorough review of exploratory analysis techniques! Alternatively, you can also check out this Python tutorial on the topic.
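As a rough illustration of the checks listed above, here is a minimal pandas sketch. The toy DataFrame, its column names, and the `summarize` helper are all hypothetical stand-ins for your own data and code, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for your own dataset
df = pd.DataFrame({
    "age": [34, 45, 29, np.nan, 52, 41, 38, 95],
    "monthly_spend": [120.0, 80.5, 95.0, 110.0, np.nan, 70.0, 88.0, 900.0],
    "plan": ["basic", "pro", "basic", "pro", "basic", None, "pro", "pro"],
})


def summarize(frame):
    """Collect a few basic exploratory checks in one dictionary."""
    numeric = frame.select_dtypes(include="number")
    # Flag values more than 1.5 * IQR outside the quartiles (one common rule)
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return {
        "dtypes": frame.dtypes,               # data validation: expected types?
        "missing": frame.isna().sum(),        # missing values per column
        "describe": numeric.describe(),       # distribution summaries
        "outliers": is_outlier.sum(),         # outlier count per numeric column
        "correlations": numeric.corr(),       # pairwise linear relationships
    }


report = summarize(df)
print(report["missing"])
print(report["outliers"])
```

Numbers like these are a starting point; visualizing the distributions is usually what reveals whether a flagged value is an error or a genuine extreme.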

Note: Workspace Chart cells can be a handy way to create clean and interactive visualizations of your data and save you precious coding time.

Horizontal bar chart created in seconds from a DataFrame

4. Feature Engineering and Pre-Processing

Data rarely comes in a format that is immediately ready for a machine learning model. Feature engineering and pre-processing can help your model train faster, reduce overfitting, and improve overall performance.

Things you might include here (but are not limited to) are:

  • Data transformations, such as scaling your numeric data
  • Encoding categorical variables (see our Workspace template here)
  • Imputing missing data
  • Dimensionality reduction
  • Creating new features
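Several of the steps above can be combined in a single scikit-learn pipeline. The sketch below is one possible arrangement, not a prescribed workflow: the toy data, column names, and the derived `avg_monthly_spend` feature are hypothetical, and dimensionality reduction is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for your own dataset
df = pd.DataFrame({
    "tenure_months": [12, 24, 6, np.nan, 36, 18],
    "total_spend": [1200.0, 1900.0, 570.0, 1100.0, np.nan, 1260.0],
    "plan": ["basic", "pro", "basic", "pro", "basic", np.nan],
})

# Creating a new feature from existing columns
df["avg_monthly_spend"] = df["total_spend"] / df["tenure_months"]

numeric_cols = ["tenure_months", "total_spend", "avg_monthly_spend"]
categorical_cols = ["plan"]

# Numeric columns: impute missing values with the median, then scale
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: impute with the most frequent value, then one-hot encode
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Keeping these steps inside a pipeline means the same transformations can later be fitted on training data only and reused on test data, which helps avoid data leakage.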

Get caught up on essential techniques by taking our Feature Engineering for Machine Learning in Python or Feature Engineering in R!

The modeling section should house all your modeling steps, including fitting a model, making predictions, and evaluating the performance. It also includes additional stages, such as tuning your model and comparing different models.
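To make these stages concrete, here is a minimal scikit-learn sketch that compares two models with cross-validation, tunes one of them, and evaluates it on held-out data. The synthetic data from `make_classification` is only a placeholder for your own prepared features, and the two candidate models and the small parameter grid are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic placeholder data; swap in your own features and target
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Compare candidate models with 5-fold cross-validation on the training set
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Tune one hyperparameter of the random forest with a small grid search
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 6, None]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test set
test_accuracy = accuracy_score(y_test, search.predict(X_test))

print(cv_scores)
print(search.best_params_, round(test_accuracy, 3))
```

The test set is touched only once, at the very end, so the reported accuracy is not inflated by the model selection and tuning steps.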

As with any data science topic, DataCamp has a wealth of resources to get you caught up on all the techniques you will need:

Of course, this is only a small selection of the machine learning content available on DataCamp. For a more comprehensive overview, be sure to check out our full catalog of machine learning and AI courses. Alternatively, you can also check out our Workspace templates that have code ready for regression and classification workflows.

Conclude by translating your results into action. What did you accomplish with your project? Are you able to accurately predict who will churn from your subscription service? What is your margin of error when predicting house prices?
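For the house price question, one simple way to report a margin of error is the mean absolute error, stated in the same units as the target. The sketch below uses made-up prices and predictions purely for illustration; the variable names are hypothetical.

```python
import numpy as np

# Made-up actual and predicted house prices, for illustration only
actual = np.array([250_000, 310_000, 180_000, 420_000, 275_000])
predicted = np.array([262_000, 298_000, 191_000, 405_000, 280_000])

# Mean absolute error: the typical size of a miss, in dollars
mae = np.mean(np.abs(actual - predicted))

# The same error expressed relative to the average price
relative_error = mae / actual.mean()

summary = (
    f"On average, predictions are off by about ${mae:,.0f} "
    f"({relative_error:.1%} of the mean price)."
)
print(summary)
```

A sentence like this is often easier for stakeholders to act on than a bare metric name and value.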

You should also think about future steps for the project. What could you do to build upon the work? This is your time to show that you are not only an excellent coder but can also help others extract value from your work.

Tip: Take full advantage of Workspace text cells to ensure your work is nicely formatted and contains appropriate headers. Not only does this make your work look more professional, but it will also auto-generate a table of contents that users can use to navigate the project.

To quickly get started with this structure, you can use this Python or R workspace template!

An extract of a table of contents generated from workspace headers!

Common Mistakes to Avoid

Getting too technical

Ultimately your work should be solving a problem. You may be incredibly gifted on a technical level, but it is wasted talent if others can’t extract value from your work.

Make sure that you can translate the technical outcomes for a non-technical audience. For instance, what does the error of your predictive model mean when it goes into practice? How do the groups in your customer segmentation differ, and how can the marketing team leverage this analysis?
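For the segmentation question, a per-segment summary table is often enough to start the conversation. The following is a minimal sketch, assuming a k-means segmentation on two hypothetical customer features; the data is made up and the choice of two clusters is arbitrary.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer data, for illustration only
customers = pd.DataFrame({
    "orders_per_year": [2, 3, 1, 24, 30, 27, 2, 26],
    "avg_order_value": [150, 140, 160, 20, 25, 22, 155, 18],
})

# Scale the features so both contribute equally to the distance measure
scaled = StandardScaler().fit_transform(customers)

# Assign each customer to one of two segments
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(scaled)

# Describe each segment in the original, unscaled units
profile = customers.groupby("segment").agg(
    customers=("orders_per_year", "size"),
    orders_per_year=("orders_per_year", "mean"),
    avg_order_value=("avg_order_value", "mean"),
).round(1)
print(profile)
```

Reporting the segment averages in original units lets a marketing team read the table directly, for example as "frequent, low-value buyers" versus "occasional, high-value buyers".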

A helpful trick to reduce your written content's complexity is using the AI Generate feature in Workspace text cells to provide simple explanations of technical terms.

Getting inspiration for explanations

Spreading yourself too thin

It can be tempting to use the project to show off every modeling technique you know. However, you often risk overwhelming the reader while also demonstrating a lack of editing skills. It is great if you tried multiple techniques and compared the results, but oftentimes these are best saved for an appendix.

Instead, try to focus on one or two modeling techniques, analyze and evaluate them thoroughly, and shift additional work to the appendix with a reference. This is far more digestible for readers and also shows a commitment to quality over quantity.

Next Steps

If you need to brush up on some machine learning techniques, our extensive course catalog on the topic should get you caught up quickly. Otherwise, we recommend you jump straight into Workspace and get coding! As mentioned earlier in the article, you can create a workspace directly from a dataset.

Alternatively, you can also create an empty workspace from scratch. Get started now in either Python or R!
