Skip to main content
HomeBlogData Science

How to ship data science projects quickly

Developing and deploying robust data science projects requires strong project management practices specific to data science. On a recent webinar, Brian Campbell discussed his best practices for managing data science projects effectively. Find out what the
Dec 2021  · 4 min read

There are a lot of components to a data science project which require collaboration for successful deployment. With these connected parts, an agile approach is necessary for the development and deployment processes. This approach allows organizations to learn fast and iterate quickly. In a recent DataCamp webinar, Brian Campbell, Engineering Manager, Internal Engineering at Lucid Software, discusses the best practices for successfully deploying data science projects.

Early Steps to Allow for Agile Development

Brian explains that two key steps can be taken early in a project to allow for parallel progress on both the development and deployment of a model: working with baseline models and creating a prototype. While these approaches complicate timelines, they lead to significantly faster and better results.

Work with Baseline Models

A helpful step to successfully shipping machine learning projects is creating a baseline model. A baseline model gives the same outputs as the final model. However, it is built before most of the development process and may be based on heuristics or random data. This baseline gives a target for future iterations to beat.

As the team develops new models, new results can be compared to the baseline and inform learning. The gap between the baseline and the desired states informs project requirements and helps teams identify the most valuable data and solutions for the problem.

Work with Prototypes

Prototypes are models that ingest the same inputs and provide the same format of outputs that the final model will. This model can be informed by the baseline model. Once the prototype is ready, those in charge of deployment can begin work integrating the model into its intended purpose. The prototype enables parallel progress on the model; data scientists improve the model while the implementation team works on how the model will be packaged to customers or within a business process.

Prototypes only work when the final model behaves the same as the prototype. Thus, it is necessary to understand the available data and intended output format before creating the prototype. This requires some knowledge about the project beyond the initial problem formulation, and strong collaboration with problem experts.

Real-World Use-Case of Baselines and Prototypes

Brian discusses how his team at Lucid used baselines and prototypes to accelerate the development of a clustering model for a sticky note product used by product, design, and engineering teams. These teams often run design thinking sessions where a facilitator needs to manually cluster sticky notes of similar types under a given category. The system's goal was to reduce the time between brainstorming and discussion by automatically clustering ideas into categories and removing duplicates.

The baseline model took in random ideas and generated random categories for them. Then, they created a prototype of this model to enable the implementation experts to begin working early in the development process. Over time, they improved the baseline model by leveraging natural language processing and machine learning to cluster similar sticky notes together.

Communication is important with these models

The first iteration of this project ran into some problems because there was a lack of consistent collaboration between the deployment team and the data scientists working on the model. This lack of teamwork led to model performance issues, inaccurate testing, and a poor product experience. They were not able to realize the synergies of developing the model in parallel.

When they relaunched the project and collaborated effectively, they solved these issues and set realistic expectations. Collaboration was made possible by having a prototype model for the deployment team to work parallel with the data science team developing the model.

To understand the lessons learned from the project, and how to maneuver complex data science projects effectively, make sure to tune in to the on-demand webinar.


Building Your Data Science Portfolio with DataCamp Workspace (Part 1)

Learn how to build a comprehensive data science portfolio by exploring examples different examples, mastering tips to make your work stand out, and utilizing the DataCamp Workspace effectively to showcase your results.
Justin Saddlemyer's photo

Justin Saddlemyer

9 min

Google Bard for Data Science Projects

Learn how to leverage Google Bard for project planning, data preprocessing, exploratory data analysis, feature engineering, model selection, hyperparameter tuning, model validation, and building and deployment of a web application.
Abid Ali Awan's photo

Abid Ali Awan

13 min

Building a Safer Internet with Data Science

Learn the key drivers of a data strategy that helps ensure online safety and consumer protection with Richard Davis, the Chief Data Officer at Ofcom, the UK’s government-approved regulatory and competition authority. 
Adel Nehme's photo

Adel Nehme

43 min

How Data Scientists Can Thrive in the FMCG Industry

Find out how data science drives strategy in the FMCG industry.
Adel Nehme's photo

Adel Nehme

42 min

Conda Cheat Sheet

In this cheat sheet, learn all about the basics of working with Conda. From managing and installing packages, to working with channels & environments, learn the fundamentals of the conda package management tool suite.
Richie Cotton's photo

Richie Cotton

Git Rebase Tutorial for Beginners

Discover what Git Rebase is and how to use it in your data science workflows.
Javier Canales Luna's photo

Javier Canales Luna

8 min

See MoreSee More