How to ship data science projects quickly

Developing and deploying robust data science projects requires project management practices tailored to data science. In a recent webinar, Brian Campbell discussed his best practices for managing data science projects effectively. Find out what they are below.
Dec 2021  · 4 min read

A data science project has many moving parts, all of which require collaboration for a successful deployment. With so many connected pieces, an agile approach to development and deployment is essential: it lets organizations learn fast and iterate quickly. In a recent DataCamp webinar, Brian Campbell, Engineering Manager, Internal Engineering at Lucid Software, discusses best practices for successfully deploying data science projects.

Early Steps to Allow for Agile Development

Brian explains that two key steps taken early in a project allow for parallel progress on both the development and deployment of a model: working with baseline models and creating a prototype. While these approaches add complexity to project timelines, they lead to significantly faster and better results.

Work with Baseline Models

A helpful step toward successfully shipping machine learning projects is creating a baseline model. A baseline model produces outputs in the same format as the final model, but it is built before most of the development work happens and may rely on simple heuristics or random guesses. Its performance gives a target for future iterations to beat.

As the team develops new models, their results can be compared against the baseline to guide learning. The gap between the baseline and the desired state informs project requirements and helps teams identify the most valuable data and solutions for the problem.
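The idea can be sketched in a few lines of Python. This is an illustrative example only, not code from the webinar: the task, labels, and helper names (`baseline_predict`, `accuracy`) are assumed for the sake of the demo.

```python
import random

# Hypothetical label set and evaluation data -- stand-ins for whatever
# the real project predicts.
LABELS = ["churn", "no_churn"]

def baseline_predict(record: dict) -> str:
    """Baseline: ignore the input and guess at random (a heuristic such as
    always predicting the majority class would work just as well)."""
    return random.choice(LABELS)

def accuracy(model, records, truth):
    """Share of records where the model's prediction matches the label."""
    hits = sum(model(r) == t for r, t in zip(records, truth))
    return hits / len(truth)

# The baseline's score becomes the bar every later iteration must beat.
records = [{"feature": i} for i in range(100)]
truth = [random.choice(LABELS) for _ in records]
print("baseline accuracy:", accuracy(baseline_predict, records, truth))
```

Because the baseline already produces outputs in the final format, the same evaluation code can score every subsequent model against it.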

Work with Prototypes

Prototypes are models that ingest the same inputs and return outputs in the same format as the final model will. The prototype can be informed by the baseline model. Once it is ready, those in charge of deployment can begin integrating the model into the product or process it will serve. The prototype enables parallel progress: data scientists improve the model while the implementation team works out how it will be packaged for customers or within a business process.

Prototypes only pay off when the final model consumes the same inputs and produces the same output format as the prototype. It is therefore necessary to understand the available data and the intended output format before creating the prototype. This requires some knowledge of the project beyond the initial problem formulation, as well as strong collaboration with domain experts.
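One way to picture how a prototype unblocks deployment work is to pin down the input/output contract first. The sketch below is hypothetical (the `Recommender` interface and its placeholder logic are not from the webinar): the prototype honors the contract with trivial logic, so integration can start before the real model exists.

```python
from typing import List, Protocol

class Recommender(Protocol):
    """Contract shared by the prototype and the final model:
    same inputs, same output format."""
    def recommend(self, user_id: str, top_n: int) -> List[str]: ...

class PrototypeRecommender:
    """Prototype: real interface, placeholder logic (a fixed popularity list)."""
    _popular = ["item-1", "item-2", "item-3", "item-4", "item-5"]

    def recommend(self, user_id: str, top_n: int) -> List[str]:
        return self._popular[:top_n]

# The deployment team can integrate against `Recommender` immediately;
# when the real model is ready, it drops in behind the same interface.
model: Recommender = PrototypeRecommender()
print(model.recommend("user-42", top_n=3))
```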

Real-World Use-Case of Baselines and Prototypes

Brian discusses how his team at Lucid used baselines and prototypes to accelerate the development of a clustering model for a sticky note product used by product, design, and engineering teams. These teams often run design thinking sessions where a facilitator needs to manually cluster sticky notes of similar types under a given category. The system's goal was to reduce the time between brainstorming and discussion by automatically clustering ideas into categories and removing duplicates.

The baseline model took in the ideas and assigned them to random categories. The team then created a prototype of this model so the implementation experts could begin working early in the development process. Over time, they improved on the baseline by applying natural language processing and machine learning to cluster similar sticky notes together.
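The article does not show Lucid's implementation, but the pattern can be sketched as follows: a baseline that assigns notes to random categories in the final output format, and an improved version that, as one possible NLP approach, embeds notes with TF-IDF and groups them with k-means using scikit-learn. The sample notes and function names are made up.

```python
import random
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

notes = [
    "add dark mode", "night theme option", "support dark colour scheme",
    "export board to PDF", "download results as PDF", "print-friendly export",
]

def baseline_cluster(notes, n_categories=2):
    """Baseline: random categories, but already in the final output format."""
    groups = defaultdict(list)
    for note in notes:
        groups[f"category-{random.randrange(n_categories)}"].append(note)
    return dict(groups)

def nlp_cluster(notes, n_categories=2):
    """Improved model: TF-IDF embeddings grouped with k-means."""
    vectors = TfidfVectorizer().fit_transform(notes)
    labels = KMeans(n_clusters=n_categories, n_init=10,
                    random_state=0).fit_predict(vectors)
    groups = defaultdict(list)
    for note, label in zip(notes, labels):
        groups[f"category-{label}"].append(note)
    return dict(groups)

print(baseline_cluster(notes))
print(nlp_cluster(notes))
```

Because both functions return the same structure, the deployment work built against the baseline carries over unchanged when the smarter model replaces it.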

Communication Is Important with These Models

The first iteration of this project ran into problems because collaboration between the deployment team and the data scientists working on the model was inconsistent. This lack of teamwork led to model performance issues, inaccurate testing, and a poor product experience, and the team was unable to realize the benefits of developing the model and its deployment in parallel.

When they relaunched the project and collaborated effectively, they solved these issues and set realistic expectations. Having a prototype model made this collaboration possible, letting the deployment team work in parallel with the data science team developing the model.

To hear the lessons learned from the project, and how to navigate complex data science projects effectively, make sure to tune in to the on-demand webinar.
