How to ship data science projects quickly

Developing and deploying robust data science projects requires strong project management practices specific to data science. In a recent webinar, Brian Campbell discussed his best practices for managing data science projects effectively. Find out what they are here.

Dec 14, 2021 · 4 min read

A data science project has many components, and deploying it successfully requires collaboration across all of them. Because these parts are interconnected, an agile approach to both development and deployment is necessary. This approach allows organizations to learn fast and iterate quickly. In a recent DataCamp webinar, Brian Campbell, Engineering Manager, Internal Engineering at Lucid Software, discussed best practices for successfully deploying data science projects.

Early Steps to Allow for Agile Development

Brian explains that two key steps can be taken early in a project to allow for parallel progress on both the development and deployment of a model: working with baseline models and creating a prototype. While these approaches complicate timelines, they lead to significantly faster and better results.

Work with Baseline Models

A helpful step toward successfully shipping machine learning projects is creating a baseline model. A baseline model produces the same kind of outputs as the final model will. However, it is built before most of the development work and may be based on heuristics or random data. The baseline gives future iterations a target to beat.

As the team develops new models, their results can be compared against the baseline to show whether each change is a real improvement. The gap between the baseline and the desired state informs project requirements and helps teams identify the most valuable data and solutions for the problem.
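To make the comparison concrete, here is a minimal sketch of scoring a baseline and a candidate on the same held-out data. The task (labeling tickets as "bug" or "feature"), the toy data, and the names `make_majority_baseline` and `keyword_candidate` are all assumptions made up for illustration; they are not from the webinar.

```python
from collections import Counter
from typing import Callable, Sequence


def accuracy(predict: Callable[[str], str],
             inputs: Sequence[str],
             labels: Sequence[str]) -> float:
    """Fraction of inputs for which the prediction matches the label."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def make_majority_baseline(train_labels: Sequence[str]) -> Callable[[str], str]:
    """Heuristic baseline: always predict the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda _text: most_common_label


def keyword_candidate(text: str) -> str:
    """Hypothetical first iteration: a single keyword rule."""
    return "feature" if "add" in text.lower().split() else "bug"


# Toy data, invented for this example.
train_labels = ["bug", "bug", "feature", "bug"]
test_inputs = ["app crashes on save", "add dark mode", "login fails"]
test_labels = ["bug", "feature", "bug"]

baseline = make_majority_baseline(train_labels)
baseline_score = accuracy(baseline, test_inputs, test_labels)
candidate_score = accuracy(keyword_candidate, test_inputs, test_labels)

print(f"baseline:  {baseline_score:.2f}")
print(f"candidate: {candidate_score:.2f}")
print("candidate beats baseline:", candidate_score > baseline_score)
```

Because both models are scored by the same function on the same data, every later iteration can be dropped into the same comparison.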

Work with Prototypes

Prototypes are models that ingest the same inputs and produce the same format of outputs as the final model will. The prototype can be informed by the baseline model. Once it is ready, those in charge of deployment can begin integrating the model into its intended product or process. The prototype enables parallel progress: data scientists improve the model while the implementation team works on how the model will be delivered to customers or embedded within a business process.

Prototypes only work when the final model behaves the same as the prototype. Thus, it is necessary to understand the available data and intended output format before creating the prototype. This requires some knowledge about the project beyond the initial problem formulation, and strong collaboration with problem experts.
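One way to hold the prototype and the final model to the same behavior is to write the agreed inputs and outputs down as a shared interface. The sketch below assumes a hypothetical scoring service; `ScoreRequest`, `ScoreResponse`, `PrototypeScorer`, `ImprovedScorer`, and `handle` are illustrative names, not anything described in the webinar.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ScoreRequest:
    """Agreed input format (hypothetical)."""
    customer_id: str
    features: Dict[str, float]


@dataclass(frozen=True)
class ScoreResponse:
    """Agreed output format (hypothetical); score is expected in [0, 1]."""
    customer_id: str
    score: float


class Scorer(Protocol):
    def score(self, request: ScoreRequest) -> ScoreResponse: ...


class PrototypeScorer:
    """Placeholder model: constant score, but in the agreed format."""

    def score(self, request: ScoreRequest) -> ScoreResponse:
        return ScoreResponse(customer_id=request.customer_id, score=0.5)


class ImprovedScorer:
    """Stand-in for a later iteration: clipped mean of the features."""

    def score(self, request: ScoreRequest) -> ScoreResponse:
        values = list(request.features.values())
        mean = sum(values) / len(values) if values else 0.5
        return ScoreResponse(request.customer_id, min(1.0, max(0.0, mean)))


def handle(scorer: Scorer, request: ScoreRequest) -> dict:
    """Deployment-side code: depends only on the contract, not the model."""
    response = scorer.score(request)
    return {"customer_id": response.customer_id,
            "score": round(response.score, 2)}


request = ScoreRequest("c1", {"a": 0.2, "b": 0.6})
print(handle(PrototypeScorer(), request))
print(handle(ImprovedScorer(), request))
```

The deployment code in `handle` never changes when the model does, which is what lets the two teams work in parallel.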

Real-World Use-Case of Baselines and Prototypes

Brian discusses how his team at Lucid used baselines and prototypes to accelerate the development of a clustering model for a sticky note product used by product, design, and engineering teams. These teams often run design thinking sessions where a facilitator needs to manually cluster sticky notes of similar types under a given category. The system's goal was to reduce the time between brainstorming and discussion by automatically clustering ideas into categories and removing duplicates.

The baseline model took in random ideas and generated random categories for them. Then, they created a prototype of this model to enable the implementation experts to begin working early in the development process. Over time, they improved the baseline model by leveraging natural language processing and machine learning to cluster similar sticky notes together.
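A random-category baseline of this kind could look roughly like the sketch below. It is not Lucid's implementation: the function names, the category labels, and the exact-match duplicate check are assumptions chosen to keep the example short.

```python
import random
from typing import Dict, List


def drop_duplicates(notes: List[str]) -> List[str]:
    """Remove notes that match an earlier one, ignoring case and extra spaces.

    Assumption: exact-match deduplication only; a real system would likely
    need to catch near-duplicates as well.
    """
    seen = set()
    unique = []
    for note in notes:
        key = " ".join(note.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(note)
    return unique


def random_baseline(notes: List[str], n_categories: int,
                    seed: int = 0) -> Dict[str, List[str]]:
    """Assign each note to a randomly chosen category.

    The output has the shape a real clustering model would return
    (category name -> list of notes), so downstream code can be built
    against it before any real clustering exists.
    """
    rng = random.Random(seed)
    clusters: Dict[str, List[str]] = {
        f"Category {i + 1}": [] for i in range(n_categories)
    }
    names = list(clusters)
    for note in notes:
        clusters[rng.choice(names)].append(note)
    return clusters


notes = [
    "Faster search",
    "Dark mode",
    "faster  search",
    "Export to PDF",
    "Keyboard shortcuts",
]
unique_notes = drop_duplicates(notes)
clusters = random_baseline(unique_notes, n_categories=3)
for name, members in clusters.items():
    print(name, members)
```

Any later model that returns the same mapping of category names to notes can replace `random_baseline` without changes to the code that displays the clusters.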

Communication Is Important with These Models

The first iteration of this project ran into problems because the deployment team and the data scientists working on the model did not collaborate consistently. This lack of teamwork led to model performance issues, inaccurate testing, and a poor product experience. The team was not able to realize the benefits of developing the model in parallel.

When they relaunched the project and collaborated effectively, they solved these issues and set realistic expectations. Collaboration was made possible by having a prototype model that let the deployment team work in parallel with the data science team developing the model.

To understand the lessons learned from the project, and how to navigate complex data science projects effectively, make sure to tune in to the on-demand webinar.
