How to Build a Winning Data Team
Organizations Require Winning Data Teams
Today’s organizations are generating more data than ever before. Forbes reported that over 2.5 quintillion bytes of data were generated every day in 2018, and that over 90 percent of the world’s data had been generated in the two years before the article was published. The volumes at the national level are striking as well. In 2018, CNBC reported that China generated 7.6 zettabytes (ZB) of data, a figure expected to grow to 48.6 ZB in 2025. The U.S. generates similar amounts of data.
Much of this data holds actionable insights that go unused. According to Forrester, 60 to 73 percent of an organization’s data is not leveraged in analytics.
Successful, data-driven organizations are leveraging data at scale to generate value. For example, Uber invested heavily in a platform that efficiently delivers more than 100 petabytes of available data to its data teams through a simple interface. Uber scaled this platform to support the delivery of over 1 billion Uber Eats orders, with over 24 million miles covered. Through A/B testing, Netflix generates 20 to 30 percent more views by changing the picture associated with a movie or TV show.
There are clearly high-value insights available to organizations that can successfully navigate their complex and large data landscapes. This type of value generation can only be achieved through a high-performing and comprehensive data team.
Data Roles for a Winning Data Team
Many key roles go into building a successful data team. In this white paper, DataCamp describes eight roles, or personas, that can be found in any data-driven organization. While job titles may differ from one organization to another, we’ll outline five of the roles that make up a strong data team: business analysts, data analysts, data scientists, machine learning scientists, and data engineers.
Business Analysts increase profitability and efficiency from data insights. They supplement their deep knowledge of the business domain with data analysis and visualization skills and report on insights to data consumers.
Key Skills: Data Manipulation, Data Visualization, Reporting, Basic Statistics
Tools: Spreadsheets (Excel, Google Sheets), Business Intelligence tools (Tableau, PowerBI), SQL
Data Analysts play a similar role to business analysts, analyzing and drawing insights from data to drive business outcomes, so their skills overlap. However, data analysts answer less well-defined problems that require a deeper understanding of the data analysis workflow, and they leverage a combination of coding and non-coding tools.
Key Skills: Data Manipulation, Data Visualization, Reporting, Importing and Cleaning Data, Probability and Statistics
Tools: R or Python, Spreadsheets (Excel, Google Sheets), Business Intelligence tools (Tableau, PowerBI), SQL
Data Scientists play a significantly more technical role in organizations, working mostly with coding tools to investigate, extract, and produce insights and value with data. Data scientists require a strong understanding of data analysis and machine learning workflows and the ability to work with non-standard data types and big data tools.
Key Skills: Data Manipulation, Data Visualization, Reporting, Importing and Cleaning Data, Probability and Statistics, Machine Learning
Tools: R, Python, Scala, Big data tools (Airflow, Spark), SQL, Command-line tools (Git, Shell)
Machine Learning Scientist
Machine Learning Scientists are responsible for developing machine learning systems at scale. They derive predictions from data using machine learning models of all types to solve problems like predicting churn and customer lifetime value, and are responsible for deploying these models for the organization to use.
Key Skills: Data Manipulation, Data Visualization, Importing and Cleaning Data, Probability and Statistics, Machine Learning, Data Engineering
Tools: R, Python, Scala, Big data tools (Airflow, Spark), SQL, Command-line tools (Git, Shell)
Course Recommendations: Machine Learning Scientists Career Track (R - 14 courses/Python - 23 courses), Image Processing in Python, Machine Learning with PySpark
Data Engineers are responsible for creating data pipelines that help organizations get the correct data to the right people. They combine large amounts of data from different sources into one centralized location, enabling the various data roles to work with clean, relevant, compliant, and actionable data.
Key Skills: Data Manipulation, Importing and Cleaning Data, Data Engineering, Advanced Programming
Tools: Python, Scala, Big data tools (Airflow, Spark), SQL, Command-line tools (Git, Shell), Cloud Platforms (e.g., AWS)
Course Recommendations: Data Engineer with Python Career Track (25 courses), Introduction to Airflow in Python, Streaming Data with AWS Kinesis and Lambda
How to Create a Winning Data Team
With a high-level understanding of each role’s responsibilities, let’s now examine how these roles interact in a business setting to drive value. We’ll use a real-world example in which a data team extracts value from a customer churn model.
In this context, data engineers ensure that data scientists and machine learning scientists have access to the high-quality data they need to develop and operationalize the model. They would verify the quality of the data and enforce the correct permissions for each dataset. They would also deliver the data in an easy-to-access way, with the metadata and variables needed for an effective analysis.
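The quality and permission checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, role names, and dataset names (`customers_clean`, `churn_scores`) are hypothetical, and a real pipeline would typically rely on dedicated data validation and access control tooling.

```python
# Hypothetical schema: the columns a churn analysis is assumed to need.
REQUIRED_COLUMNS = {"customer_id", "tenure_months", "monthly_spend", "churned"}

# Hypothetical mapping from role to the datasets that role may read.
PERMISSIONS = {
    "data_scientist": {"customers_clean"},
    "business_analyst": {"churn_scores"},
}


def can_read(role: str, dataset: str) -> bool:
    """Return True if the role is allowed to read the dataset."""
    return dataset in PERMISSIONS.get(role, set())


def validate_rows(rows: list) -> list:
    """Return a list of human-readable data quality problems."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Flag rows that lack any of the required columns.
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        # Flag repeated customer IDs.
        if row["customer_id"] in seen_ids:
            problems.append(f"row {i}: duplicate customer_id {row['customer_id']}")
        seen_ids.add(row["customer_id"])
        # Flag absent or negative spend values.
        if row["monthly_spend"] is None or row["monthly_spend"] < 0:
            problems.append(f"row {i}: invalid monthly_spend")
    return problems
```

Calling `validate_rows` on a batch before publishing it, and `can_read` before serving it, gives downstream roles a simple guarantee about what they receive.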
Next, data scientists and machine learning scientists would work together to create a model that predicts customer churn. They would need to ensure the model is accurate, interpretable, and deployable within a business process, and that it remains accurate on unseen data once it is deployed.
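One common way to check that a model holds up on unseen data is holdout evaluation: fit on one part of the data and measure accuracy on the rest. The sketch below illustrates the idea with a deliberately simple stand-in for a churn model, a single threshold on a hypothetical `tenure_months` field, and with synthetic data. A real churn model would use more features and an established machine learning library.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, rows):
    """Share of rows where 'tenure below threshold' matches the churn label."""
    correct = sum((r["tenure_months"] < threshold) == bool(r["churned"]) for r in rows)
    return correct / len(rows)


def fit_threshold(train):
    """Pick the tenure threshold with the highest accuracy on the training rows."""
    candidates = sorted({r["tenure_months"] for r in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic data (an assumption for illustration): short-tenure customers churn.
rows = [{"tenure_months": m, "churned": int(m < 12)} for m in range(1, 41)]

train, test = train_test_split(rows)
threshold = fit_threshold(train)
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)  # estimate of accuracy on unseen data
```

Comparing `train_accuracy` with `test_accuracy` is the point of the exercise: a large gap suggests the model will not generalize once deployed.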
Finally, data analysts and business analysts would work together to leverage the outputs of the model to make decisions that drive business value. They can allocate marketing spend based on which customers are more likely to churn, and give decision makers fact-based reasoning for why a certain segment of customers is more likely to churn.
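As a rough illustration of turning model outputs into a spending decision, the sketch below splits a retention budget across customer segments in proportion to their predicted churn probability. Proportional allocation is an assumption made here for simplicity, not a recommended policy, and the segment names and scores are made up.

```python
def allocate_budget(churn_scores: dict, budget: float) -> dict:
    """Split the budget across segments in proportion to their churn score."""
    total = sum(churn_scores.values())
    if total == 0:
        # No segment is at risk, so nothing is allocated.
        return {segment: 0.0 for segment in churn_scores}
    return {
        segment: budget * score / total
        for segment, score in churn_scores.items()
    }


# Hypothetical average churn probabilities per customer segment.
scores = {"new_customers": 0.6, "seasonal": 0.2, "long_term": 0.2}
allocation = allocate_budget(scores, budget=1000.0)
```

In practice analysts would also weigh customer value and the expected effect of the campaign, not churn risk alone.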
As in the majority of data projects, each role plays a part in extracting value from data.
The path towards an effective data team starts with your people
Data positions remain in high demand despite the recent hiring slowdown caused by the pandemic, with data roles taking two of the top three spots in LinkedIn’s annual emerging jobs report. At the end of 2020, Deloitte claimed that 23% of organizations had a major or extreme gap between AI needs and current abilities. In 2021, Deloitte published a report arguing that to address this shortage of data talent, organizations must practice selective hiring and targeted upskilling.
Given the data talent shortage, organizations must combine selective hiring with upskilling to build highly skilled data teams that create value. They can do this by providing their people with personalized learning paths tailored to these roles. At DataCamp, we provide learners with personalized learning journeys based on their skill set and desired learning outcomes, equipping them with the tools to assess, learn, practice, and apply new data skills. With DataCamp for Business, organizations can use custom learning tracks to tailor learning programs to their specific goals and challenges. Learn more about how you can transform your talent with DataCamp.
DataCamp for Business provides an interactive learning platform for companies that need to upskill and reskill their people on data skills. With topics ranging from data literacy and data science to data engineering and machine learning, over 1,600 companies trust DataCamp for Business to upskill their talent.