Webinar

The Path to Data Fluency

Slides

In a previous webinar series, we discussed at length how organizations can scale their data strategy, and their data science practice, through the lens of our proprietary IPTOP framework. The core idea presented in this webinar series was how organization-wide data fluency can be attained as a result of scaling five key pillars simultaneously—Infrastructure, People, Tools, Organization, and Processes. But this requires careful orchestration as organizations advance through different stages of data maturity.

In this webinar, DataCamp’s VP of Product Research, Ramnath Vaidyanathan, will break down the four stages of data maturity organizations will go through, from data reactive to data scaling, data progressive, and data fluent. He will also demystify the defining characteristics of each stage of data maturity in terms of Infrastructure, People, Tools, Organization, and Processes. Finally, he will uncover the steps organizations need to take to transition from one maturity stage to another.




Webinar Transcripts




The data science revolution

Before we begin, it's important to understand what the data science revolution is all about, because it anchors today's presentation. There are two aspects to the data science revolution. On one hand, it has made the impossible possible. This is what you usually see in the popular press: self-driving cars, protein folding, Go-playing AI agents. But there is also a hidden revolution in data science that is much less talked about, and this is all about making the possible widespread. In other words, it is about expanding the number of people in an organization who can do data science, who can do forecasting, create dashboards, explore data, and make data-driven decisions. In today's webinar, we're going to focus on how to get to data fluency, which is all about this hidden revolution in data science.






The path to data fluency depends on many levers


The path to data fluency depends on many different levers. We developed a framework to build and scale data fluency, and it's called IPTOP. The first lever is Infrastructure. This is all about how you collect, discover, and understand data, and make it actionable throughout the organization. The next fundamental lever is People. This is all about ensuring that you have the right people, the right skills, and the right learning resources. These two are the fundamental levers that enable data fluency. On top of them, there are three supporting levers: Tools, Organization, and Processes. Tools are the frameworks and tools that you use on top of your infrastructure to make things efficient. Organization is all about how you organize your people to enable scalable decision-making. And finally, Processes are all about how you do what you do, and how you set things up so that you can scale collaboration and make things efficient.






The data maturity of organizations


The focus of today's webinar is going to be on data maturity. This is a really important concept for organizations. The way we see data maturity, you can view it as a spectrum. On the very left, you have data reactive organizations. Typically, these are organizations where there is very little use of data and not much data-driven decision-making. Probably very few companies these days are data reactive. On the very right, you have data fluent companies. These are companies where everyone knows how to access the data, and data-driven decision-making is part and parcel of the company's culture. In between these two endpoints, you have data scaling, where some reporting is done and there is some use of data, and data progressive, where you have more people who can do data science work and can analyze, report, and present within each team. This webinar is going to be about how to get your organization from data reactive to data fluent by taking advantage of the IPTOP framework.







Today’s agenda



So to kick off today's webinar, we're going to discuss how to go from each maturity stage to the next. I've talked about the definitions of data reactive, data scaling, data progressive, and data fluent. Just to recap, data reactive means that there's very little use of data within the organization; in fact, it is pretty much non-existent. Data scaling means some use of data, but in very isolated cases. Data progressive means there are pockets of excellence within the company where data work is being done. Data fluent, of course, is the most expansive use of data, where everyone works with data.

But first, a poll: how would you describe your organization? As data reactive, data scaling, data progressive, or data fluent? As we can see, 6% of the audience is data fluent, 33% is data progressive, 39% is data scaling, and 22% is data reactive. Now that we've covered our bases, let's make our journey through the data maturity spectrum.

The path from data reactive to data scaling




What data reactive organizations look like




Let's look at what a data reactive organization looks like in terms of the IPTOP framework. These organizations' infrastructure is characterized by ad hoc data collection, and on the people side, data skills are limited to non-existent. Since infrastructure and people are the two fundamental pillars, if they are not in place, it doesn't even make sense to talk about tools, organization, and processes. There would be very limited use of these supporting levers. Let us go a little deeper to understand what this looks like in practice.






On the infrastructure side, these are some of the indicators:

  • Data is sitting in spreadsheets

  • Data is highly siloed. It is shared with only a few people, and nobody has a view of all the data that is collected.

  • There has been no pointed investment into data infrastructure, so pretty much everybody uses tools that they think are going to be of value to them.


On the people front, it's all about skills:

  • People trust instincts over data

  • There is limited, or pretty much no data talent

  • There is no strategy in place to upskill, and get people to become more data savvy

So, how do you get to data scaling?



The path to data scaling

The first thing to focus on is data culture. I think this is one of the most important elements of moving an organization from data reactive to data scaling. There are four elements here that you can use:

  1. Prove the value of data with a proof of concept

  2. Build strong executive support for data

  3. Put learning at the center of your data culture

  4. Invest in infrastructure talent





Let us dive deeper into two of these four elements.

Prove the value of data with a proof of concept

So the first thing to look at is proving value. One of the easiest ways to prove the value of data is to look at an analytics project. These are typically the low-hanging fruit. So here are a couple of examples. 




First, pretty much every organization that has customers is going to look at churn at some point. An exciting and impactful analysis to prove the value of data would be to build a simple customer churn model and use it to reduce churn.

Similarly, you could do a simple proof of concept of a metrics dashboard. Nothing complex, just exposing the common business metrics in a very visible way that can help people make data-driven decisions. Another example would be A/B testing, for instance on email subject lines in marketing campaigns. I think the key here is that you need to pick a project that is limited in scope but, at the same time, can have an impact and can be showcased to the entire organization. These low-hanging fruit will help you galvanize executive support and build excitement around data maturity.
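To make the A/B testing idea concrete, here is a minimal sketch of how the results of an email subject line test might be compared. It assumes a standard two-proportion z-test on open rates, which is one common choice and not a method named in the webinar; the function name and the campaign numbers are hypothetical.

```python
import math


def two_proportion_z_test(opens_a, sends_a, opens_b, sends_b):
    """Return (z, two-sided p-value) for the difference in open rates B - A."""
    rate_a = opens_a / sends_a
    rate_b = opens_b / sends_b
    # Pooled open rate under the assumption that both subject lines perform the same.
    pooled = (opens_a + opens_b) / (sends_a + sends_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign numbers, invented for illustration only.
z, p_value = two_proportion_z_test(opens_a=200, sends_a=1000, opens_b=260, sends_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in open rates is unlikely to be due to chance alone. The same arithmetic can be reproduced in a spreadsheet, which keeps the tooling overhead low for a first proof of concept.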

Build a data infrastructure strategy


The second is all about developing a data infrastructure strategy. This means making sure that you collect all the data that is required. This is really critical, and a lot of organizations don't think about data collection, including data architecture, in a clean way. We'll talk more about data architecture as we get to the later stages of the spectrum, but it's good to start thinking about architecture at this point as well. So, to summarize, the way to transition from data reactive to data scaling is to focus on infrastructure and people.




On the infrastructure side, the idea is to develop a data infrastructure strategy. On the people side, it's all about proofs of concept that demonstrate value and help you garner strong executive sponsorship. Moreover, it's about creating a data strategy that puts learning at the center of the data culture. Finally, it's about developing data infrastructure talent, so that you're able to supplement the infrastructure that you will build.



The path from data scaling to data progressive


What data scaling organizations look like


So, this should take you from data reactive to data scaling. Now, let us look at the next step in this continuum. How do we get from data scaling to data progressive? Let's first start off by defining what a data scaling organization is in terms of the IPTOP framework.

On infrastructure, you have a few experts who understand how data is accessed, but there is no organization-wide access or single source of truth. On the people side, there is some data culture. You have some people who have the skills, but again, it's a really minimal aspect of the organization. On the tooling side, it's mostly legacy tools being utilized. From an organizational perspective, there is no centralized data team in place, nor are data scientists embedded in teams. Finally, very limited processes are set up to take advantage of data science at scale. Let's break this down into its component parts.



Infrastructure

  • There's no centralized storage and no source of ground truth

  • Datasets are highly siloed across different teams

  • Some departments lead the way in terms of doing data storage and collection, but it's not something that's pervasive across departments.

People

  • There is no defined data culture and no one has a shared data language

  • There are very few data believers who are excited about becoming a data-driven organization

  • There is very limited self-motivated learning. This learning is often limited to the data believers.

Tools 

  • It's all about spreadsheets with minimal use of modern data tooling

  • Tooling is mostly ad hoc

Organization

  • There is no defined organizational model, as in no data team structure in place

  • Change management is taking place in this organization, so how to inject data talent is unclear

Processes

  • There are no organization-wide data norms despite some departments creating processes for themselves

In summary, this is how you would recognize a data scaling organization. So, the question now is, how do we get from data scaling to data progressive? What is the next step in the path?


The path to data progressive


So, the first thing that I would recommend we focus on is reinforcing infrastructure and tooling, given that it's usually the backbone of any solid data work. The first part of reinforcing infrastructure and tooling is centralizing your data storage. This is really key. Having all your data in one place makes it a lot easier to have a single source of truth and to ensure that everybody has access to this one source of truth, rather than things being siloed and distributed across the organization. There are a bunch of tools available here from Microsoft Azure, Google Cloud, AWS, Snowflake, and more that can enable centralizing data storage.




The second element is to ensure data quality and access. This is all about ensuring that your data meets quality standards and providing tooling that lets users access the data. So, centralized data storage and data quality are the two fundamental elements to focus on, on the infrastructure side, to move to data progressive.
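As a rough illustration of what a basic quality standard can look like, the sketch below counts rows with missing required fields and rows with duplicate keys. The function name, field names, and sample rows are hypothetical, and real quality checks are usually much broader than this.

```python
def check_quality(rows, required_fields, key_field):
    """Return simple data quality findings for a list of dict rows."""
    rows_with_missing = 0
    duplicate_keys = 0
    seen_keys = set()
    for row in rows:
        # A row fails the completeness check if any required field is absent or empty.
        if any(row.get(name) in (None, "") for name in required_fields):
            rows_with_missing += 1
        # A key that was already seen signals a duplicate record.
        key = row.get(key_field)
        if key in seen_keys:
            duplicate_keys += 1
        seen_keys.add(key)
    return {
        "rows": len(rows),
        "rows_with_missing": rows_with_missing,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical customer records, invented for illustration only.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "b@example.com"},
]
report = check_quality(customers, required_fields=["id", "email"], key_field="id")
print(report)
```

Running a report like this whenever data lands in the central store gives teams a shared, visible signal of whether the single source of truth can be trusted.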


Centralized data storage at DataCamp



Let me give you a quick, simplified view of what this looks like at DataCamp. We collect our data in three different ways. First, we have data collected from our applications, usually in a Postgres database or via Snowplow, for clicks and usage across the DataCamp platform. We also have external databases from providers such as Salesforce and Marketo. Finally, we make use of Google Sheets and Airtable, where a lot of product management and feature tracking happens. We use Fivetran to get the data into our data lake. Then we use Airflow to move everything over to Amazon Redshift, which is our centralized data storage.

Now, there are lots of choices to make here. This is just one way of doing it, and you can see that there are many, many vendors that provide services, and you need to evaluate carefully, depending on your needs.
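The core idea of pulling several sources into one queryable store can be sketched in a few lines. This is only an illustration: it uses an in-memory SQLite database as a stand-in for a warehouse, it does not use or imitate Fivetran, Airflow, or Redshift, and the table, source names, and records are all hypothetical.

```python
import sqlite3


def load_into_warehouse(conn, sources):
    """Copy records from every source into one shared `events` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (source TEXT, user_id INTEGER, event TEXT)"
    )
    for source_name, records in sources.items():
        conn.executemany(
            "INSERT INTO events (source, user_id, event) VALUES (?, ?, ?)",
            [(source_name, r["user_id"], r["event"]) for r in records],
        )
    conn.commit()


# Hypothetical records standing in for application, CRM, and spreadsheet data.
sources = {
    "application_db": [{"user_id": 1, "event": "course_started"}],
    "crm": [{"user_id": 1, "event": "contract_signed"}],
    "spreadsheet": [{"user_id": 2, "event": "feature_requested"}],
}

conn = sqlite3.connect(":memory:")  # stand-in for the central warehouse
load_into_warehouse(conn, sources)

# Once everything is in one place, a single query can span all sources.
rows_per_source = dict(
    conn.execute("SELECT source, COUNT(*) FROM events GROUP BY source")
)
print(rows_per_source)
```

The point of the sketch is the final query: with one store, anyone with access can ask a question across systems without knowing where each record originally lived.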


Strengthen People


So now that we've talked about the infrastructure requirements to move to data progressive, let us look at the other side of the coin, which is people. How do you strengthen the people dimension to get to data progressive?





The first and most important thing to do here is defining data culture. It's really important to define a solid data culture. This usually means three things.

  1. Data-driven project management. Defining what that means in your organization and enabling it.

  2. Incentivizing new behaviors, so incentivize people to make use of data, talk data and, essentially, anchor everything around data.

  3. Driving continuous education across the organization, so that everybody is constantly looking for ways to use data in improving their own work.

The second component is developing an upskilling strategy. To develop an upskilling strategy, there are three important things to consider:

  1. Understand your personas. Not everybody in an organization uses and accesses data in the same way, so it's really important to understand the personas. Some people are going to be doing a lot of data science work, some are going to be doing machine learning work, and some are going to be doing business analyst work. It is important to keep that in mind while you develop your upskilling strategy.

  2. Assess skills regularly. This is very important, yet it is often not looked at in depth.

  3. Align learning to business strategy. The worst thing that can happen is to build skills that have no alignment with your strategy. A good example would be getting people to learn machine learning when you don't have the setup internally to actually build and deploy machine learning models. In this case, the business strategy is misaligned with the learning strategy.

Finally, it's really important to reward your change agents. As leaders, it’s very important to signal to your organization that you are committed to creating a data-driven organization. One way to do this is by rewarding those who are being proactive in becoming more data-driven, who are excited about creating a data-driven culture, and show progress there. This can be done by:

  1. Increasing visibility of their work

  2. Promoting and empowering people who are promoting the data culture

  3. Evangelizing good work throughout the organization to other teams


So, for example, at DataCamp, when our data scientists work on a project or publish a knowledge repository article, it gets circulated across the organization. People comment on it, and people support it. So there is a good level of evangelization, and it ensures that data work does not happen in isolation.


Organizational model


So now that we've talked about people, it's important to move to the next dimension, the next pillar, which is the organizational model. So how do you organize your data personnel so that you're able to deliver on the path to becoming progressive?  By far, there are two very popular models: 

  • Centralized model: where you have a centralized data team. It functions as a support center. You have questions coming in from product, finance, marketing, and engineering. The centralized data team handles these questions and gets responses back to these teams.

  • Decentralized model: where you have the data scientists or the data folks embedded within each team, for example, finance, product, marketing, and engineering. The idea is that the embedded data scientist handles the questions specific to their own team's needs.




Now, each model has its own pros and cons. The big advantage of the centralized model is that you have all your data folks in one place, which encourages the team to function as a center of excellence. There is good knowledge sharing, and the manager of the data science team typically has a lot of domain knowledge and makes the technology stack choices. There is also a lot of standardization at work in a centralized model.

The disadvantage is that the business units are not tied into the data science strategy, because data science is a support function. Misalignment can happen, and turnaround time sometimes increases because there are competing requests from different teams.

These drawbacks are flipped around in the decentralized model. You now have greater alignment, because the units have data scientists embedded, but it is harder for the data scientists to collaborate and function as a center of excellence. I wish I could say that one model is better than the other but, as you can see here, that's not the case.

So it's very important, when you think about data progressive, to evaluate these two models and pick the one that is appropriate for your organization. Carefully evaluate the pros and cons, and decide which way you should go. These are the two broad models out there.

There is an excellent article that I would recommend reading from Harvard Business Review on how to set up an AI center of excellence. Tom Davenport, who writes a lot on this topic, would be a good reference to start thinking about organizational models and setting up a center of excellence, if that's something that you would be interested in doing.

So to quickly summarize, transitioning into data progressive, foundations are the key. We looked at infrastructure and how a centralized data storage solution, a single source of truth, is really important. Moreover, data governance and quality mechanisms are key to make sure that your data is being used correctly. Finally, data access needs to be structured, so that all teams are able to access data in a simple way.



On the people side, it's important to promote and reinforce a data culture. It's important to reward change agents and data leaders, and also to start thinking about organization-wide data upskilling initiatives so that you can strengthen the people lever for data progressive.

On the tools side, it's about providing access to modern tooling. It's also important to align your tooling with your infrastructure. This is something we will talk about more later. If you have data scientists and business analysts, not everybody uses the same set of tools to access data. So it's important to be inclusive and provide tooling that makes it easy for each group to access data.

Finally, on the organization and processes side, I think it's important to define organizational models and put basic things in place so that you're well-positioned to move into data progressive. Now that we have made the journey from data reactive to data scaling to data progressive, let's look at what it's going to take to reach the final step. I hesitate to call it a final step, because even within data fluency there is a spectrum, but it is the final step in the data maturity spectrum.


The path from data progressive to data fluent


What data progressive looks like


So once again, let's look at data progressive through the lens of IPTOP. On the infrastructure side, as we saw, at the data progressive stage you have access to data, there is a single source of truth, and infrastructure is maturing, but data is not easily discoverable, compliant, actionable, or understood.

On the people side, there is a strategic use of data, but still, it’s underutilized. There is some learning, but organization-wide data literacy is lacking. 

On the tool side, there is access to tooling, but limited data democratization. On the organization side, the data team is set in place, but it’s mostly limited to requests and analysis. Finally, on the process side, there are some data processes, but only for the high data competency teams. 

Let's delve one level deeper and look at how you identify a data progressive organization on the ground.




Infrastructure

  • There is a lack of data discovery tooling, which makes it difficult to find data

  • There is immature data quality

  • Operationalization of data products is still nascent, as in it’s not easy to deploy data products across the board


People

  • Learning is limited to certain departments

  • There is no common data language across the board

  • The majority of data artifacts created are reports and that’s where the majority of value is coming from

Tools

  • There is some modern tooling that is accessible

  • There are no frameworks that democratize working with data and reduce the barriers to entry

Organization

  • The data team is siloed as a support function, so it’s mainly responding to requests from other teams

  • Democratization is stalled and the data team is not working on high-value projects that democratize access to the rest of the organization

Processes

  • The data processes are not inclusive to a variety of personas that go beyond data experts

  • The data team is maturing, but there is still room for efficiency and scale when dealing with other stakeholders and teams


The path to data fluency


Infrastructure — Democratize Access


So this is what a data progressive organization looks like on the ground. Now, let us see how we can get from data progressive to data fluent. The first thing to do here is to democratize access to data.




What does democratizing access mean? We've previously talked about a single source of truth by having your data in a warehouse. But that data is not useful if people cannot access it, don't understand the context of the data collected, and cannot find what they need. This is why data discovery tools are really critical. Data discovery tools enable people to wade through the humongous ocean of data that you end up with when you create a single source. There are a lot of tools available here, several of them open source: Amundsen from Lyft, Databook from Uber, Collibra, Data Portal, and a number of others that enable data discovery.

The second element is data governance. It is really important to not just have discovery but also have data governance. Essentially, tracking who's using the data and how they are using the data. There are a bunch of data governance platforms out there. 

Finally, there is operationalization. I think one of the reasons why a lot of data science projects fail is that there is no clear path to operationalization. Maybe your data scientists build a machine learning model, but you don't have the resources to operationalize or deploy it. This can be about monitoring, refining, scaling, and improving quality standards. So, democratizing access is about data discovery tools, data governance, and the operationalization of data.



Here are a couple of examples of data discovery. On the left-hand side, you see Lyft's Amundsen, which is open source. A user who logs into the portal can look at a dataset and see all of its columns, their data types, and the description available for each column. They can also see who owns the dataset, as well as the source code. So it has pretty much all the details required for someone to take action on the data. On the right-hand side, you'll see a similar tool from Uber, Databook. There are a number of tools out there in the market, including a lot of open source tools. I certainly recommend looking at one of these tools and seeing how it ties into your stack.
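The kind of metadata described above (columns, data types, descriptions, owner) can be modeled with a very small data structure. The sketch below is a toy catalog with keyword search; it is not the data model or API of Amundsen or Databook, and every class, field, and dataset name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    name: str
    data_type: str
    description: str


@dataclass
class DatasetEntry:
    name: str
    owner: str
    description: str
    columns: list = field(default_factory=list)


def search_catalog(catalog, term):
    """Return names of datasets whose name, description, or columns mention the term."""
    term = term.lower()
    matches = []
    for entry in catalog:
        searchable = [entry.name, entry.description]
        for column in entry.columns:
            searchable += [column.name, column.description]
        if any(term in text.lower() for text in searchable):
            matches.append(entry.name)
    return matches


# Hypothetical catalog entries, invented for illustration only.
catalog = [
    DatasetEntry(
        name="subscriptions",
        owner="finance-team",
        description="One row per customer subscription",
        columns=[ColumnInfo("cancelled_at", "timestamp", "When the customer churned")],
    ),
    DatasetEntry(
        name="page_views",
        owner="product-team",
        description="Clickstream events from the web application",
        columns=[ColumnInfo("url", "text", "Page that was viewed")],
    ),
]
found = search_catalog(catalog, "churn")
print(found)
```

Even a simple searchable index like this shows why descriptions and ownership matter: the search only finds the subscriptions table because someone documented what the `cancelled_at` column means.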

On operationalization of data, here's a good example from Airbnb, which recently IPOed. Prior to their IPO, one of the key things that they did was put in place a full spectrum of data quality checks and data quality standards. They implemented an end-to-end model that allowed them to go from raw data to metrics in a manner where there was clear provenance and clear quality control at every stage.




So this is another example where data operationalization efforts can have a really high impact and, in fact, can be extremely critical to the success of an organization. If you're interested, there are two blog posts that we have linked to here. You get a much better sense of how operationalization was done in this context. Airbnb blog post Part 1 and Part 2

One of the elements of operationalization is monitoring data products in production. This is usually something that organizations do when they are further along the spectrum and are actually data fluent, but are looking for ways to move further up the continuum. This has led to the emergence of new fields, like MLOps. Some of you might be familiar with this: MLOps is basically the intersection of machine learning, data engineering, and DevOps. It's typically all about making sure that you can continue to train a model on new data, monitor the quality of the models, and use them to score data in real time or in batch.




So new fields are emerging, and they offer clear ways for organizations to move further along the spectrum in data fluency.
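Monitoring a model in production can start very simply. The sketch below flags drift when the average of recent model scores moves far away from a baseline; the rule (distance measured in standard errors), the threshold, and the sample scores are all assumptions made for illustration, and production monitoring typically tracks many more signals.

```python
import statistics


def drift_detected(baseline, recent, threshold=3.0):
    """Flag drift when the recent mean is more than `threshold` standard errors from the baseline mean."""
    baseline_mean = statistics.mean(baseline)
    baseline_sd = statistics.stdev(baseline)  # requires at least two baseline values
    standard_error = baseline_sd / len(recent) ** 0.5
    z = abs(statistics.mean(recent) - baseline_mean) / standard_error
    return z > threshold


# Hypothetical model scores, invented for illustration only.
baseline_scores = [0.48, 0.50, 0.52, 0.49, 0.51]
stable_scores = [0.50, 0.49, 0.51, 0.50]
shifted_scores = [0.70, 0.72, 0.69, 0.71]

print(drift_detected(baseline_scores, stable_scores))
print(drift_detected(baseline_scores, shifted_scores))
```

A check like this could run on a schedule after each batch of predictions, with a flagged result prompting a closer look or a retraining run.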


People — Strengthen organization-wide data skills


The same democratization of data access should also be reflected on the skills side. This is one of the common themes that you may have noticed: infrastructure and people are the two fundamental pillars, and you need to make movements on both fronts. That's something that organizations sometimes miss. Just as you democratize access, you also need to democratize skills. There are initiatives that companies typically carry out. Take the four initiatives listed here. For example, Marks & Spencer launched a data academy for their employees. AT&T announced a 10-year upskilling program valued at a billion dollars for its 140K employee base. Similarly, Amazon and Accenture launched massive upskilling programs on data literacy and data skills across the board. So upskilling and democratizing data skills is clearly seen as a path to competitive advantage.

At DataCamp we recognize the importance of upskilling, and we want to adopt a data-driven approach for that. One way to overcome obstacles of measuring skill progress is by looking at the skills of individuals and teams across the competencies required for a particular role and then trying to measure progress in a quantitative way. This is one of the reasons why we recently launched skill matrices as a way for organizations and individuals to track their skills repertoire and figure out what are the areas that they need to improve on using timed assessments.

If you haven't checked this out, I definitely recommend looking at the skill matrices as a way to track and assess skill development.
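The idea of measuring skills against the competencies a role requires can be illustrated with a simple gap calculation. This sketch is not how DataCamp's skill matrices are implemented; the function, the skill names, and the numeric levels are hypothetical.

```python
def skill_gaps(required, assessed):
    """Return {skill: shortfall} for skills assessed below the level the role requires."""
    gaps = {}
    for skill, required_level in required.items():
        current_level = assessed.get(skill, 0)  # a skill never assessed counts as level 0
        if current_level < required_level:
            gaps[skill] = required_level - current_level
    return gaps


# Hypothetical role requirements and assessment results, invented for illustration only.
analyst_role = {"sql": 3, "statistics": 2, "visualization": 2}
assessment = {"sql": 1, "statistics": 2}

gaps = skill_gaps(analyst_role, assessment)
print(gaps)
```

Repeating the same calculation after each assessment round turns skill development into something that can be tracked over time, for an individual or aggregated across a team.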




Infrastructure and people are important. But when you get to the data fluent stage, you can get a lot more by tapping into the tools, organization, and processes levers.


Tools — Create frameworks



At DataCamp, we often create a lot of business metrics, and this typically involves writing a lot of code that accesses the data, combines it with the right set of tables, calculates these metrics for different time periods, and finally visualizes them in a dashboard with custom code. Now, if you see something happening repeatedly, the immediate response is to ask: can we come up with a tool that would lower the barrier to entry?



That's exactly what we did, and this is how frameworks democratize data. We created a suite of R packages that essentially reduced the path to creating and visualizing a metric from a lot of lines of code to just a few lines of code, as you can see here. You start with the dataset, you enrich it with information about the data, and then you compute the metrics across a set of dimensions and periods. With just a few lines of code, we're able to get from the raw data to a plot of the metric in question.
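To show the general shape of such a helper, here is a small sketch in Python. It is not the API of the R packages mentioned above; it only illustrates how one reusable function can replace repeated custom aggregation code, and the function name, field names, and data are hypothetical.

```python
from collections import defaultdict


def compute_metric(rows, value_field, dimension, period_field):
    """Sum `value_field` for every (period, dimension value) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row[period_field], row[dimension])] += row[value_field]
    return dict(totals)


# Hypothetical signup records, invented for illustration only.
signups = [
    {"month": "2021-01", "plan": "basic", "signups": 10},
    {"month": "2021-01", "plan": "premium", "signups": 4},
    {"month": "2021-01", "plan": "basic", "signups": 5},
    {"month": "2021-02", "plan": "basic", "signups": 7},
]

metric = compute_metric(signups, value_field="signups", dimension="plan", period_field="month")
print(metric)
```

Once a helper like this exists, computing the same metric by a different dimension or period becomes a one-line change, which is what lowers the barrier for people who are not data experts.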

I'm sure you'll find many more examples out there where the idea is to build frameworks that simplify repetitive impactful tasks. Now, we tend to open source most of the packages, at least on this front. So, if you're interested and want to learn more about open source there is a link here. Please visit the other frameworks that we've open-sourced as well.


Organization — Create a hybrid organizational model


So now, let's move on to the organizational model. Earlier, we talked about the centralized and decentralized models. Now, what does the organizational model look like in a data fluent organization? What we have found, typically, is that as organizations mature, they eventually end up in a hybrid model.



The hybrid model basically means that there is a centralized data science team, but there are also data scientists embedded within each department or business unit. This allows you to circumvent the cons associated with the two models. For example, with this model, the embedded data scientist can always ensure that things are aligned with their business unit, while also having a direct-line relationship with the central team. The central team can focus on building tools that make the life of the embedded data scientists a lot easier. So the hybrid model works really well. Of course, there is the question of balance: how big should your central data science team be? What percentage of people should be embedded within squads? This is something that you will have to decide as an organization, looking at what is important in your specific case.


Scale your data processes 


There are many different elements of data processes and how to scale them. This is one example from Netflix. They realized they had different data personas, as in any data company, and they settled on Jupyter Notebooks as a common platform for everyone to do their data work.




What this allowed them to do was build infrastructure that would take a notebook and create outputs like dashboards, ad hoc analyses, and more. By settling on notebooks as the central platform, they were able to scale this process in a very nice way.

This need not be just notebooks; it could be spreadsheets and spreadsheet templates, or Tableau templates. The goal is to standardize processes so that you're able to scale whatever process you come up with.


path_data_fluency_23

The blueprint for data fluency

To summarize, we looked at how to transition to data fluency. On the infrastructure side, it's about investing in data discoverability and democratization, strengthening your quality initiatives, building trust, and moving from experimentation to operationalization. On the people side, it's about rolling out organization-wide upskilling efforts and assessing, tracking, and rewarding skill development.

On the tools front, it is about developing frameworks to democratize data and lower the barrier to entry. On the organization front, it's about creating and developing hybrid models to drive data strategy. Finally, on the process front, it's all about developing scalable processes that you can implement throughout the organization to centralize insights and promote collaboration.

What you've seen in this webinar so far is how to go from one level of data maturity, data reactive, all the way up to data fluent, and how to do that in a highly accessible way. Now, of course, there is a lot of information here and a lot to absorb, so the goal is to do this in a systematic way. That's exactly the blueprint that we have presented here: how to get from data reactive to data fluent.


path_data_fluency_24


Questions and answers


  • Question: In transitioning from data reactive to data scaling, if people have minimal data skills, how would they go about creating a proof of concept?

  • Answer: That's a great question. I think the goal here is to pick a project that is closest to the skill set that you have, or to the skills that you can learn. A good example would be A/B testing. For most A/B tests, the data collection is fairly straightforward. You just have to get data on each of the scenarios, and the analysis can be done even in spreadsheets. So I think the way to go about this is to minimize the overhead of any new learning and to focus on impact, and less so on the actual methodology and tooling.


  • Question: How do you go about aligning learning to business strategy? Do you have any tips for facilitating conversations between data professionals and the C-suite on this topic? Do you think asking the C-suite to think about data strategy in terms of data tools and organization, and asking data professionals to incorporate strategic thinking into their day-to-day, is a good way to go?

  • Answer: Definitely. I think this is mainly a question of alignment. If I understand the question correctly, one of the biggest reasons data initiatives sometimes fail is that, let's say, as a data team you spend all your time building dashboards, but nobody is looking at those dashboards in a proactive manner. That is a clear failure where you're not really aligned. So I think the way to go about it is to start from the C-suite. Try to have a dialog where you understand the best way for the C-suite to consume data. Do they want a dashboard? Do they want snippets of reports on a weekly basis that outline new insights? Do they think there should be applications of machine learning models that automate things? It's really important to have those dialogs. The key here is alignment, and I think the best way to set up that alignment is through dialog: start with the end in mind, which is how data is going to be used at the end of the day to drive profits, and work backwards from that.


  • Question: If an organization is still in the scaling or progressive stage, do you think it's too early to start using data discovery tools?

  • Answer: I think the benefits of a data discovery tool typically come once you have a larger fraction of people using the data. So I would say that it's a little early, but you can accommodate it as a part of your data strategy, why not? Just to give you an example, we built our own thing here, mainly because we wanted a very basic setup and didn't want something with an entire feature set. So, if you're starting out, maybe all you need is a Google Sheet that lists the datasets, the database table names, and the column names, nothing fancy. I would say that if you're at the data scaling stage, I definitely wouldn't go for these tools. At the data progressive stage, it depends on bandwidth. But I think even if you just have single-source-of-truth tables, that's good enough.


  • Question: What might be a good approach or solution to work with data in your organization when there is no centralized data storage, but two platforms instead, Redshift and Azure? Where does this fall? Is this still being stuck in data progressive?

  • Answer: I think this is probably going to be the case in a big organization like yours, where parts of the organization work with Redshift and others with Azure. I don't think this is getting stuck in data progressive, to be honest, because a single source of truth doesn't literally mean a single source. It basically means that for every piece of data, there is only one source. I'm assuming that in this case Redshift and Azure hold different pieces of data, so I think that still holds. Having said that, one complexity it does introduce is technical: if you have data on different platforms, then running queries that combine the two can be more complex. As long as that's not a requirement, I think this would still fall in the category of data progressive, moving towards data fluent. So I don't see this as a bottleneck to moving to data fluency.


  • Question: Do you have a template on how to evaluate skill levels of employees, from beginner to intermediate to expert, because these tend to be very qualitative?

  • Answer: I will have a better answer for this in a few months. In fact, one of the things that we're working on internally is to quantify these skills in a cleaner way, across roles and across proficiencies. So in the coming months, you should be able to see more information around it. We don't have a template offhand to do this, although I do believe that we have some white papers that talk about skills and the levels of expertise you could reach. I highly recommend checking out the L&D Guide to Data Fluency, which is also mentioned in the slide deck you will receive. I also highly encourage you to take a look at the Signal skill assessment that we have, which is a standardized test by which you can evaluate your team's skills, and skill matrices are one of its features as well.


  • Question: When looking at personas in your organization, is there any kind of standard number and type of personas that you see across organizations? 

  • Answer: Actually, yes. We've seen that there are eight personas, ranging from data consumers, who are people who just need to read data, to business analysts, data analysts, data scientists, statisticians, machine learning engineers, and data engineers. They have a range of use cases and different relationships with data and data tools. I would highly encourage you to take a look at that white paper, the L&D Guide to Data Fluency.


  • Question: Who should be accountable for leading the effort to increase a firm's data fluency? Is this a collaboration between technology and human resources, or should it be completely led by technology?

  • Answer: It should be led by the C-suite, because data efforts tend to take time to reap benefits. I would say that it should start with a very firm commitment from leadership. But in terms of driving the efforts on the ground, I think it should be a combination of technology and HR. Technology contributes more from the point of view of identifying skills, identifying levels of competency, and setting up the framework. HR primarily looks at what incentives can be put in place to encourage people to follow this path to attaining fluency. So I would say it should be a joint effort, but definitely leadership sponsored, because otherwise it's really not going to stick.


  • Question: Regarding leadership's role in data transformation, it's not just about how they consume and understand data, but how they sponsor it and use it to make business decisions. Do you see any unique leadership skills in that journey that you think would be applicable?

  • Answer: Right, it's not just about consuming and understanding. I see a couple of things. One is definitely data literacy: understanding what kinds of decisions data can lead to. The second is understanding the landscape of what is available. For example, I think you need leaders who are able to understand one level deeper, such as the difference between machine learning and deep learning and what each can do, so that you don't see deep learning as a panacea for all kinds of problems. Beyond that, you need some basic statistics. I would say that a slightly deeper data literacy on the part of leadership is going to be really important, so that they're able to have meaningful dialogs and make meaningful decisions.
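
Returning to the earlier answer on proof-of-concept projects: the A/B test analysis mentioned there is small enough to fit in a spreadsheet or a few lines of code. The sketch below uses a standard two-proportion z-test, which is one common choice and not a method prescribed in the webinar. The function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Illustrative counts only: 100 of 1000 convert on A, 150 of 1000 on B.
    rate_a, rate_b, z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

The only inputs are the number of visitors and conversions per variant, which matches the point that data collection for most A/B tests is straightforward.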



Ramnath Vaidyanathan

VP Product Research at DataCamp

Ramnath Vaidyanathan is the VP of Product Research at DataCamp, where he drives product innovation and data-driven development. He has 10+ years of experience doing statistical modeling, machine learning, optimization, retail analytics, and interactive visualizations.


Adel Nehme

Data Science Evangelist at DataCamp

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.



© 2021 DataCamp, Inc. All Rights Reserved.