
How Data Leaders Can Make Data Governance a Priority with Saurabh Gupta, Chief Strategy & Revenue Officer at The Modern Data Company

Adel and Saurabh explore the importance of data quality and how ‘shifting left’ can improve data quality practices, operationalizing ‘shift left’ strategies through collaboration and data governance, future trends in data quality and governance, and more.
Updated Mar 2024

Guest
Saurabh Gupta

Saurabh is a seasoned technology executive and is currently Chief Strategy & Revenue Officer at The Modern Data Company, having formerly led the Data Strategy & Governance practice at Thoughtworks. With over 25 years of experience in tech, data, and strategy, he has led many strategy and modernization initiatives across industries and disciplines. Throughout his career, he has worked with various international organizations and NGOs, as well as public and private sector organizations. Before joining Thoughtworks, he was the CDO/Director for the Washington DC government, where he developed the digital/data modernization strategy for education data. Prior to DC Gov, he played leadership and strategic roles at organizations including the IMF and the World Bank, where he was responsible for their data strategy and led their open data initiatives. He has also worked closely with the African Development Bank, OECD, Eurostat, the ECB, the UN, and the FAO as part of inter-organization working groups on data and development goals. As part of the taskforce for international data cooperation under the G20 Data Gaps Initiative, he chaired the technical working group on data standards and exchange. He has also played an advisory role to the African Development Bank on its data democratization efforts under the Africa Information Highway. Saurabh has long been part of the startup community and advises and mentors several startups and founders. People are the key to sustaining any large, impactful change, and he spends a lot of time focusing on team development, collaboration, and opportunities to ensure the change is more sustainable. He lives with his wife, teenage daughter, and dog in the DC metropolitan area, and loves traveling as a family and spending time exploring.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Key Quotes

For GenAI to work better, to deliver you better results, you need a better data foundation. And I think that is adding, more and more, to the importance of how data should be managed, how data should be governed, what the ethics around it are, and how data quality should be seen. I would say this is just starting. And a lot of tools have also started coming in, but I think more than the tools, the process part, the people part, the acceptance and understanding of the importance, is the key.

There are a couple of areas where the data governance role becomes very, very pivotal in terms of data literacy programs. One is having a standard around training; adoption is also key. There's nothing like having a data governance person be a part of this whole discussion too. That said, the constituent parts of data literacy, like what the policies are, what the standards are, are very important, and that part of the communication, the importance of it and why it is needed, should be led by the data governance function. So take, for example, banking or the financial industry, where regulatory compliance is pretty high: the standards put in place by the data governance side have to be communicated to a lot of stakeholders. They're not just the end users, but also teams which are managing processes, teams which are managing consumers, right? So in those organizations, literacy is at multiple levels. It's at the executive level, at the teams which are managing, processing, and maintaining data, and the last level is the people who are consuming data. So any communication, training, or change management to be planned should be done in close partnership with data governance. And they will bring in aspects which are specifically important for people at different levels in the organization.

Key Takeaways

1

Integrating data quality checks early in the data lifecycle can significantly improve the end consumer's experience and ensure the integrity of data throughout its journey.

2

Implementing robust data governance policies and practices is not just regulatory compliance but a strategic asset that enhances data quality and operational efficiency.

3

Encouraging open dialogue and collaboration between those who produce data and those who consume it leads to a more aligned understanding of data quality and governance needs.


Transcript

Adel Nehme: Hello everyone, I'm Adel, Data Evangelist and Educator at DataCamp, and if you're new here, DataFramed is a weekly podcast in which we explore how individuals and organizations can succeed with data and AI. There's this concept in software engineering called Shifting Left, which refers to an approach to testing software happening way earlier in the development lifecycle.

This not only helps teams build better software development rituals and be more effective at delivering high-quality software, it also puts the quality and usability of software as a key dimension to evaluate as you are building it. So, taking that concept of shifting left, how do you apply it to data?

Enter Saurabh Gupta. Chief Strategy and Revenue Officer at the Modern Data Company and former Data Strategy and Governance Lead at ThoughtWorks. In this episode, Saurabh outlines how data teams and data leaders can start shifting left with data and embed data quality and usability as a key dimension to evaluate as data products are being built.

I really enjoyed this conversation because shifting left really clicked for me, and I have a hunch that this data product mindset will start creeping into the mainstream over the months to come. If you enjoyed this episode, make sure to let us know in the comments or on social.

And now, on to today's episode. Saurabh Gupta, it's great to have you on the show.

Saurabh Gupta: Thank you, Adel, for having me on. It's very nice to be here.

Adel Nehme: Likewise. So, you lead the data strategy and governance practice at Thoughtworks, and I would like to set the stage for today's conversation given, you know, it's going to be quite a lot about data strategy, data governance, and the importance of the quality agenda. According to the Monte Carlo 2023 State of Data Quality survey, the time to resolve data quality incidents increased in 2023 from 2022, and the percentage of company revenue reported as impacted by data quality issues also increased in 2023 over 2022. That survey is not necessarily fully reflective of the state of data governance today, but it gives us a good idea of where data quality efforts are at.

So maybe with that in mind, I'd like to begin by looking at what you think the state of data governance and quality is as we head into this new year. Where are we today when it comes to the state of data quality in organizations?

Saurabh Gupta: Adel, you've touched on a very, very sensitive topic, an important topic in the data industry: data quality. The challenge with data quality is that it's always pushed as far as possible to the consumer side. As a data consumer, when I start looking at data and realize it's not making sense, I start raising alerts.

And this has been the practice ever since, because most of the time when data was being consumed, it was in the form of reports or charts or tables. One of the very emergent trends nowadays, and a very useful directional shift, is to move data quality and data checks as far left as possible. Not only does it simplify and improve the consumer experience with data, but there are also now multiple tracks in the way data is being consumed.

It's no longer just reports. People are using data in applications and different products they use, with a lot of API-based data exchange happening. So moving data quality to the left, I think, is a very welcome change that the industry is seeing. But that also means the data practitioners, or data owners and data stewards, have to start looking at data early on and say, okay, what does data quality mean when I'm looking at this data?

And here comes the role of data governance. Why I'm bringing data governance into the quality discussion is that, I think more than the quality aspect itself, the standards around data quality, the behavioral aspects, the checks and balances have to be defined. The policy side of it is what data governance brings into play.

Like, across the industry, we have been seeing this shift where data quality issues are becoming cost prohibitive. Every time there's a problem, there's an issue with data, it circles back all the way to the source, and not only is there the cost in terms of fixing the data, redoing the reports, redoing the messaging, but also the opportunity cost that comes with it.

So more and more practitioners have started looking at how data governance can play a role with a specific focus on data quality. And everything doesn't need to be done in one shot; it's a behavioral change, right? So how do we start small and build incrementally in a way that everyone in the process, from data engineers, data developers, data analysts, to consumers, they start

Adel Nehme: Hmm. That's great, and there's tons to unpack in what you said here. You mentioned how data quality and data governance efforts have, in the past, been relatively relegated to the data consumer, right? The data consumer spots issues, they raise alerts, and by that time it's already a bit too late, because the problem has already started, right?

And then you mentioned as well that an antidote for that is shifting left, which is something that is very, very interesting; it's this emerging concept, as you mentioned. Before we talk about what shifting left is, I want to talk about the reason why data quality efforts have been a bit sidelined as less strategic over the past few years, right? I think there is widespread recognition within the data community of how important data governance is. However, you know, organizations often view data governance efforts as overheads, right? Maybe not a strategic program or project. Unpack that perception for us and why it has been the case over the past few years.

Saurabh Gupta: I mean, I see data quality at two levels. One is very, very specific, saying, okay, this is black and white, whether this is right or wrong. Then there are occasions where data quality can be very much specific to a use case, to how the data is being consumed. And I can give you an example.

So take, for example, a pizza company. They want to report their quarterly sales of pizza. It has to be perfect; its balances have to be met. But the Chief Operating Officer of this pizza company wants to track on a daily basis which particular outlet is selling more pizza versus which is having a slack, and wants to understand and focus more on where the slack is and why it is happening.

And that doesn't need to be perfect data. It may not need to meet the highest level of quality standard, but as long as the Chief Operating Officer's desk gets a report early in the morning saying these 40 joints out of the 400 you have are not meeting their usual sales,

I think that's a very useful data point, if you see it from the Chief Operating Officer's point of view, that he or she is looking into, okay, where do I need to take action, or immediate action. So data quality becomes important, but not all-important. I think consumers drive the definition of data quality, and one of the reasons why data quality has always been neglected is because the connection between data producers and data consumers has been missing.

And as we start talking about the new concepts like data products and data mesh, the ownership of data is being identified, and as a rule, the data owner, or a data product owner, is also responsible for bringing in the quality standards from a consumer point of view. He or she is supposed to be the glue or the conduit between data consumers and data producers.

And the more we see this, the better the quality is going to get. The other issue, which we have seen always, it's always been the case: as the volumes of data keep increasing, the push from the consumer side is always give me data, give me data. And a lot of organizations are not using the modern data tools, data platforms,

or modern practices around data. So typically what happens is every time a new data set needs to be brought in, a new transformation needs to be done, and someone is writing code. So the whole focus on the left-hand side becomes get the data, write the code, write the transformation, and turn it into a data set or a report which can be consumed.

And I think that shift is needed, and it's happening. As senior-level executives have started using data as a rubric for their decision-making, I think the quality aspects are getting highlighted more and more.

Adel Nehme: Okay, there's a lot to unpack in what you mentioned here, but I want to go back to what you mentioned on shifting left and why that matters. So, we talked about how one reason why data governance and data quality efforts have not necessarily progressed a lot is the disconnect between data consumers and data producers.

That's a very fascinating insight that I'd love to focus on, but maybe first walk us through what shifting left means and why it matters in the context of data quality.

Saurabh Gupta: So, if you see, right, I'll just draw a simple data lifecycle picture for you. You start with systems that are producing data. These are transactional systems; they can be HR, financial, logistics, supply chain. From here, data is extracted and brought into a data lake or a data warehouse.

And from there, people are using this data to build their own reports or their own outlooks. Now, most of the challenges around data quality get highlighted when people start using the data. I'll give you an example. Take data being collected by, say, the World Bank or the IMF; they collect data from countries.

They're collecting GDP data, they're collecting population data. Population data is coming from one agency, GDP from another. And an economist is calculating GDP per capita, which is nothing but GDP divided by the population. Now, if you see, the population data is intact, the GDP data is intact, but when they calculate the GDP per capita, they realize, oh, something is wrong with that.

Now, that check that happened on the consumer side, on the side of the economist who is looking into writing a policy report, is way too late. Had the same check happened in the beginning, when the data was being brought into the system, I think it would have been a big, big advantage. So that's what I mean by shifting left.

So, moving from the consumer side, start moving some of these data quality related efforts to the producer side. It sounds very simple to do, but what it also means is that on the data producer side, someone has to start understanding how this data is being used on the consumer side. So, take for example GDP data; it's such an important data point for tracking how countries are performing, right?

Now, as a check on the data producer side, I can put a check saying if this number is missing, give me an alert. Fine. But say in place of 100 billion, someone typed in 10 billion. It passes through the checkpoint, but the number is off. GDP cannot move from 100 billion to 10 and then move to 120 the next year, right?

But this gets highlighted only after all the transformations, after people are using it in 15 different places. It goes to, say, 10 different consumers: an economist, a publication, it goes to a newspaper. And that's when they find out, wow, this is wrong, something is going wrong. All these 10 consumers start reporting it to this agency, and then they go back, back, back.

It loses a lot of time, and there's a lot of loss too, right? On the contrary, if the quality check had been that whenever GDP changes by more than 5%, there may be something wrong, that check could have been done on the producer side, and this wouldn't have happened. So, long story short, what is really needed is bringing awareness of what data quality is, and what bad quality means in terms of time, effort, and opportunity losses, to the producer side, which will help in coming up with these checks.

I think this also requires a collaboration between data producers and consumers, to start talking about how you bring some of these common checks and standards into data quality.
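To make the shift-left idea concrete, the producer-side sanity check Saurabh describes can be sketched in a few lines. This is purely illustrative: the 5% threshold, the `check_yoy_change` name, and the figures are assumptions drawn from the example above, not any real agency's validation rule.

```python
def check_yoy_change(series, threshold=0.05):
    """Flag year-over-year changes larger than `threshold` (as a fraction).

    `series` maps year -> value, e.g. reported GDP figures.
    """
    alerts = []
    years = sorted(series)
    for prev, curr in zip(years, years[1:]):
        old, new = series[prev], series[curr]
        if old and abs(new - old) / abs(old) > threshold:
            alerts.append((curr, old, new))
    return alerts

# The typo from the example: 10 billion entered instead of ~110 billion.
gdp = {2021: 100e9, 2022: 10e9, 2023: 120e9}
for year, old, new in check_yoy_change(gdp):
    print(f"ALERT {year}: value moved from {old:,.0f} to {new:,.0f}")
```

Run at ingestion time, a rule like this raises the alert before the bad figure ever reaches the ten downstream consumers, rather than after the newspaper prints it.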

Adel Nehme: That's really great, and I want to double down on that, right, because you alluded to this earlier on, you know, how the disconnect between data consumers and data producers is leading to a lot of dissatisfying results when it comes to data quality, and how shifting left can be a solution to it.

Walk us through in a bit more detail the operational requirements for a successful shift-left program. What does a healthy relationship between a data consumer and a data producer look like in this paradigm?

Saurabh Gupta: So, there's a newly emergent role in the industry, in the data industry. I won't call it very new, but it is still new, and a lot of organizations are still trying to frame it and see how they can take advantage of it. It's called a data product manager. Now, I think first we have to not see data as a data set.

We should start looking at data as a product. What that means is bringing product thinking to how data is being managed. And when we say product thinking, the immediate reaction should be, okay, I should know how this data is being used. Now, if you see, data producers have a very different role versus data consumers.

Because data producers are more on the transactional side, their focus has been, and will always be, how to keep the operational gears turning all the time, consistently. They produce this data, and it is extracted into data lakes and data warehouses. And consumers are seeing it as, okay, I'm using this data.

It doesn't make sense. Now, this role of a data product manager is interesting, because he or she is responsible for bridging the gap between what is getting produced and how it is getting used. So a data product manager is responsible for understanding the use cases and, along with the use cases, the quality aspects that come with them.

What are the transformational aspects that come with it? And by transformational I mean, if there are some calculations which a data consumer is doing every time, probably those also could be moved left, you know, so that the metrics behind that calculation can also be standardized. And that's where, I think, some of the policies and processes which are governed by a data governance program come in.

So if someone is building a data product: what they do, how it is getting used, can any of these things be standardized? That should be the responsibility of the data product manager, the data steward. And he or she will be responsible for getting those implemented. On the left side, by which I mean at the point of data extraction from the transactional systems,

if all those checks and balances, the calculations, the transformations are put there, then as the data flows through the pipe all the way to the consumer, the quality will be better. The calculations will be consistent, and as the data gets used or shared through different channels, whether it's a Tableau dashboard or a Power BI dashboard or an API, it remains consistent.
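As a sketch of that "move the calculation left" idea, here is one way a shared metric could be validated and computed once at extraction time, so every dashboard and API sees the same number. The field names, the `build_record` helper, and the GDP-per-capita metric are illustrative assumptions carried over from the earlier example.

```python
def build_record(raw):
    """Validate a raw record and attach the standardized derived metric.

    Returns (record, errors): record is None when validation fails.
    """
    errors = []
    if raw.get("gdp") is None:
        errors.append("gdp missing")
    if not raw.get("population"):
        errors.append("population missing or zero")
    if errors:
        return None, errors
    record = dict(raw)
    # Derived once here, instead of every consumer re-implementing it.
    record["gdp_per_capita"] = raw["gdp"] / raw["population"]
    return record, []

record, errs = build_record({"country": "X", "gdp": 100e9, "population": 50e6})
# record["gdp_per_capita"] -> 2000.0, identical for every downstream consumer
```

Because the check and the calculation live at the point of extraction, a bad input is rejected before it flows downstream, and the metric's definition cannot drift between consumers.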

Adel Nehme: I love that analogy, or that role that you're discussing, of a data product manager, right? Because here, if you make the analogy with a regular product manager who's in charge of a digital product, the product manager guides the developers and software engineers, so here the data producers, to make sure that they're creating the best possible features for the users of that product, which here is the data consumer.

So it's a very good analogy for how that fits into the role. You mentioned how this connects to the data governance program. In this case, who owns the data governance program? And does the data product manager take inspiration from what's already there?

Walk me through that interlock and how it works.

Saurabh Gupta: The data governance program, I think, is best kept a little separate, as a parallel track. Because in a data governance program, people typically talk about people, process, and technology, right? I love to add another one: people, process, policy, and technology. The reason I want to bring the policy element into the mix is because people and processes are fine, but the policies are the ones which remain.

People change, processes can change, but the policies, the ones which talk about standards and common calculations, they remain. So, with that said, what I have seen working successfully in most organizations is to have a centralized role. There are different models; it can be centralized, hybrid, or decentralized, depending on the size of the organization. But whoever owns the data governance program, what he or she is going to be responsible for is building and managing a framework for data governance,

along with the policies, the whole nine yards around data governance from a process point of view. Now, for any data initiative that kicks in, that particular team will have to have this person in an oversight role. And what that oversight means is, say someone says, I'm building a new data product for this purpose.

As a data governance person, I should be able to ask: is it discoverable? Is it actionable? Is it secure? Et cetera. So all the checks: is the quality being checked? If it is checked, what are the standards being followed? When the data is going to be published in a catalog, are common metadata standards being followed?

So all the aspects of turning a data product into something actionable, secure, and governed are my responsibility as a governance person. It's not that I'm going to do it all myself, but what I do is work with the data products team or the data engineers team to make sure we bring this into the core development process, so that some of the action items could be automated.

It could become a part of the thinking process from the very beginning. So the message I'm trying to convey here is: bring data governance into action from the very onset, rather than as an afterthought. When you bring it in from the very onset, some of the capabilities are much simpler to add than going back and doing it later.

Adel Nehme: You know, for organizations that don't necessarily have that shift-left mentality, and I think there are probably a lot of data governance leads listening in right now thinking, I wish we could shift left within my organization, what would be your advice for how to kickstart that shift-left mentality and how to approach it?

Saurabh Gupta: It's a very good question, Adel. And this is one area which all organizations are struggling with, because as a user of data, as a producer of data, I want to just produce, I want to just use; I don't want anything that slows me down in the process. The thing which I've been suggesting for several years, and have been practicing myself, is this: in an organization whose data use has matured but whose data governance is not as mature, work closely with the data consumers and understand and, in a way, document some of the challenges and pains they are having. Turn those into a cost the organization is incurring, or an opportunity cost it is facing, because of these issues. Now, it's very easy to say, but it requires time, it requires energy, and it requires a partner in crime on the consumer side, the business side, who's ready to do that.

So, I think if that is done, the opportunity becomes much easier to seize, or the problem becomes easier to solve. And so a data governance role should be seen as a collaborator rather than an enforcer. I think that's one shift which everyone should make.

A lot of data governance practitioners face this issue: there's already a train in motion, so how do we do it? And again, my two cents is going to be: start small. It's important to start small, because if you try to make a lot of changes which impact a lot of people, you get pushback. Start small, establish success, and partner with the other person to jointly demonstrate it across the organization.

While you do that, support from executive leadership is also going to come into play.

Adel Nehme: Yeah, it's definitely useful to always have executive leadership support. But what you're talking about here actually segues perfectly into my next question, because I wanted to shift gears a bit and discuss the key tenets of a successful data governance program. You mentioned people, policy, process, and tech, right?

You also mentioned here the importance of starting small and scaling from there, not trying to push a moving train off its tracks, right, and being a collaborator as a data governance program manager. Maybe first walk us through what you think the key dimensions of a successful data governance program are, and what it means to operationalize successful data governance from a people, policy, process, and tech perspective.

Saurabh Gupta: The core tenets you mentioned, for me, it's always people, process, policy, and technology. The reason I always put technology last is that I think it's the easiest problem to solve. People are the hardest, because it not only touches their day-to-day way of doing things, but also has a lot to do with learning and the behavioral aspects around that.

So people are the most important, and awareness and data literacy become part of this whole process. The people aspects are very important. Skills are important. Awareness is important: making people aware of how challenges around data are impacting the organization, and how good practices are going to help deliver better value for the organization. I think those are the aspects which need attention.

There's always a fear around change. So how do you bring smooth change management around that? That's important to bring people along. So that's one. I think the second part is processes. Processes over time get very, very convoluted. I have a certain way of doing things.

I did it that way. Then someone else comes in, and he or she thinks, oh, this process needs to be adjusted to meet some other specific need. Band-Aids. So over the years, Band-Aids upon Band-Aids have been put in. I have seen a particular organization where reports go out, and only then does someone do a quality check on the data, because someone from top leadership said data quality is important.

So consumers are receiving the data before the quality check even runs, without anyone understanding the role of that check. I think this is what happens when processes are changed organically, either to meet an ask from the top or a need for adjustment. So simplification of processes is very important.

A three-step process can grow to five, but the moment you go to seven steps, something is wrong; go back and check. So simplification of process is very important, I would say. The policy aspects, again, are about how to look at data, how to look at data processes, how to look at people in terms of data literacy and awareness. What are the policies? Can these policies be phrased as single sentences

which can be easily understood and applied? I can give you an example: data quality rules have to be enforced or implemented on the producer side. A very simple sentence. It can then be detailed into what it means, and it could be very specific to a data product. Now, if this policy is understood well, then here is what happens as a data product team or a data product manager:

when I start building my data product, I go to that policy and ask, okay, what does data quality mean for my data set? And I define it: this framework says, when you look at data quality, these are the parameters you need to check. I look at those parameters and say, in the context of the data product I am building, these are the ones which make sense.

And let's apply them. So what happens is the policy gets abstracted at an organization level but gets implemented at a process or data product level. And it also creates a check and balance: before the data product goes live, someone is going to check, and if the policy is not implemented well, rework is needed.

Another example could be, say, when a data product is getting published in a data catalog, these are the 15 metadata fields it should have. What is its purpose? What is the frequency of updates? Who is the owner? Et cetera. Now, if I build a data product in which, of these 15, only 5 are filled in, it should not even be published.

This is where the policy is what puts a check on that.
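A policy like that lends itself to an automated gate. Below is a hypothetical sketch; the required field names and the `can_publish` helper are made up for illustration, not an actual catalog standard.

```python
# Required metadata fields for catalog publication (an illustrative subset).
REQUIRED_METADATA = {"name", "purpose", "owner", "update_frequency", "source"}

def can_publish(metadata):
    """Return (ok, missing) for a candidate catalog entry.

    A field counts as present only if it has a non-empty value.
    """
    present = {k for k, v in metadata.items() if v}
    missing = sorted(REQUIRED_METADATA - present)
    return (not missing, missing)

ok, missing = can_publish({"name": "sales_daily", "owner": "ops"})
# ok is False; publishing stays blocked until purpose, source, and
# update_frequency are filled in.
```

Wired into the publication step, a gate like this turns the policy sentence into an enforced check rather than a document nobody reads.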

Adel Nehme: Yeah, and in a lot of ways, what you're discussing is really similar to how software teams approach, you know, unit testing and continuous integration and continuous delivery, but with that mindset applied to data at its heart. So maybe you want to expand a bit more on that metaphor and how it applies in a bit more detail.

Saurabh Gupta: Anything which is repeated, anything which is done over and over again, I think there's an opportunity to automate. If you see it from a software development point of view, a lot of small pieces are built and tested individually, and then they're put together; it's unit testing versus integration testing. In the same vein, when data products are being built, there are multiple pieces to them: the data ingestion pipe, the transformation routine, the data quality checks, the metadata that goes along with them, and publishing. All of these can be done individually, but when they are being published, there should be a check.

And I strongly believe that if there are standards in place, a lot of it can be automated. What that means is it becomes less of an overhead and more of a best practice that is simply in place. You know, I'll give you one very interesting example, which used to just annoy me a lot.

what people will do is like they'll take the data in an Excel sheet, and then they'll eyeball it to figure it out where the problem is. So people do that. Okay. Now the challenge with doing that is I may catch it. I may not catch it and it can go. But on the contrary, you get a thousand row Excel sheet.

If automatically as a part of the process, three rows are highlighted, then these have problems. It simplifies the problem a lot, right? In place of looking at hundreds of Excel sheets, thousands of rows in each, I'm looking at three Excel sheets with two rows that have a problem. Very simple. I think once you establish this as an automated process, credible, tested, I think everyone would love to use it.

But until you establish that, I think there's a problem.
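The "highlight the three bad rows instead of eyeballing a thousand" idea could look like the following sketch, using only the standard library. The sheet contents, rules, and field names are illustrative assumptions, not from the interview:

```python
# Sketch: surface only the rows that fail a data-quality rule, so nobody
# has to eyeball the whole sheet. Rules and fields are illustrative.

import csv
import io

sheet = """id,email,age
1,a@example.com,34
2,,29
3,b@example.com,-2
"""

def flag_problem_rows(csv_text, rules):
    """Return (line_number, row, failed_rule_names) for each failing row."""
    flagged = []
    # start=2 because the header occupies line 1 of the file.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            flagged.append((line_no, row, failed))
    return flagged

rules = {
    "email_present": lambda r: bool(r["email"].strip()),
    "age_non_negative": lambda r: int(r["age"]) >= 0,
}

for line_no, _row, failed in flag_problem_rows(sheet, rules):
    print(line_no, failed)
# Only the two problem rows surface; the clean rows never need a look.
```

The point is the inversion of effort: the reviewer starts from a short list of flagged rows instead of scanning every row for anything that looks off.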

Adel Nehme: Yeah. And once you scale that out to hundreds, if not thousands, of data pipelines, you can really see the potential benefit of establishing these best practices and processes. Now you mentioned as well, Saurabh, that people is definitely the hardest element to crack in this framework, right?

That's people, policy, process, and technology. I couldn't agree more. DataFramed is really big on the people agenda when it comes to data. So maybe walk us through some best practices that you've found when it comes to moving the needle on the people component, and I totally appreciate that moving the needle here takes months and years rather than weeks.

So maybe walk us through what you've seen work really well when it comes to shifting minds on the strategic value of data within the organization.

Saurabh Gupta: I think the people element has a few triggers. Number one, I would call it urgency and importance. That's the one which has to be created top down. When senior leaders in an organization start talking about this, when they start demonstrating how they are using data as a part of their day-to-day jobs,

it gets easier for adoption. So call it executive sponsorship: buy-in from executive leaders, and making it a part of your day-to-day conversation. Communication becomes very important. The moment this happens, first, ownership shifts; second, it becomes a part of the communication.

Now, across the overall organization, when people keep seeing or hearing this more and more, it starts triggering: okay, what's happening? Why is this becoming important? In one particular organization, the CEO in a board meeting would never accept any claim unless it was substantiated by some data set in the organization.

So no one could say "this is what is happening" without a data point. I think when data becomes a part of communication, the awareness automatically starts building. So that's the top down. Now let's look at it from the bottom up. Data has always been managed by experts, be it data engineers, be it data scientists.

On the contrary, data consumers in many organizations, or most organizations, are still consumers of Excel sheets or reports. So there's a little bit of fear: if you shift these reports from Excel to self-serve, it requires skilling up to understand this data and how to use it.

So I think we need to make that process very smooth. There has to be awareness built around the importance and the need, and also training programs around that: this is how you shift how you use data. Once you do that, there's still resistance, there's still fear. We need to figure out who the early adopters are.

Get them into action, and in place of the team who's driving the change, when the early adopters start talking to their peers and sharing it, then it becomes a much easier thing. And I would say there are always the last 10 percent, the late adopters. Yes, they will come; they'll have to be dragged along, but they'll come.

In one particular organization I was working with, there was a very interesting practice: every time a new data product or data initiative was completed, a showcase was done. They had brown bags where, for three consecutive weeks, every Wednesday there'd be an hour-and-a-half block where the business owner and the tech team who worked on it

would come and demonstrate what they did and how they did it. They showed ways of doing things. It started with very thin attendance, but over a period of time more and more people started attending. And to add to it, many times some of the senior leaders would come in and say how it helped them.

It started creating this momentum, and once it picks up pace, the snowball effect kicks in and more and more people join in. That said, this is not a day's job, a week's job, or a month's job. It can sometimes take over a year, a couple of years. There's another challenge we also need to be aware of: by the time a sizable organization goes through an adoption process, another change has already started.

Right. So the first change is hard, but I think subsequent changes can become faster and faster if you make it a part of a formal change management process.

Adel Nehme: That's wonderful. And I couldn't agree more on the importance of building an ambassador community for your data literacy program, as you mentioned, as a means to accelerate the word of mouth and the excitement. One thing that I've always been curious about is the role of the data governance lead in driving the data literacy agenda. Of the top five profiles within the organization that have a vested interest in increasing data literacy, the data governance lead probably has to be among them. But they're not formally linked to either the formation or the carrying out of the program. Usually that's done by the learning team, and the CDO is usually the executive sponsor of the data literacy program. So maybe walk me through the role of the data governance lead in building an organization's data literacy.

Saurabh Gupta: I would say there are a couple of areas where the data governance role becomes very pivotal in a data literacy program. One is that having a standard around this training and this adoption is key, and there's nothing like a data governance person being a part of that whole discussion.

That said, as a part of data literacy, what the policies are and what the standards are is very important, and that part of the communication should be led by a data governance person: the importance of it, why it is needed. Take, for example, banking or the financial industry, where regulatory compliance is pretty high. The standards put in place by the data governance side have to be communicated to a lot of stakeholders. It's not just the end users, but also teams which are managing processes and teams which are managing consumers, right? In those organizations, literacy is at multiple levels.

It's at the executive level, it's at the teams which are managing, processing, and maintaining data, and the last level is the people who are consuming data. So any sort of communication, training, or change management should be planned in close partnership with data governance. They will bring in aspects which are really specifically important for people at different levels in the organization.

Adel Nehme: Okay, that's really great, Saurabh. I couldn't agree more as well. And maybe to connect back to one thing we discussed earlier in our conversation: a big aspect of succeeding in a data governance program is ensuring that we're starting small and that we're not taking a moving train off its track, right?

So maybe walk us through quick wins organizations can have on the data governance side, and especially how those connect to quick wins on the shifting-left agenda, as we discussed.

Saurabh Gupta: I'd start with a champion in the business community. Once I have identified one, what I'd like to do is identify a use case where that particular group of business users is not only having the most challenges, but where it also impacts their day-to-day working. Take, for example, a report which needs to be sent to the executive leadership team every day. I don't trust the numbers, and my team spends an hour every day checking that data before they can send it to the executive leadership team.

Just for numbers' sake, say I have eight people on the team, each spending one hour; I'm spending one person-day a week just doing that. It's a painful process, and it lacks trust. So I would love to partner with this particular group and say, let me extract what you check and how you check it, and let me turn it into something automated.

You don't need to rely on it yet, but let's try it. We extract that knowledge, turn it into standards, and implement it in a way that's automated. Then walk with the user team through the whole journey: okay, you keep doing your checks, and you also see the automated report. Go through this process and show them how it makes their life easy. Then what happens? This group of business users starts championing your cause of data governance, and they start pushing it everywhere.

Again, I think starting small is critical, because data quality, since we are talking about quality, can be a rabbit hole in itself. There could be exceptions, there could be very specific things. So start with something which is very simple: maybe 15 or 20 percent of the checks you need.

Establish credibility, and over a period of time, as things are needed, add more. One of the challenges, as with any other product development, is that 80 percent of the product can be built in 20 percent of the time, but the remaining 20 percent will take a lot more. I follow the same rule with data, whether it's data quality or reporting.

Start with something which can be immediately used, make a credible example, and then keep adding as more requirements come. Basically, iterative development, as you'd do it in the data space.
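The iterative approach Saurabh describes, starting with a small fraction of the checks and adding more as credibility is established, could be modeled as a growing rule registry. This is a sketch under assumed names; the rules and record fields are hypothetical, not from the interview:

```python
# Sketch: an extensible registry of data-quality rules, so a team can start
# with a handful of checks and register more as requirements emerge.
# Rule names and record fields are illustrative assumptions.

RULES = {}

def rule(name):
    # Decorator that registers a check function under a name.
    def register(fn):
        RULES[name] = fn
        return fn
    return register

@rule("id_present")
def id_present(record):
    return bool(record.get("id"))

@rule("total_matches_items")
def total_matches_items(record):
    return record.get("total") == sum(record.get("items", []))

def run_checks(record):
    # Run every registered rule; return the names of the ones that failed.
    return [name for name, fn in RULES.items() if not fn(record)]

good = {"id": "r1", "items": [2, 3], "total": 5}
bad = {"id": "", "items": [2, 3], "total": 6}
print(run_checks(good))  # no failures
print(run_checks(bad))
```

Later iterations just add new `@rule` functions; `run_checks` and everything built on it stay unchanged, which is what makes "start with 20 percent of the checks, add more later" cheap.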

Adel Nehme: Okay, that is awesome, Saurabh. So, Saurabh, maybe as we wrap up our conversation: what are the key trends that you see in the data quality and data governance space this year? How do you see the shifting-left conversation evolving?

Saurabh Gupta: I think there's a lot of awareness that has developed over the last couple of years. Data owners and data stewards have started understanding and agreeing to this. They've started talking about data quality as an important piece of the whole data life cycle. And to add to it, I think Gen AI has taken it to the next level altogether.

For Gen AI to work better, to deliver better results, you need a better data foundation. And I think that is adding to the importance of how data should be managed, how data should be governed, what the ethics around it are, and how data quality is seen more and more. I would say this is just starting, and a lot of tools have also started coming in.

But I think more than the tools, it's the process part, the people part, the acceptance and understanding of the importance, that is the key, and that is happening.

Adel Nehme: Yeah, and we definitely need to bring you back on DataFramed, Saurabh, to discuss the data quality and data governance angle of generative AI. But as we wrap up, Saurabh, do you have any final call to action or closing notes to share with the audience?

Saurabh Gupta: I have just one message for the data governance and data strategy community here: don't try to do big bang change. Take it small. Adoption is key. Having the right partner is the most important thing. And once you establish a credible success, I think the snowball will become bigger and bigger very fast. The key is collaboration and starting small. Start small.

Adel Nehme: Yeah, I couldn't agree more. Thank you so much, Saurabh, for coming on DataFramed.

Saurabh Gupta: Thank you, Adel, for having me. This was great.
