
Governing Data Models with Sarah Levy, CEO and Co-Founder at Euno

Richie and Sarah explore the challenges of data governance, the role of semantic layers in ensuring data trust, the emergence of analytics engineers, the integration of AI in data processes, and much more.
Dec 12, 2024

Guest
Sarah Levy

Sarah Levy is a seasoned executive with extensive experience in data science, artificial intelligence, and technology leadership. Currently serving as Co-Founder and CEO of Euno since January 2023, Sarah has previously held significant positions, including VP of Data Science and Data Analytics for Real Estate at Pagaya and CTO at Sight Diagnostics, where innovative advancements in blood testing were achieved. With a strong foundation in research and development from roles at Sight Diagnostics and Natural Intelligence, as well as a robust background in cyber security gained from tenure at the IDF, Sarah has consistently driven impactful decision-making and technological advancements throughout her career. Her academic credentials include a Master's degree in Condensed Matter Physics from the Weizmann Institute of Science and a Bachelor's degree in Mathematics and Physics from The Hebrew University of Jerusalem.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

If you want to adopt AI tools, if you want to have a ChatGPT for analytics and ask, you know, what's the number of daily active users, what's the interest in the past month? And you actually want to trust the number that this tool gives you, you have to rely on some sort of truth, some certified set of definitions to train those models.

You invest so much money in the data stack and building data pipelines, data engineering teams, a warehouse, BI tools, all of that just to make data-driven decisions. And in the end, you don't trust the numbers. I think that's the biggest issue in data governance right now.

Key Takeaways

1. Implement a semantic layer to create a single source of truth for your data, ensuring consistency and trust in your metrics across the organization.

2. Utilize governance scores to measure the effectiveness of your semantic layer, focusing on the percentage of dashboards and queries that rely on governed metrics.

3. Leverage observability tools to track metric utilization and identify which metrics should be promoted to the universal semantic layer, maintaining a balance between innovation and governance.


Transcript

Richie Cotton: Hi, Sarah. Welcome to the show.

Sarah Levy: Hi, Richie. Happy to be here.

Richie Cotton: Cool. So, to begin with, just talk me through, what are the big challenges in data governance right now?

Sarah Levy: Wow. So I think the biggest challenge is that so many business leaders cannot trust the numbers that their data products report, right? I think that's the biggest. You invest so much money in the data stack and building data pipelines, the engineering teams, you know, a warehouse, BI tools, all of that just to make data-driven decisions.

And in the end, you don't trust the numbers. I think that's the biggest issue right now, if you ask me.

Richie Cotton: Sure, yeah, so I can see how this is a big problem if you are spending all this money on your data solutions and then you go, well, actually, I don't really trust the answer at all, then it's a complete waste of time. So I know one of the solutions you're sort of interested in, in order to get better trust in data is the use of a semantic layer.

Can you just talk me through what is a semantic layer?

Sarah Levy: Yes, so you can actually consider it a mart or a store where you park all the certified or governed or official definitions of your calculations. For example, when I led a real estate department in a huge fintech company, one of the major KPIs was the number of assets that we own. Now, the number of assets can be calculated from various systems in different ways.


And we had actually experienced it, I experienced it myself. We had about 300 assets that we managed, and the numbers that we got for the total number of assets from different systems ranged between 270 and 320. That's a huge discrepancy. So a semantic layer is sort of a mart where you have the official definition of the total number of assets.

And if I want to know it, I will use the semantic layer to get the right context from the data. So there's lots of data out there in tables, and if you want to get the right context for this data, you use a semantic layer. It provides the context for that data so that you know which table you need to query to get the answer that you want.
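
Sarah's mart analogy can be sketched in code. Below is a minimal, hypothetical illustration in Python (not any vendor's actual API): a registry that holds exactly one certified definition per metric, so every consumer resolves a name like total_assets to the same table and the same calculation.

```python
# Hypothetical sketch of a semantic layer as a metric registry:
# one certified definition per metric name.

from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDef:
    name: str         # business name, e.g. "total_assets"
    table: str        # the table this metric must be queried from
    expression: str   # the certified calculation

class SemanticLayer:
    def __init__(self):
        self._metrics = {}

    def register(self, metric: MetricDef):
        # Enforce a single source of truth: no second definition allowed.
        if metric.name in self._metrics:
            raise ValueError(f"'{metric.name}' already has a certified definition")
        self._metrics[metric.name] = metric

    def compile_query(self, name: str) -> str:
        # Every tool asking for this metric gets the same SQL.
        m = self._metrics[name]
        return f"SELECT {m.expression} FROM {m.table}"

layer = SemanticLayer()
layer.register(MetricDef("total_assets", "dim_assets", "COUNT(DISTINCT asset_id)"))
print(layer.compile_query("total_assets"))
# → SELECT COUNT(DISTINCT asset_id) FROM dim_assets
```

With this in place, the 270-versus-320 problem Sarah describes cannot arise: a second, conflicting definition of total_assets is rejected at registration time.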

Richie Cotton: Okay, so the idea is that you've got an official definition of important metrics that you need to calculate. Okay, I like that. And I guess if you don't care like 10 percent either way, then it maybe matters less on like how many assets you have. Like you said, sort of 270 to 320. It's like, if that's, if that's good, about 300 is good enough, then you maybe care less.

But if you want the exact answer, then you need an official definition. So talk me through like, are there any other benefits beyond just having a single source of truth? Like, why would you want this semantic layer?

Sarah Levy: Actually, right now, almost every BI tool has a semantic layer. It's not a new concept. If you use Tableau, you build metrics in Tableau. You build them in workbooks, you can build them in data sources. If you use Looker, you have semantics in LookML. You have the equivalent in almost every BI tool.

So it's like the bread and butter of analysis: to create calculations, to create definitions, to define new terms as they go. So every data system already has lots of semantic definitions. This is what captures the business logic. The reason you want a semantic layer is because, quite often, there are lots of duplicated and inconsistent definitions.

Many of the definitions are siloed or trapped, you know, in some analyst's workbook or, a spreadsheet or something. So if you want to reach consistency and alignment across an organization, you want everyone to speak the same language. You want to build this central source of truth, this semantic layer where you have the official definition that everyone can trust.

This is certified. This is the right definition for total revenues, for daily active users. There might be lots of other copies out there for experiments, for ad hoc analysis, for things that were built and abandoned. But that's where you find the truth. So this is why it's so important. And it's important for two reasons, because you want to know that the number that you get is the right number.

And when we're facing the future, and I guess we'll touch on that a bit more, if you want to adopt AI tools, if you want to have, like, a ChatGPT for analytics and ask, what's the number of daily active users? What's the interest in the past month? And you want to trust the number that this tool gives you.

You have to rely on some source of trust, some certified set of definitions to train those models.

Richie Cotton: Yeah, I can certainly see how if you've got lots of different analysts working on similar problems, they're all going to calculate things in similar, but not quite the same way. If I think I've even done it myself, I've like had to go back and calculate something I knew I calculated last year. And then I've probably done it the same, but maybe not.

So having that single standard definition is going to reduce that sort of duplicated workload. Actually, this leads to a question: how do you make sure that you only have one version? The implementation seems like it's maybe the hard part, stopping analysts doing all this duplicate work.

Sarah Levy: I mean, you touch on, I think, the most important part. I think today almost every organization understands they need a semantic layer, but at the same time, you rarely see well-built, managed semantic layers. And the reason is it's actually a hard implementation process. Let me try to, you know, summarize what it consists of.

So first you need to curate the right metrics. You have thousands of metrics that were built over the years in a large-scale, you know, enterprise. They're all buried everywhere in BI tools and data applications and data science notebooks. So you need to find, map, and understand which ones matter.

Which ones are the most important KPIs? You need to resolve, you know, all inconsistencies and duplications. If you have three versions of something, you need to understand which is the right version that you want to add. So after this curation and mapping, you know, understanding which measures capture business value and which you can delete just to keep the environment clean and clear, then you need to code them.

All right, that's a big migration process. But once they're coded, what does the workflow look like? You cannot just tell the organization, listen, now we have a semantic layer, these are your definitions, that's what you're going to use from now on.

Stop creating new stuff in your notebooks. That's now your dictionary. And that's what you can use because things change all the time. So a day later, there are already 20 new metrics out there buried in all those BI tools. So if you want to really build and manage a semantic layer, you also need to develop a workflow that takes into account that things will change all the time.

I think these are the main challenges: curating it, creating it, and then building a workflow that keeps maintaining it, up to date and consistent, as you go.
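
The curation step Sarah describes (finding definitions everywhere, then resolving duplicates) can be sketched roughly as follows. This is a hypothetical Python illustration; the tool names and metric definitions are made up, and real duplicate detection would need proper SQL parsing rather than simple text normalization.

```python
# Sketch of the curation step: harvest metric definitions from many
# tools, then group exact duplicates by a normalized form of their
# expression so they can be resolved to one certified version.

import re
from collections import defaultdict

def normalize(expr: str) -> str:
    # Lowercase and collapse whitespace so cosmetic differences
    # don't hide duplicates.
    return re.sub(r"\s+", " ", expr.strip().lower())

def find_duplicates(definitions):
    # definitions: list of (tool, metric_name, expression) tuples
    groups = defaultdict(list)
    for tool, name, expr in definitions:
        groups[normalize(expr)].append((tool, name))
    # Keep only expressions defined in more than one place.
    return {expr: owners for expr, owners in groups.items() if len(owners) > 1}

harvested = [
    ("tableau",  "DAU",                "COUNT(DISTINCT user_id)"),
    ("looker",   "daily_active_users", "count(distinct  user_id)"),
    ("notebook", "active_users_30d",   "COUNT(DISTINCT user_id) FILTER (WHERE d >= 30)"),
]
dupes = find_duplicates(harvested)
# Tableau's "DAU" and Looker's "daily_active_users" are the same
# calculation under two names, so they surface as one duplicate group.
```

In practice the interesting cases are near-duplicates (slightly different filters or grains), which is where the resolution work Sarah mentions happens.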

Richie Cotton: Yeah, I can certainly tell there's a lot of like, subtle sort of process and organizational challenges in there. Maybe we'll back up and say, like, who needs to be involved in creating all these sort of definitions there? It sounds like you're going to need someone technical in order to create this, but also someone with business knowledge as well.

That means different teams working together. So yeah, talking to everyone who needs to be involved in this and what their different roles are.

Sarah Levy: I mean, everyone. Things like business logic are created by the business; it's not some back office of engineers who will decide what's the important metric. So the business logic is created, it's evolving, it keeps changing. And this happens on the business side with, you know, analysts, business analysts that work closely with the business. They're usually embedded in business domains.

That's where the inception of new semantics, of new logic, happens. And at the same time, although there have been attempts to teach analysts how to code or to turn them into engineers, there is an engineering effort involved today. When you say governed, you mean coded, version-controlled, documented, tested.

It's not just, you know, writing the definition. There is a way to manage this like code so that it's actually governed. So to have version-controlled, coded metrics, you need engineers to be involved, and they need to understand what analysts want them to code there. Now, there are ways to bridge the gaps a bit better.

Now, especially with AI, you have copilots and automatic code generators. I mean, these gaps, just coding things, will become less and less big. But still, you need to design, to architect this properly. You need to make sure that the data those metrics rely on, the transformations, the tables, are also designed well.

So there is a huge effort here that combines the business as the creators, analysts implementing those definitions or writing them in data language, and then engineers coding them. So everyone's involved.

Richie Cotton: Okay. So yeah, a lot of different teams there. Does this imply that you need some analysts embedded within those business teams in order to be able to write the definition down in a technical way that comes from the business logic? It sounds like you need someone with both data skills and business skills.

Sarah Levy: So you need business understanding and you need technical understanding. Now there is a new role in the data space, invented, I think, by dbt, called analytics engineers. And the more we see how this role evolves and, you know, the people in charge, it reminds me of product managers in software development.

Well, they are kind of the bridge between the business and the data team. They understand the technicalities of data. They can write and code things like engineers, but they're also closer to the business. They work closely with the business. So analytics engineers, I think, became officially the owners of the semantic layer.

They're the ones that build it, that maintain it. And they should be able to manage this conversation.

Richie Cotton: Okay, that's very cool. And it seems like analytics engineer is one of these sort of hot new data roles.

Sarah Levy: Yeah.

Richie Cotton: um, Yeah, can you maybe get more into depth on like, how does one become analytics engineer? What sort of skills do you need to do this role?

Sarah Levy: I've interviewed in the past year over 300 analytics engineers. I mean, I really spoke to so many. And there are different stories. Sometimes it's engineers that really express interest in the business and they're keen to, you know, see the impact of their work. So and you see the same thing with product managers.

This is why I like the comparison. So sometimes it's engineers that have a very strong business understanding and interest, and they can speak to business people, and they become analytics engineers. But I think more often it's analysts that want to, you know, skill up and become engineers, and it's like a natural path from the world of analytics to the world of data engineering.

Like it goes through analytics engineering. So I've seen both things.

Richie Cotton: Interesting. So, you get some very business focused people, but also there are some where they're kind of halfway in between a data engineer and a data analyst. All right. And so, is the role of this job, is it basically just like grinding out lots of metrics all the time? Is it like vast amounts of just creating these definitions for how the business wants to run or is there more to it?

Sarah Levy: So in fact, they're kind of the ones that really understand the importance of business logic governance. And in a way you could say they own business logic governance. It started with just modeling in dbt, so writing the dbt transformations instead of having joins and, you know, computed columns built in Tableau.

They do it in dbt, and they're the ones coding these things in dbt. And then, you know, the natural next step would be, I mean, transformations capture logic, and semantics, metrics, capture logic. So they're coding this. Like every engineering role, it's beyond coding. It's really architecting it the right way.

It's really understanding how you build the processes and the workflows, how you determine that something's a duplicate. How do you know which ones are the certified things and which are not? How do you design the system? And many of them are actually pretty senior in their skill set and in their impact, I mean, the level of impact that they have.

They can sometimes work with like 20 engineers on the data platform, hundreds of engineers in the business side, and there are like four or five analytics engineers that really design the whole interface.

Richie Cotton: So you mentioned tools like dbt, and I guess there's a lot of SQL in the background, and BI tools like Tableau, whatever. So, beyond that, I mean, because generative AI is sort of working its way into absolutely everything, is there an AI angle here? Is that changing how the analytics engineer role works?

Sarah Levy: Everything that impacts semantic layers and governance will change how analytics engineers work, or their role. I mean, maybe five years ago, the companies that introduced semantic layers and, you know, the data visionaries, everyone said, like, semantic layers are important. If you want to trust your data, if you want to have this source of truth, you need to build a semantic layer.

I think with AI, this becomes clear: without a semantic layer, it's not going to work. If you build a central, governed semantic layer, it might work, and you need to do it the right way. So I think that's where analytics engineers come in, and also, you know, the data leadership and how it owns this and builds the roadmap for that.

They will be the ones implementing this, and I think good, well-performing data teams will make AI work and others will fail. And the question is whether they will be able to manage a centrally governed semantic layer.

Richie Cotton: Okay. So, I guess, since some people are going to succeed and some people are going to fail, we need to figure out how to be in the success group. Maybe just for some motivation, do you have any examples of companies where they've built the semantic layer and seen some good results?

Talk me through some case studies.

Sarah Levy: I think it's still in the early stages. I'm working with a big customer that started working with the semantic layer in dbt early on. They built all the metrics there. They really have a source of trust for metrics. They let analysts create things in Looker. So they have their playground, they have where they do things.

But as things become mature, I think, it's a very big company, with hundreds of analysts, and there are, like, a dozen analytics engineers that really managed to centralize the metrics for each business domain. And their data is really contributing lots of value to their decisions, to the business decisions.

All their business relies heavily on data. It's a big European unicorn, a micromobility company. So I've seen this. And they were among the first adopters of the dbt semantic layer.

Richie Cotton: I guess the tricky part is going to be measuring success. Like, what constitutes success? It sounds like there's some productivity benefits from not duplicating work, and there's some more nebulous things about making less stupid decisions because the numbers are wrong. Can you talk me through, like, how do you go, we've implemented a semantic layer, this is how we know it's successful?

Sarah Levy: If you invest in governing your data model, or your business logic and semantics, and I would say also transformations, so tables, fact tables, this altogether captures your governed business logic. So we actually introduced something we call a governance score. You could say for an organization, for example, what percentage of their dashboards rely on governed metrics, meaning metrics that are in the official semantic layer, or governed tables, tables that are coded in dbt.

What percentage of the queries are from governed resources? This already gives you, you know, a first indication of how well you can trust the results there, because they don't rely on just any join that someone did on an external table, a raw table, a CSV, but actually on, you know, data that is version-controlled, that is coded, that is governed.

So that's one way, and we took this concept of a governance score, the simplest way, and expanded it to more sophisticated governance insights. So what's the duplication score? How many duplicates do you have in your official semantic layer? Is it, like, zero duplications, or is close to 20 percent of it duplicated?

Or 30 percent duplicated? How well is it documented? What percentage of your metrics are poorly documented or well documented? And then you can think where we can take it. So if you use those governance scores, you can actually see how close you are to actually using governed, controlled logic and not just anything that someone creates.

So I think this will become more and more useful.
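
In its simplest form, the governance score Sarah describes could be computed like this. This is a hedged sketch with made-up dashboard and metric names; Euno's actual scoring is surely more sophisticated.

```python
# Governance score sketch: the fraction of dashboards whose metrics
# all resolve to the governed semantic layer.

governed_metrics = {"total_revenue", "daily_active_users"}

dashboards = {
    "exec_overview":  {"total_revenue", "daily_active_users"},
    "growth_adhoc":   {"daily_active_users", "my_custom_dau"},  # ungoverned copy
    "finance_weekly": {"total_revenue"},
}

def governance_score(dashboards, governed):
    # A dashboard counts as governed only if every metric it uses
    # comes from the official semantic layer.
    governed_count = sum(
        1 for metrics in dashboards.values() if metrics <= governed
    )
    return governed_count / len(dashboards)

score = governance_score(dashboards, governed_metrics)
# Two of the three dashboards rely only on governed metrics,
# so the score is 2/3.
```

The duplication and documentation scores she mentions would be analogous ratios computed over the metric inventory rather than over dashboards.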

Richie Cotton: Okay, so I like the idea of just tracking, like, what percentage of your metrics are actually governed and what percentage are just sort of ad hoc analyses. I guess this has a sort of knock-on implication that you want to gradually start shifting things to a governed approach. So where do you start?

Is there a specific order? Like, should you start with one area of the business? Should you do a few metrics from every area of the business? What's the plan for actually getting all your data governed?

Sarah Levy: If I needed to, like, say it in simple words, I would say: where the business value lies, right? I would want to start with the things that bring the highest value to the business. That's where you want to start. So a common practice is, all right, let's start with the main KPIs and pick, like, the most important business domains where all the focus is. But you can actually use, and that's also something that we introduced,

You can use utilization as a very strong indicator of value: the measures that are currently used, the dashboards that are currently used, that people actually watch and use and refresh. That's where the business value is right now, out of all the data assets that you built. Let's start there.

Let's make sure that the highly used data assets, the data products, are governed, and then, you know, you can prioritize based on that. That's a very strong indicator of value. And then you have cost, where you spend money. If you spend a lot of money on these measures, on these tables, maybe you want to go there, because you want to make sure you spend money on the right things.
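
The prioritization heuristic above (govern the highly used and expensive assets first) might look like this in code. The asset data and the ranking key are illustrative assumptions.

```python
# Build a governance backlog from utilization and cost signals.

assets = [
    # (name, views_last_90d, monthly_cost_usd, governed)
    ("exec_kpis",     420, 900, False),
    ("legacy_report",   0, 300, False),
    ("ops_dashboard", 150,  50, True),
    ("churn_adhoc",    12,  10, False),
]

def governance_backlog(assets):
    # Only ungoverned assets need work; rank them by usage, then spend.
    ungoverned = [a for a in assets if not a[3]]
    return sorted(ungoverned, key=lambda a: (a[1], a[2]), reverse=True)

backlog = governance_backlog(assets)
# exec_kpis comes first (heavily used and expensive, but ungoverned);
# legacy_report, with zero views, lands at the bottom and is a
# candidate for deletion rather than governance.
```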

Richie Cotton: I love that idea of using utilization to see where value lies in your datasets. Because obviously, with so many dashboards, it's like, I created it, and then maybe someone looks at it, maybe they don't. But for some, it's like, well, yeah, okay, the C-suite's looking at this every day just to track something important.

That's obviously going to be a much higher value.

Sarah Levy: I can share a statistic with you: for almost every customer I'm working with, over 50 percent of the dashboards have zero utilization in the past two, three months. Over 50 percent. That's crazy. It's like you have all those dashboards that usually sit on extracted tables, and you waste money on that, and everyone gets lost there.

And it's not even used by anyone. So, yeah, it's super important.

Richie Cotton: Absolutely, yeah, so tracking that utilization does seem an important facet of governance. Okay, so I'm wondering, since the big point of the semantic layer is to reduce the amount of chaos, so you're not having to track individual data sets, you're just tracking metrics. As this scales, do you have the problem that you've then got to track all the metrics you've just created?

Sarah Levy: I mean, my personal take on that, and there are different pieces, but my thesis is that every data application and every business intelligence tool will have its own, local semantic layer. You will have the Looker semantic layer, the Tableau semantic layer, the Hex semantic layer, the Sing semantic layer, and we can go on and on and on.

And it will be, like, the place where things are created fast, analyzed, and stored locally, if you want to experiment with something or try something. And then there will be a sort of shift from the local semantic layer to the universal semantic layer. And this is something that is consistent and aligned across every data application and all data users.

 And to do that, you have to own very powerful observability and mapping tools. You have to see what's created everywhere. You cannot just expect analysts to say, well, you know, this is an important measure. Let's open a ticket for the analytics engineers that maintain the universal semantic layer to add that.

They will create it in their tool, add it to a dashboard, this dashboard will gain traction, and no one will bother, because no one has time, right? Everyone's working so hard to deliver data products on time. So you have to own and obtain, you know, powerful observability tools that map everything that exists, that identify duplicates, that indicate: this and this and this should go to the universal semantic layer, it's time, shift them.

They're already highly used, they're still trapped. You want to add them, you want to align them with everyone. So, and on top of this powerful observability capability, you can build a workflow.
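
The local-to-universal promotion workflow could be sketched as follows. The threshold and field names are assumptions for illustration, not a real observability API.

```python
# Flag metrics in local (per-tool) semantic layers that have gained
# enough traction to be promoted into the universal semantic layer.

PROMOTION_MIN_VIEWS = 100  # assumed traction threshold

local_metrics = [
    # (name, tool, views_last_60d, in_universal_layer)
    ("nps_score",     "looker",  250, False),
    ("trial_convs",   "tableau",  40, False),
    ("total_revenue", "looker",  900, True),   # already promoted
]

def promotion_candidates(metrics, min_views=PROMOTION_MIN_VIEWS):
    # Highly used but still "trapped" in a local layer: promote these.
    return [
        name for name, tool, views, promoted in metrics
        if views >= min_views and not promoted
    ]

candidates = promotion_candidates(local_metrics)
# Only nps_score qualifies: heavily used, not yet in the universal layer.
```

This is the automation Sarah argues for: instead of waiting for an analyst to open a ticket, the observability layer itself surfaces what should move.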

Richie Cotton: Okay. All right. So I guess in the intermediate stage, before everything's in this universal sort of metric store, you've got this sort of local store, so you're dealing with, like, one department at a time, and then gradually you can shift them to this sort of central place. Okay. The next thing is maintenance.

So once you've created these metrics, I know the business is always like, well, you know, are we calculating this in the right way? And you're going to want to update the metrics. Then you've got, I guess, multiple versions of how you calculate, I don't know, your customer lifetime value or something, or your customer acquisition cost.

And I guess you want the new version, but also you want the old version just for consistency of like previous reporting. How do you deal with multiple versions of metrics?

Sarah Levy: So, thank God we've got Git, right? Introduced into data, finally. So I think, I mean, today, the same as you manage dbt transformations in the warehouse, in dbt, in a Git repo, you do the same for metrics. It's managed like code, with version control. You can roll back, you always test new versions and run regressions, and everything you do with code, now you can do with metrics.

And you know, for each report, which version of the metric it uses; it's part of the system. It has to be. Otherwise, as you mentioned, you will create a duplicate whenever you want to change something, and a duplicated dashboard. And, again, this chaos forms, you know, just like that.
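
Managing metrics "like code", with each report pinned to a version, can be sketched as below. This is an illustrative in-memory model, not a real Git workflow.

```python
# Versioned metric definitions: changes append a new version, and
# each report pins the version it was built against, so old reports
# stay reproducible while new ones pick up the latest definition.

class VersionedMetric:
    def __init__(self, name, expression):
        self.name = name
        self.versions = [expression]  # v1 at index 0

    def update(self, expression):
        self.versions.append(expression)
        return len(self.versions)  # new version number

    def get(self, version=None):
        # Default to the latest certified version.
        idx = (version or len(self.versions)) - 1
        return self.versions[idx]

clv = VersionedMetric(
    "customer_lifetime_value",
    "SUM(revenue) / COUNT(DISTINCT customer_id)",
)
v2 = clv.update("SUM(revenue - cost) / COUNT(DISTINCT customer_id)")

report_2023 = clv.get(version=1)  # old report keeps the old definition
report_2024 = clv.get()           # new report gets the latest
```

In practice the version history would live in Git alongside the dbt project, which also gives you rollback, review, and regression testing for free.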

Richie Cotton: Okay, that seems to make sense that as long as you capture all your business logic in code, then you've got access to Git and other version control tools. And that way you can manage things, maintenance just happens using natural software driven life cycles. Okay. So I guess the other thing is just about who in leadership needs to be involved in this.

Because you've got data teams, you've got business teams. Should managing all this, is this the responsibility of the chief data officer? Is it your chief revenue officer? Is it someone from IT? Like, who needs to take charge of this?

Sarah Levy: So it obviously depends on the scale of the organization, right? Smaller organizations usually don't even have a CDO role. They will have, like, a VP of data analytics in a good case; often it's director level: director of data platform, director of data analytics.

So they usually own the implementation, but then I think it's split, because those data folks and the data leadership, I think they understand quite well why you want to build a semantic layer that you need to govern. They have a pretty hard education role: to educate the business leadership on semantic layers, what they have to do with AI and with the pace at which they're getting reports, and why they cannot trust the numbers.

And it's a pretty difficult role to educate everyone on that. So they usually are the champions. I mean, they buy the tools, they implement them, they own them, but they need to get the business to buy into that. And it's on them to teach them, and we help them, but to teach them why, why those things are related in the first place.

Richie Cotton: yeah, I can certainly see how there's going to be this big education component to make sure you've got data people and business people talking to each other productively. Okay, so you mentioned that for smaller businesses, you're not going to have this chief data officer role. And so I'm now wondering, Is there a difference in how you go about implementing this if you're a small business versus a large enterprise?

Sarah Levy: I've been speaking and working with organizations of, like, 200 people, 1,000 people, 10,000 people. I think what changes entirely is the level of chaos. When there are a hundred people and five, six data people, they just know everything by heart.

They can talk to each other. They know where to find things. They know which metrics exist and how they were defined, and when, and by whom. It's just easy. It's still solvable without all those tools. They might even say, well, we don't need a semantic layer. We just don't have conflicts, we don't have duplicates, we don't have all of that.

We control it, we manage it so well. And then there is like a phase transition. When the number of data practitioners exceeds, like, 25 people, then you lose control. And often, if you don't build it right from the beginning, you then start re-platforming, migrating, changing everything. Almost every data team is in some sort of re-platforming project now.

The most popular one I hear about is re-platforming to governance. Everything was about democratization, access, access, democratization. Now we re-platform to embrace governance. If you don't do it when you're small, you re-platform. I think as you become large, it's just the pace at which new things are created.

And the amount of logic that you already have across the business domains is something that you cannot control just by aligning everyone and speaking with everyone. And that's where it becomes critical.

Richie Cotton: Okay, yeah, so it sounds like one of the main benefits of this, then, is that it allows you to scale your data teams, it allows you to scale the usage of data, because there is just less chaos. You don't need to spend time worrying about consistency, because things are sort of guaranteed to work that way.

Sarah Levy: Maybe the biggest benefit, something that I think business leaders will find super relevant and interesting, is really AI. Because today they depend on those dozens of data people just to create a report for them that tells them, you know, how much new revenue was gained in the past quarter, broken down by, you know, territory, campaign, business, product, and so on.

And this reality of just asking and getting an answer that you can trust, that's a reality that every business leader, I think, dreams of. It just still seems too far away, but this really is what's enabled by building a centrally governed semantic layer.

this dream will no longer be a dream.

Richie Cotton: Ah, okay. So this is really if you want to get to self service analytics and just have that AI chatbot that's going to give you the answers to all your data questions, then you need this semantic layer to be built first. That's cool. So can you talk me through how this all fits together then?

So you've got a generative AI layer, you've got a semantic layer, yeah. Is there anything else that needs to happen in order to realize this self service dream?

Sarah Levy: So I like to draw it like the journey to AI, okay, to this chatbot AI analytics tool. So the first step, I think, is beginning to build this semantic layer and getting this cross-ecosystem observability of all the metrics that are being created everywhere, whether it's in this semantic layer or in the local semantic layers or in notebooks or wherever.

So you start by getting observability and mapping the utilization. And then, if you think about how you train those AI tools, you need to really tell them: this metric is certified for your training model; this one is not, it's just an experiment; this one is just a duplicate. So you can think about it as a layer of governance insights that help you mark what's certified and what's not.

And once you're there, once you have everything mapped, you have a semantic layer and this certified labeling mechanism that is smart. It's not just a dumb, manual mechanism; it relies on governance insights. From that point on, and we've already seen pieces of that, the tools that we already have in AI will just make it work at the data model level.

So you will be able to ask questions in natural language, like: show me the dashboards that report the daily active users numbers that were used in the past two months by the product department, and you can get the exact answer. And once you have that, you can ask any question you want on the data, because it knows how to find the right places to query and to generate the right query.

So the building blocks are: creating the semantic layer; getting observability and utilization so that you can actually build a workflow to manage things, to decide what goes there, what needs to be deleted, what's duplicated and needs to be resolved, and to build the tools to do that; and then some governance insights that allow you to tag, this goes to the training model, this does not.

From there on, it's almost a plug and play thing.
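The building blocks Sarah describes, a metric catalog with observed utilization, a certification label, and a filter that decides which definitions are allowed to feed an AI tool, can be sketched in a few lines of Python. This is an illustrative toy, not Euno's implementation; every name here (`Metric`, `MetricRegistry`, `training_set`, the usage threshold) is a hypothetical stand-in.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    definition: str           # e.g. a SQL expression
    source: str               # where it lives: "semantic_layer", "notebook", ...
    certified: bool = False   # the governance label Sarah describes
    query_count_90d: int = 0  # observed utilization from the observability layer

class MetricRegistry:
    """Toy catalog of metrics discovered across the data ecosystem."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric: Metric) -> None:
        self._metrics[metric.name] = metric

    def certify(self, name: str) -> None:
        self._metrics[name].certified = True

    def training_set(self, min_usage: int = 10) -> list:
        # Only certified, actually-used definitions reach the AI layer;
        # experiments and duplicates are filtered out.
        return [m for m in self._metrics.values()
                if m.certified and m.query_count_90d >= min_usage]

registry = MetricRegistry()
registry.register(Metric("daily_active_users",
                         "COUNT(DISTINCT user_id)", "semantic_layer",
                         query_count_90d=120))
registry.register(Metric("dau_experiment_v2",
                         "COUNT(DISTINCT session_user)", "notebook",
                         query_count_90d=2))
registry.certify("daily_active_users")

names = [m.name for m in registry.training_set()]
print(names)  # only the certified, well-used metric survives
```

The point of the sketch is the last filter: the chatbot is only ever trained on, or routed to, definitions that passed both the human certification step and the usage signal.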

Richie Cotton: Okay, plug and play sounds wonderful. I love it.

Sarah Levy: But there's a lot of work until that stage, right? A lot to be done. Yeah.

Richie Cotton: One of the big pushbacks you get whenever you mention data governance is that it's going to stifle innovation. So can you talk me through, like, how do you do data governance in a sensible manner that still encourages innovation?

Sarah Levy: So you can call it innovation, creativity, freedom, and it's a built-in challenge, because governance is usually associated with slowing things down: creating ticketing workflows, open a ticket, wait for a priority, wait for things to be built for you, and only then start using them. So the problem is clear: when you're biased towards governance, everyone experiences friction and bottlenecks and everything slows down. And the problem is that freedom for analytics, creativity, and innovation is critical.

Because that's how you really solve business questions. You cannot just rely on what exists. You have to have the freedom to build new terms, to create a new analysis as you go, even if 90 percent of it will be garbage. That's how analytics works. This is why I'm emphasizing the observability piece so much.

You have to let analysts create things independently, creatively in their native environments, preferred tools, preferred language, at their pace as they like. That's where the magic happens. But, as I said, 90 percent is garbage, and you will see that through the usage. It's not going to be used. They're just going to create it, no one's going to use it.

They create it, try it, and it stays local in their notebook. But then they create a report or a data product and it gets traction, and that's when you understand the creation needs to be added to the semantic layer. You have to maintain this level of creativity. Otherwise, again, you're stuck in place, and no one wants that.

Richie Cotton: Okay, so it sounds like you need to distinguish "this is an ad hoc analysis on something new" from "this is something we need to reuse." I guess analysts shouldn't be allowed to redefine how the company's total revenue is calculated; you need an official definition for that. But if they're just exploring something new, that needs to be less governed.


Sarah Levy: So let me give you a real-world example. I was working with a customer, and they had their definition of an engaged user: a user that signed into the application once a week was defined as an engaged user, by marketing, by sales, whoever. And they always tracked the number of engaged users, because usually when the number of engaged users drops, it eventually translates to churn, and no one wants to experience churn, right?

But then the product people ran an analysis and figured out that this once-a-week definition is not a good indicator. It's actually twice every three weeks; that's a much better indicator. They did their experiment and they realized that. Now think about all the dashboards that rely on the once-a-week definition.

And now they need to go and figure out who is using that. Now we want to change the terminology, we want to use twice every three weeks, and that's the new engagement definition. This becomes a nightmare. So they keep it in product, and we know where this goes.

So in a world with a semantic layer, they would actually be able to create a new version in this certified place, and they would be able to introduce a new company-wide concept, but only once they gain confidence. It would not just be buried in their notebooks. So you have to get both.

You cannot just take one and limit the other. And once the official definition changes, you have to allow and enable, you know, updates and versioning and all that.
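Sarah's engaged-user story can be sketched as versioned metric definitions behind a single certified pointer: dashboards resolve the metric through the pointer, so promoting a validated new definition updates every consumer at once instead of requiring a hunt through all the dashboards. This is a minimal illustration under assumed names (`MetricVersion`, `certified_version`, the SQL fragments), not a real semantic-layer API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricVersion:
    version: int
    description: str
    sql: str  # illustrative filter expression

# Versioned definitions of the "engaged user" metric from the example.
ENGAGED_USER_VERSIONS = {
    1: MetricVersion(1, "signed in at least once a week",
                     "logins_per_week >= 1"),
    2: MetricVersion(2, "signed in at least twice every three weeks",
                     "logins_per_3_weeks >= 2"),
}

# The single certified pointer every dashboard resolves through.
certified_version = 1

def engaged_user_definition() -> MetricVersion:
    return ENGAGED_USER_VERSIONS[certified_version]

print(engaged_user_definition().description)

# Product validates the new indicator, so governance promotes it;
# every consumer picks up the change with no dashboard-by-dashboard hunt.
certified_version = 2
print(engaged_user_definition().description)
```

The design choice being illustrated is indirection: experiments live as uncertified versions alongside the official one, and "changing the company-wide definition" becomes a single, auditable promotion step.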

Richie Cotton: I really like that story, and it shows there's a lot of value to be gained there, as long as you're governing the right things and giving analysts the freedom to do what they want in other places. All right, so, just to wrap up, what are you most excited about in the world of data governance?

Sarah Levy: So, well, I am the co-founder and CEO of a data governance company, Euno. After working almost 20 years with data teams in a lot of fields, in cyber security, in healthcare, in fintech, all trying to make sense of data, I figured there were so many challenges there.

So my mission is to really help large-scale data teams understand data easily and get the value that they can from data. That's why I chose to build this company and try to solve some of the problems that we just touched on.

Richie Cotton: Yeah, helping people get value from data is a very worthwhile cause, I think it's a...

Sarah Levy: Well, let's be more precise: helping people by facilitating the creation of this centrally governed semantic layer and taking organizations all the way to AI. That would be the more precise way of defining this mission. Yeah.

Richie Cotton: Nice. Yeah, semantic layers certainly sound very exciting, and I love that they enable all those fun generative AI use cases as well. Excellent. All right, thank you so much for your time, Sarah.

Sarah Levy: Thank you. Happy to be here. Thanks for inviting me. Bye bye. 
