
Creating Trust in Data with Data Observability

In this episode, Adel speaks with Barr Moses, CEO and co-founder of Monte Carlo, on the importance of data quality and how data observability creates trust in data throughout

Jun 2021

Guest
Barr Moses

Barr Moses is CEO & Co-Founder of Monte Carlo, a data reliability company backed by Accel, GGV, Redpoint, and other top Silicon Valley investors. Previously, she was VP Customer Operations at Gainsight, a management consultant at Bain & Company, and served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and about the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Transcript

Adel Nehme: Hello, this is Adel Nehme from DataCamp and welcome to DataFramed, a podcast covering all things data and its impact on organizations across the world. When we say that an organization is data-driven, what do we truly mean? While there are many competing definitions that point to skills, infrastructure, culture, and many other factors, we know that data-driven organizations extract value and insights from data at scale. For this to happen, trust in data and data quality are investments organizations need to sustain for the long haul. This is why I'm excited to be speaking with Barr Moses.

Adel Nehme: Barr Moses is the CEO and co-founder of Monte Carlo, a data reliability company backed by Accel, GGV, Redpoint, and other top Silicon Valley investors. Previously, she was VP of Customer Operations at Gainsight, a management consultant at Bain & Company, and served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a bachelor of science in mathematical and computational science. In this episode, Barr and I talk about her background, the state of data-driven organizations and what it means to be data-driven, the data maturity of organizations, the importance of data quality, what data observability is, and why we'll hear more about it in the future. We also cover the state of data infrastructure, data meshes, and more.

Adel Nehme: If you enjoy today's conversation with Barr, make sure to also download our guide to data maturity white paper, which provides a holistic overview of data maturity and the steps organizations can take to scale the value they reap from data. We made sure to include a link in the show notes. If you want to check out previous episodes of the podcast and show notes, make sure to go to www.datacamp.com/podcast. Barr, welcome. It's great to have you on the show.

Barr Moses: It's great to be here. Thank you so much for having me.

Adel Nehme: I'm excited to be speaking to you about the state of data-driven organizations, data quality, data reliability, and all the cool things Monte Carlo is working on. Before we get started, I'd love it if you could walk us through your background and how you got into the data space.

Barr Moses: Yeah, happy to. I was born and raised in Israel, where I was drafted into the Israeli Air Force and served as a commander of a data analyst unit. I moved to the Bay Area about a decade ago. My background after that is in math and stats. I actually thought I was going to go into academia. My dad is a physics professor, so he thought I was going to follow in his footsteps, and it was my big "fail my dad" moment when I didn't continue. Maybe sometime in the future. Instead, I went to work for a consulting company called Bain & Co, where I worked mostly with companies on their data strategy and operations.

Barr Moses: Later on, I joined Gainsight, a customer success company. I was very fortunate to be there at a time of very fast growth, to work with some great people, and to learn how to create a new category: Gainsight helped create the customer success category. I built a number of different functions, among them the customer data and analytics team, which was responsible for our data internally and also for surfacing our data to our customers and helping them see value from it. Those were some of my experiences with data, and that's how I encountered the problem of data downtime before starting Monte Carlo a few years ago.

Monte Carlo

Adel Nehme: That's great. And how have these experiences led you to founding Monte Carlo? And can you walk us through that journey, please?

Barr Moses: Definitely. At Gainsight, as you mentioned, I was responsible for the customer data and analytics team. We were starting to get really data-driven as a company, which meant that we had a lot more data that we were analyzing, and a lot more data that we were storing and collecting. A lot of our users depended on this data, among them our executives and our CEO, who used it to make decisions on what new products to launch, which customers to focus on, and where we were seeing the most traction: pretty basic questions about the business that you'd want answered with data.

Barr Moses: As the person responsible for this data, I would wake up basically every Monday morning to a barrage of emails asking me questions about the data. Why is this report wrong? Why does the data here look stale? What happened to this graph? It's suddenly all null values. Lots of people were asking confused questions about the data, which led to a fundamental distrust in the data, and that was really frustrating to me. I was like, what am I doing wrong? Why do we continue to have all these problems?

Barr Moses: On top of that, it felt like every time there was a problem, it also took us a very long time to figure out the root cause. For example, if a report was wrong, it could be because the report wasn't refreshed, because one of the tables in the data warehouse didn't get updated, or because one of the third-party sources we relied on made a change to their API. There could be so many different reasons why the data was wrong, and it took us a really long time both to learn about the problem and to find the root cause and fix it. I remember that as a really frustrating experience.

Barr Moses: Then, talking to other data organizations and people at other companies, I recognized that I wasn't the only one experiencing this. In fact, it felt really ubiquitous: if you worked in data, you had experienced something like this. That got me thinking: why don't we have better ways to manage this problem? And do we even have the right language to describe it?

Barr Moses: So when I was starting Monte Carlo, I spoke to hundreds of data leaders, from very small startups to large organizations like Netflix and Uber where data is core to their mission, and learned a few things. One, as I mentioned, every company runs into this problem. Two, it's a really unsolved problem: how do we trust the data? How do we know that data is reliable? So I decided to start Monte Carlo with the mission of helping organizations become data-driven by minimizing what I think is the biggest problem today: data downtime, the term we coined to describe incidents when data is wrong, inaccurate, or just can't be trusted.

How Would You Define a Data-Driven Organization?

Adel Nehme: I'm really excited to discuss all the cool things Monte Carlo is working on, but before we do that, you mentioned the state of data-driven organizations. One thing I'd like to discuss with you is the state of data science and the march of organizations toward becoming data-driven. Over the past decade, we've seen tons of investment in tools, infrastructure, and hiring, but that doesn't mean organizations are making the most of their data or are necessarily data-driven. This is something I've seen you write and speak about. Can you walk us through how you would define a data-driven organization?

Barr Moses: That is such a great question, because I feel like we've been throwing around the term data-driven for about a decade, and I'm not sure there's always something concrete behind it. When a company says, we really want to become data-driven, what does that mean? Sometimes it can be really surface level. It can be someone who just woke up one day and said, I want to become data-driven, then hired 400 data scientists and invested in tool X. And that's it. Call it a day. We're data-driven, right? Success. But it's not that simple. I think it really requires both a mindset shift and an organizational and cultural shift that needs to be supported by technology, and you can't have one without the other. When done well, it's certainly not a surface-level initiative.

Barr Moses: I'd say there are two main use cases we see companies using data for today. This is really simplifying it, but one is using data to drive digital products, in the product itself. The second is using data to make decisions, whether that's based on machine learning models or on other ways of determining what's right for the business and what strategy to take: data-driven decision-making.

Barr Moses: I think we have a long way to go on both of those fronts. When people ask me how to actually create a data-driven culture and what that looks like, I think it starts with collecting and storing data and making sure that data is accessible to everyone in the organization. So even if you have a team of data scientists, data engineers, or data analysts, that's where the journey begins. But you truly become data-driven when marketing, sales, customer success, and product are all very strong customers of the data organization and work hand in hand with it to make decisions and power the business. That's really when you see the competitive advantage of becoming data-driven.

Barr Moses: Very often, we still see companies and organizations say, okay, we don't really have the data here, or we can't really trust the data, heck, whatever, let's just resort to gut-based decision-making. Or I'll have a conversation with someone and they'll say, yeah, we're not really data-driven; we make decisions based on what we think or on instinct. There's definitely room for instincts and gut-based decision-making in companies, and for new categories or new markets you can definitely make an argument for that, but I really don't think there's any excuse for a company today not to use data and not to at least start its journey toward becoming data-driven.

Barr Moses: You can also plot companies on a maturity curve: how early are you in your maturity in terms of becoming data-driven? We've actually worked with organizations to help map this journey, from the early days, where you're really just reactive and trying to figure out what data to work with, all the way to having best-in-class, very scalable and automated ways to empower people to make decisions with data in real time. I think we've come a very long way in the last five years. Today, when companies say we want to become data-driven, or we're on the path to becoming data-driven, there's a lot more behind that, and there's a lot more we know how to do, whether organizationally, culturally, or from a technology perspective.

Adel Nehme: That's spot on. You mentioned here the organizational, cultural, and technological dimensions of becoming data-driven. What do you think are some of the main challenges affecting organizations that truly want to make the most of their data?

Barr Moses: Yeah, that's a tough question. If you think about what stops us from becoming data-driven, in a way, the last year or so with COVID-19 has really redefined how we think about work and how we think about data in particular. In some ways, it has actually accelerated everything around the cloud and data, and it has also fast-tracked all of the challenges that we see.

Barr Moses: I think there are three main trends contributing to these challenges. The first is that there's a lot more data, especially if you think about the recent rise of FinTech companies. Financial technology and financial services companies rely heavily on third-party data, and there's just a lot of it. It's not uncommon for a company to have thousands of data sources that it relies on. That's the first trend.

Barr Moses: The second trend is a stronger reliance on data. To your previous question, people today understand that data is important and that becoming a data-driven organization creates an advantage for yourself. Honestly, if you don't want to be left behind, you need to be thinking about how to become data-driven. And as companies rely more on data to power their products and their decision-making, the stakes for data get higher. You can't afford to make as many mistakes. When there's higher reliance, there are more eyes on data, and people care more about it. It's a bigger deal, right?

Barr Moses: The third trend is the fragmentation of what I would call the modern data stack. Our data infrastructure and data pipelines today are really complex. There are a lot of different systems you can use, and people have choices. So you might have one team running on Snowflake and Looker, another team running on S3, Athena, Hive, and Tableau, and a third team running on something totally different. There are so many options, and there isn't a standard stack yet. We've seen these three trends play out over the last few years, and I think they'll continue over the next five to 10 years, which will only exacerbate the challenges facing organizations that want to become data-driven.

Barr Moses: From what I see, the number one problem people run into as soon as they want to become data-driven is what I refer to as trust in data, or data trust. It goes like this: okay, we have all this data, all these reports, all these machine learning models we can use. Awesome, let's go, let's start using them. And then suddenly someone says, hey, wait a second, there's stale data here that's impacting my model. Or, wait just a second, I'm looking at this table and the values don't make sense to me: they're all negative, and they shouldn't be negative. That's really weird.

Barr Moses: And the next question is, well, I don't even know where this data comes from or who's using it. Should I be using it? Should I be using some other table? All of that leads to one big question: can I even trust this data? Until we solve that, I think it's going to remain the single most important challenge to becoming truly data-driven.

Data Quality

Adel Nehme: Okay, that's awesome. I want to talk to you about potential remedies for these challenges. One thing that is central to Monte Carlo's mission is the importance of data quality, and you mentioned trust in data as one of the challenges organizations face. As data becomes even more critical for decision-making and product development, data quality can have massive implications for an organization. Can you outline your thinking on data quality and what you think are the key components of a successful data quality strategy?

Barr Moses: Definitely. The problem of data quality is not new; it's been around for 40-something years. But I think the way we think about data quality needs to adapt to the new way we think about data and data infrastructure. Let me explain what I mean. There's the saying "garbage in, garbage out," which was really appropriate for the traditional way of thinking about data quality. In the standard pipeline, you had one ingestion point, and you just needed to make sure that the data you were ingesting was of high quality. Then you knew the data you were using would be of high quality. So there was one point at which you needed to make sure the data was accurate. I think that for the last couple of decades, that is really what data quality was centered around: profiling the data, making sure that whatever you ingest is of high integrity, and then making sure that on the other end, you're using that data appropriately.

Barr Moses: The challenge in today's world is that the way we manage data has become a lot more sophisticated. You might start by ingesting the data, but then you have many transformations and different layers of data downstream. You might have a data lake, maybe multiple data warehouses, ETL, ELT, BI, machine learning models, et cetera. And data can go wrong at any step of the process, not just at ingestion.
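To make the point about data breaking after ingestion concrete, here is a minimal Python sketch. It assumes a toy pipeline of plain functions over lists of dicts; the stage `convert_currency` and the rule `check_no_negative_revenue` are hypothetical, not taken from any real tool. A bug introduced in a later transformation passes an ingestion-only check but is caught when the same check runs after every stage.

```python
from typing import Callable

Row = dict[str, float]
Stage = Callable[[list[Row]], list[Row]]


def ingest() -> list[Row]:
    # Hypothetical source data: clean at ingestion time.
    return [{"revenue": 120.0}, {"revenue": 80.0}]


def convert_currency(rows: list[Row]) -> list[Row]:
    # Hypothetical downstream transformation with a bug: the sign gets flipped.
    return [{"revenue": -r["revenue"] * 0.9} for r in rows]


def check_no_negative_revenue(rows: list[Row]) -> bool:
    # Hypothetical data quality rule: revenue should never be negative.
    return all(r["revenue"] >= 0 for r in rows)


def run_pipeline(stages: list[tuple[str, Stage]], check_every_stage: bool) -> list[str]:
    """Run the stages in order; return the names of stages whose output failed the check."""
    rows = ingest()
    failures = []
    if not check_no_negative_revenue(rows):
        failures.append("ingest")
    for name, stage in stages:
        rows = stage(rows)
        if check_every_stage and not check_no_negative_revenue(rows):
            failures.append(name)
    return failures


stages = [("convert_currency", convert_currency)]
print(run_pipeline(stages, check_every_stage=False))  # []
print(run_pipeline(stages, check_every_stage=True))   # ['convert_currency']
```

The ingestion-only run reports nothing, even though the data reaching consumers is wrong.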

Barr Moses: On top of that, there are different people in different organizations at different steps of this process. Imagine this pipeline, from ingestion all the way on the left to consumption of the data in reports or machine learning models all the way on the right. There are different people along each step, right? In the past, there was only one organization, or maybe a couple of people, really responsible for the data. Today, you have engineers upstream, then maybe a data product manager, then a data scientist, data analyst, or data engineer, and you might have an ML engineer. You have all these different titles and people, and they can all contribute to the problem of data downtime.

Barr Moses: So I think a strong data quality strategy requires thinking through what you're trying to solve. Where does the problem typically originate? How are you addressing not only the question of where data is breaking, but also who owns that problem? And who's actually going to fix the data quality issue?

Getting Executive Buy-In

Adel Nehme: One thing we see data teams struggle with is gaining executive or leadership buy-in for data quality initiatives. How would you go about determining the return on investment of data quality initiatives so that data teams are better equipped to get this buy-in?

Barr Moses: That's a great question. In terms of getting executive buy-in, or just alignment in general across your organization, for a data quality initiative, I think there are a few things, ROI certainly being one of them. But taking a step back for a second, the first thing is that you actually need buy-in that data quality is an important thing to anchor on.

Barr Moses: Some of our customers actually ask me, is data quality something I just need to invest in as a one-time initiative and then forget about for the next five years? Or is it something I need to think about consistently? How much effort should I put into this? I think for a company that's really becoming data-driven and truly putting data at the forefront, data quality or data observability, whatever you call it, has to be a consistent, top-line priority that you're thinking about all the time. It has to be part and parcel of your approach: in the same way that you think about how to make data accessible, how to store it, and how to analyze it, you also need to think about how to trust it.

Barr Moses: When you think specifically about the impact of data quality and how to measure it, which goes back to your question about ROI, there are two main metrics we think about. The first is time to detection, which refers to how long it takes the team to identify an issue. Often, when we speak with companies, we find it can take them months to detect a data quality problem. That's not uncommon, because maybe some table broke somewhere and nothing was alerting on it. So how are you supposed to know? There might be all these silent failures impacting your business that you only learn about weeks or months later, and those silent errors can have many millions of dollars of impact on your business. You can't really afford a long time to detection, so it's a very important metric.

Barr Moses: The second key metric is time to resolution. That's a pretty straightforward metric: how quickly are you able to resolve a data incident once you're alerted to it, and how many people are involved in that process? Together, these two metrics give you a strong sense of how strong your company is operationally at managing and resolving data quality issues. Having a very strong lens on these, along with what we call data downtime and how you're improving on it overall, will give you a good sense of the ROI and impact of data quality.
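As a rough illustration of tracking these metrics, here is a minimal Python sketch. It assumes each incident is logged with three hypothetical timestamps: when the data broke, when the team detected it, and when it was resolved. Totaling data downtime as the sum of detection time plus resolution time across incidents is an assumption made for this sketch, not a definition given in the episode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    started: datetime   # when the data actually broke
    detected: datetime  # when the team noticed
    resolved: datetime  # when the data was trustworthy again


def time_to_detection(i: Incident) -> timedelta:
    return i.detected - i.started


def time_to_resolution(i: Incident) -> timedelta:
    return i.resolved - i.detected


def data_downtime(incidents: list[Incident]) -> timedelta:
    # Assumed aggregation: total detection + resolution time across all incidents.
    return sum((time_to_detection(i) + time_to_resolution(i) for i in incidents), timedelta())


def mean_delta(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


incidents = [
    Incident(datetime(2021, 5, 1, 0), datetime(2021, 5, 3, 0), datetime(2021, 5, 3, 6)),
    Incident(datetime(2021, 5, 10, 8), datetime(2021, 5, 10, 10), datetime(2021, 5, 10, 14)),
]
print(mean_delta([time_to_detection(i) for i in incidents]))   # 1 day, 1:00:00
print(mean_delta([time_to_resolution(i) for i in incidents]))  # 5:00:00
print(data_downtime(incidents))                                # 2 days, 12:00:00
```

In this made-up example, the silent two-day failure dominates the total, which is why time to detection matters so much.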

Role of Data Literacy

Adel Nehme: 100%. Zeroing in on the problem of alignment between the data team and the executive team, how important is the role of data literacy in fostering alignment and scaling data science in the organization?

Barr Moses: Yeah. I think data literacy is probably one of the most important things companies can invest in, as a kind of table-stakes investment. But let's define for a second what data literacy means. I don't think data literacy means everyone in the entire company needs to learn SQL or R or whatnot. That's not data literacy, and I don't think that's necessarily the goal. What we need to define is how we want each team, and our company as a whole, to engage with data. What kind of decisions or goals are we going to drive based on data? It always has to go back to business outcomes. So it starts with asking: what does your company want to achieve this year? What's the big, hairy, audacious goal we're going after? And based on that, what kind of data does each team need to work with to get there?

Barr Moses: One chief data officer I spoke with actually created a matrix, with the different teams and functions, like marketing, sales, customer success, product, R&D, et cetera, on one axis, and the different skill sets and data languages on the other. Then they assigned each function a target score. The data literacy expected of marketing is obviously very different from that of product or engineering, but having some way of saying, this is how much we think a team should be able to do with data, and this is what we expect folks to be able to work with, creates a baseline across the company so that people in different functions can use data effectively.
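A matrix like the one that chief data officer built could be represented along these lines. This is a minimal Python sketch; the functions, skill names, 0 to 3 scale, and scores are all hypothetical placeholders, not figures from the episode. It compares each function's current level against its target and lists only the gaps.

```python
# Hypothetical data literacy matrix: target score (0-3) per function and skill.
TARGETS = {
    "marketing": {"read dashboards": 3, "SQL": 1, "experiment design": 2},
    "product": {"read dashboards": 3, "SQL": 2, "experiment design": 3},
    "sales": {"read dashboards": 2, "SQL": 0, "experiment design": 0},
}

# Hypothetical current assessment for the same functions and skills.
CURRENT = {
    "marketing": {"read dashboards": 2, "SQL": 1, "experiment design": 0},
    "product": {"read dashboards": 3, "SQL": 1, "experiment design": 3},
    "sales": {"read dashboards": 2, "SQL": 0, "experiment design": 0},
}


def literacy_gaps(targets: dict, current: dict) -> dict:
    """Return {function: {skill: shortfall}} for every skill scored below its target."""
    gaps = {}
    for function, skills in targets.items():
        assessed = current.get(function, {})
        shortfalls = {
            skill: target - assessed.get(skill, 0)
            for skill, target in skills.items()
            if assessed.get(skill, 0) < target
        }
        if shortfalls:
            gaps[function] = shortfalls
    return gaps


print(literacy_gaps(TARGETS, CURRENT))
# {'marketing': {'read dashboards': 1, 'experiment design': 2}, 'product': {'SQL': 1}}
```

Setting different targets per function reflects the point that marketing's expected literacy differs from product's or engineering's.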

Barr Moses: So when we say data literacy, there's a bit of a risk that we go overboard in one direction. It's really important that we agree it matters, but that we tie it back to what we're trying to achieve with data. It's not data literacy for its own sake; it's for the sake of empowering us to achieve better outcomes.

Adel Nehme: Yep. Another thing that is central to Monte Carlo's mission is the concept of data observability. This is a relatively nascent term and an emerging category in the data space. Can you define data observability and explain how it can help with so many organizations' data quality problems?

Barr Moses: Definitely. I think the concept of observability is a very interesting one, and it originally stemmed from the world of software engineering and DevOps. In software engineering, observability has emerged over the last couple of decades as a fast-growing area that supports DevOps teams who manage application and infrastructure downtime. These teams track metrics that help them understand the health of their systems. So observability really speaks to the ability to determine the health of a system by observing its output.

Barr Moses: Companies manage infrastructure and application downtime really diligently. You can't imagine a company's apps or website just going down. Today, we talk in terms of what we call five nines: every company needs to have, or strives for, five nines of application uptime. DevOps observability tools have helped software engineers better manage the health of their applications and infrastructure. Today, you can't really imagine any engineering team operating without something like New Relic, AppDynamics, Datadog, or Grafana, tools that help engineers make sure their apps and infrastructure are up and running.
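For context on the "five nines" target: 99.999% availability leaves 0.001% of the year for downtime. A short Python sketch of that arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap, 365-day year


def allowed_downtime_minutes(nines: int) -> float:
    """Maximum downtime per year, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 99.999% up -> 0.00001 down
    return MINUTES_PER_YEAR * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which gives a sense of the bar application teams hold themselves to.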

Barr Moses: Now, if you think about the corollary in the world of data, we're trying to do the same thing: run a really complex system and have really great data that we can rely on. But we don't have the tools to manage it. It's a little bit crazy that we don't have the corollary of that in data, and yet we hold ourselves to a very high standard of creating very reliable, fast, and efficient data systems. What I strongly believe is that we need to take the same concept of observability and apply it to data. If you think about what it means to have really reliable, trusted data, we broke data observability down into five core pillars. We believe that if you monitor for, and think holistically about, these five pillars, you can have a strong sense of the health of your data.

Barr Moses: I'll walk through these five pillars quickly. The first is freshness, which speaks to the timeliness of the data: is the data arriving on time? Let's say a particular table gets updated five times a day and it hasn't been updated at all today. What does that tell me? Is there an issue or not? The second is volume. If I expect 10 rows and suddenly I'm getting a million rows, what has changed and why?
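The freshness and volume examples above can be turned into simple checks. Here is a minimal Python sketch. It assumes you already collect per-table metadata such as the last-update timestamp and historical row counts per load; the tolerance factor and z-score threshold are hypothetical defaults chosen for readability, not values used by any particular product.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_updated: datetime, now: datetime, expected_interval: timedelta,
             tolerance: float = 1.5) -> bool:
    """Freshness: flag the table if it has gone well past its usual update interval."""
    return now - last_updated > expected_interval * tolerance


def volume_anomaly(history: list[int], latest: int, z_threshold: float = 3.0) -> bool:
    """Volume: flag the latest row count if it is far from the historical mean (z-score)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


# A table that normally updates about five times a day (every ~4.8 hours)
# but has not been updated since midnight.
now = datetime(2021, 6, 1, 18, 0)
print(is_stale(datetime(2021, 6, 1, 0, 0), now, timedelta(hours=24 / 5)))  # True

# Loads that usually bring ~10 rows suddenly bring a million.
print(volume_anomaly([9, 10, 11, 10, 10], 1_000_000))  # True
print(volume_anomaly([9, 10, 11, 10, 10], 10))         # False
```

Fixed thresholds are a simplification; the expected interval and typical volume could instead be estimated from each table's own history.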

Barr Moses: The third is distribution. This is a whole slew of metrics at the field level, about the values of the data. I gave an example earlier: say I have a table that's fully populated, but it's all null values or negative values, very different from what I expect. There might be an issue. And then the fifth pillar is lineage. Lineage, at both the table level and the field level, gives us a view of a particular asset's upstream dependencies, which can give us clues to the root cause of a problem, and of all of its downstream dependencies, which can give us an understanding of the impact of a problem.
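The distribution and lineage pillars can be sketched the same way. This minimal Python example assumes a column is available as a list of values and that table-level lineage is stored as a hypothetical mapping from each table to the tables it reads from; the table names and the 5% null-rate threshold are placeholders. Walking the graph upstream lists candidate root causes; walking it downstream lists the assets a problem could affect.

```python
from collections import deque


def distribution_issues(values: list, max_null_rate: float = 0.05,
                        allow_negative: bool = False) -> list[str]:
    """Distribution: report an unexpected share of nulls or any negative values in a column."""
    if not values:
        return []
    issues = []
    null_rate = sum(v is None for v in values) / len(values)
    if null_rate > max_null_rate:
        issues.append(f"null rate {null_rate:.0%}")
    if not allow_negative and any(v is not None and v < 0 for v in values):
        issues.append("negative values")
    return issues


# Hypothetical table-level lineage: table -> tables it reads from.
UPSTREAM = {
    "raw_payments": [],
    "raw_customers": [],
    "stg_payments": ["raw_payments"],
    "fct_revenue": ["stg_payments", "raw_customers"],
    "revenue_dashboard": ["fct_revenue"],
}


def walk(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first traversal returning every node reachable from `start`."""
    seen, queue = set(), deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def downstream_graph(upstream: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the lineage so each table maps to the tables that read from it."""
    down = {table: [] for table in upstream}
    for table, parents in upstream.items():
        for parent in parents:
            down.setdefault(parent, []).append(table)
    return down


print(distribution_issues([None, None, -5.0, 12.0]))  # ['null rate 50%', 'negative values']
print(sorted(walk(UPSTREAM, "fct_revenue")))  # ['raw_customers', 'raw_payments', 'stg_payments']
print(sorted(walk(downstream_graph(UPSTREAM), "stg_payments")))  # ['fct_revenue', 'revenue_dashboard']
```

Field-level lineage would follow the same traversal, just with columns instead of tables as the nodes.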

Barr Moses: Together, these five pillars can bring you confidence in, and visibility into, the health of your data, and make it easy to understand and quantify the impact of data quality on a particular business. What we see best-in-class data teams do is think not only about great infrastructure and great pipelines, but also about great, high-quality data and how to manage that data in a way that they can actually trust it.

Adel Nehme: That's awesome. And how does Monte Carlo intend to solve these problems? And can you walk us through some example use cases that you've worked on?

Barr Moses: Yeah. Luckily, the way I think we need to approach it is also something we can borrow from software engineering and the best practices of DevOps. Before I talk about what I think the best solution is, I'll speak to what people might be doing today. Many data teams have traditionally resorted to manual ways of making sure the data can be trusted. What do I mean by that? There might be a team of tens or hundreds of people who are literally just staring at dashboards and making sure the numbers are accurate.

Barr Moses: I remember personally, when I was responsible for a particular report that our CEO and our board relied on, my team and I would wake up every day and spot-check hundreds of data points in that dashboard to make sure that nothing had changed, that the data was still accurate, and that we could still use it. If by chance something was off, we'd spend the rest of the day trying to understand what happened and why. So, very manual ways of making sure the data is accurate.

Barr Moses: I think that would have worked fine five or 10 years ago, when maybe just a small handful of people were using data, and not often, maybe once or twice a year. In today's world, we have thousands of people in a company relying on data in real time; in some organizations, maybe the entire company is using data in real time. You can't possibly think that a manual process is going to be sufficient. We'd have to be crazy to do that.

Barr Moses: So if you take the concept of observability and apply it here, I think we need the corollary of something like New Relic, AppDynamics, or Datadog, but for your data. What does that mean? It's a solution, a data observability platform, that helps with the instrumentation, monitoring, alerting, collaboration, and resolution of data issues across the five pillars I talked about. I think a strong data observability solution needs to connect to your existing stack, end to end. Meaning, it's not sufficient to cover only one part of your stack, like just a couple of tables in your data warehouse or just a particular dataset. To address this problem, it has to cover wherever your data is, including your data lake, your data warehouse, and your BI or machine learning models. You have to have end-to-end visibility from ingestion all the way to consumption.

Barr Moses: I also think you need to think about how to provide rich context for each of the problems the platform identifies. Having strong root cause tools in place will help you determine quickly what the problem is and how to resolve it. And probably the best approach is to prevent these issues from happening in the first place. What we're finding is that by adopting some of these observability best practices for data, companies are actually able to reduce their data downtime by 90%, just by implementing these practices and bringing more awareness to questions like: What does our data look like? What is its health? And by collecting more metadata to help determine the health of our data.

Rise of Metadata Management Tools

Adel Nehme: This is really exciting. One thing you mentioned, especially at the end here, is metadata. Can you tell us about the rise of metadata management tools and data lineage tools, and how you see them impacting organizations?

Barr Moses: Yeah. So the whole emergence of the metadata space has been super fascinating. I think in the last couple of decades, we've become really good at collecting and tracking data. We've actually gotten to a point where we're hoarding data. It's like, the more, the better. Bring it on. Wherever it is, let's just collect more of it. And often what we see is that companies have more data than they can ever manage, let alone process, analyze, or make sense of. They start drowning in data and asking questions like, okay, what data do I really need? Then they start labeling data as gold, silver, what have you, or as tier one, tier two, tier three. And there are all these different naming conventions, like ARR revenue one or ARR revenue two, to make sure you're using the right dataset. It gets really, really complicated, and in some ways it's actually self-inflicted pain.

Barr Moses: And I think what we're seeing in the metadata space is the same pattern. There's this rise of metadata where we're like, oh, there's a lot of data we can collect about our data, how awesome is that? And now we're starting to hoard that a little bit, too, and companies are saying, let's collect lots of metadata. I'll say something really controversial for a second: I think metadata by itself is completely useless. There is nothing you can do with metadata on its own that's practically helpful to your business. I think metadata is incredibly valuable when put in the right context, with the right business outcome in mind. Take lineage, for example. Lineage is something people get really excited about: oh my God, show me the lineage map of my business. They look at it for five minutes and then they're like, okay, let's move on to the next shiny object. I'm done.

Barr Moses: But where does the power of lineage come from? Lineage is valuable when it's used in a particular context to solve a particular problem. For example, say a particular table has not been updated, so there's a specific freshness problem. But you know what? There are zero dependencies on that table downstream. So who cares about that table? Maybe it's not being used at all, so I shouldn't worry about it, and I don't need to be alerted about it. Maybe I just need to deprecate that table. On the other hand, if a table isn't getting updated and there are hundreds of thousands of dependencies on it, and specifically those dependencies are users making decisions about the business, or customers getting access to that data, or mission-critical machine learning models, then in all of those instances you want to know about that freshness problem as soon as possible, right?
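The lineage-driven triage described here can be sketched as a graph traversal: find everything downstream of a stale table, then set urgency by what it feeds. This is a minimal Python illustration under assumed inputs; the `LINEAGE` dictionary, the asset names, the `CRITICAL` set, and the low/medium/high labels are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each asset maps to the assets that read from it directly.
LINEAGE: Dict[str, List[str]] = {
    "raw.events": ["staging.sessions"],
    "staging.sessions": ["marts.revenue", "ml.churn_model"],
    "marts.revenue": ["dashboard.board_report"],
    "staging.scratch": ["marts.adhoc"],
    "staging.legacy_export": [],
}

# Assets whose breakage matters most (decision makers, customers, production models).
CRITICAL: Set[str] = {"dashboard.board_report", "ml.churn_model"}


def downstream(table: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search for every asset that depends on `table`, directly or transitively."""
    seen: Set[str] = set()
    queue = deque(lineage.get(table, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already visited; also guards against cycles
        seen.add(asset)
        queue.extend(lineage.get(asset, []))
    return seen


def triage_freshness_issue(table: str, lineage: Dict[str, List[str]], critical: Set[str]) -> str:
    """Decide how urgently a stale table needs attention based on its downstream impact."""
    impacted = downstream(table, lineage)
    if not impacted:
        return f"low: {table} has no downstream dependencies; candidate for deprecation"
    critical_hits = sorted(impacted & critical)
    if critical_hits:
        return f"high: {table} feeds {critical_hits}"
    return f"medium: {table} feeds {len(impacted)} non-critical asset(s)"


for stale in ["staging.legacy_export", "staging.scratch", "raw.events"]:
    print(triage_freshness_issue(stale, LINEAGE, CRITICAL))
```

The same freshness failure gets three different priorities depending only on lineage, which is the point being made: the metadata earns its value once it drives a decision such as "page someone now" versus "consider deprecating."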

Barr Moses: And so this is an example where a strong understanding of lineage can help you operate your infrastructure better, can help you operate your business better, not just lineage for the sake of lineage. And so I think we need to get a lot smarter about metadata and particularly apply it in the context of solving real customer problems. That's really where the magic happens.

Adel Nehme: I definitely agree with you here on business value alignment. So, pivoting slightly away from data quality to data infrastructure in general: another thing that's often talked about this year is the rise of the data mesh and how it can potentially solve some of the bottlenecks of the current paradigm in data infrastructure. I would love your thoughts on what these bottlenecks are, what a data mesh is, and how it would solve these problems.

Barr Moses: So I think the concept of data mesh has, as you mentioned, become more and more of a hot topic recently. And I think it ties back to everything we just discussed. More and more organizations want to become data-driven, realizing that it's important in order to be competitive in their markets and to be successful as a business. Add more and more data sources, more and more data consumers, fragmentation of the data stack, et cetera, and all of this leads to a point where companies are asking themselves: what is the best way to organize ourselves in a way that helps us adopt data?

Barr Moses: In the current model, the historical notion has been: the more data you have, the better. But at the point we're at today, some of the bottlenecks include data that's not being used at all, and teams using different data. Maybe the finance team uses a particular dataset, the sales team uses a different dataset, and the customer success team uses yet another one. All three teams are relying on data, but they might come to totally different conclusions about what needs to happen in the business, because they've been using different datasets, looking at different metrics, or interpreting data in different ways.

Barr Moses: And so as every company strives to become a data company, different use cases emerge. As a data engineering team, you're trying to serve lots of different use cases and lots of different consumers, which can be really, really challenging. One of the most common things I hear from customers is: we are responsible for trust in the data, and we're responsible for our analysts or customers who want to use data, but we can't actually fix the problems in the data. Why? Because the teams upstream from us, potentially an engineering team, are the ones producing the data. A data team will often sit somewhere in the middle of the chain: downstream from us, people are consuming the data we work with, and upstream from us, an engineering team might be producing it. So I might be on the receiving end of lots of questions like, when was the data updated? When was it last refreshed? But I don't actually own the data to the point where I can fix any of those problems, because it belongs to a different team.

Barr Moses: In those cases, a culture of finger-pointing and blaming often starts: wait, hold on, it's this other team that's responsible for that part of the data pipeline, and I can't really solve that. So there are lots of questions of ownership, and that makes it hard to become data-driven.

Barr Moses: And so then the question is who actually owns the data? What's the best way to structure this? Is there a centralized model? Should there be one team that owns everything with a center of excellence? Or should there be like distributed teams with embedded data people within different functions? And we see companies all across the spectrum all the way from centralized to completely decentralized.

Barr Moses: That's the context for the topic of data mesh, which borrows concepts from software engineering and was defined by Zhamak Dehghani from ThoughtWorks. It basically brings the theory of domain-driven design to data infrastructure. The idea is a proposal for solving the questions of ownership and accessibility, and the problem of silos, by bringing in the best of both worlds. You have a universal, domain-agnostic, and automated approach to the things that need to be standardized at the company level, like data governance, data lineage, data monitoring, and best practices. On top of that, you have domain-specific, autonomous teams that have ownership across their data pipelines, and you empower those teams with self-serve discoverability, self-serve observability, and other tools so they can truly adopt data.
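One way to picture the "best of both worlds" split is a set of standardized checks defined once by a central platform and applied to every domain, with failures routed to the domain team that owns the data rather than to a central team in the middle. The Python sketch below assumes hypothetical `DataProduct` fields and two toy checks; it illustrates the ownership idea only and is not a prescribed data mesh implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataProduct:
    """Hypothetical description of a dataset published by a domain team."""
    name: str
    domain: str          # the team that owns the pipeline end to end
    row_count: int
    has_schema_doc: bool


# A platform-wide check takes a data product and returns a problem description or None.
Check = Callable[[DataProduct], Optional[str]]


def not_empty(product: DataProduct) -> Optional[str]:
    return None if product.row_count > 0 else "table is empty"


def documented(product: DataProduct) -> Optional[str]:
    return None if product.has_schema_doc else "missing schema documentation"


# Standardized once at the company level and applied to every domain alike.
PLATFORM_CHECKS: List[Check] = [not_empty, documented]


def route_alerts(products: List[DataProduct], checks: List[Check]) -> Dict[str, List[str]]:
    """Run the shared checks and send each failure to the domain that owns the product."""
    inbox: Dict[str, List[str]] = {}
    for product in products:
        for check in checks:
            problem = check(product)
            if problem is not None:
                inbox.setdefault(product.domain, []).append(f"{product.name}: {problem}")
    return inbox


products = [
    DataProduct("finance.invoices", "finance", 1200, True),
    DataProduct("sales.pipeline", "sales", 0, False),
    DataProduct("cs.tickets", "customer_success", 300, False),
]
for domain, problems in route_alerts(products, PLATFORM_CHECKS).items():
    print(domain, problems)
```

Because the checks live in one shared list, adding a company-wide standard is a one-line change, while fixing a failure stays with the owning domain.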

Barr Moses: So the concept here is: how can we organize our people, our organization, and our tools in a way that allows us to become data-driven? I will say it's the early days of data mesh, so I think companies are mostly asking themselves, how do I actually adopt this? What are the first steps? And I think there's something about this concept, and how simple it is, that's very appealing in terms of enabling organizations to fast-track their path to becoming data-driven.

Adel Nehme: And how do you see data mesh adoption growing in the future?

Barr Moses: Well, if you think about the last couple of years and how quickly it has been adopted, I'm really curious and excited to see how it's going to accelerate over the next couple of years. There's actually a Data Mesh Learning group that has emerged on Slack, where you can catch great conversations with leaders like Zhamak and others, and ask them questions directly about data mesh or other topics. And more and more data teams are adopting it. For example, the platform team at Intuit recently wrote a series of articles about their experiences on their path toward data mesh. So there are definitely more and more organizations both adopting data mesh and writing about it, which is very helpful for people who want to learn about it and figure out how to adopt that model for their own organizations.

Call to Action

Adel Nehme: That's very exciting, and we'll definitely make sure to include some of these resources in the show notes as well. Finally, Barr, it was great to have you on the show. Do you have any call to action before we wrap up?

Barr Moses: Absolutely. Well, first of all, I'm always happy to talk about data science and data analytics best practices, and I'm always looking to connect with folks, so feel free to reach out. I recommend checking out our blog at montecarlodata.com/blog and signing up for our newsletter if you'd like to keep up with all things data observability. In general, I'm just really excited about where the data industry is heading, and I'm looking forward to hearing what's top of mind for people and exchanging ideas. So feel free to reach out.

Adel Nehme: We'll definitely make sure to link to all of these resources in the show notes. Now with that in mind, thank you so much, Barr, for coming on today's episode of DataFramed.

Barr Moses: Absolutely. It was fun. Thanks for having me.

Adel Nehme: That's it for today's episode of DataFramed. Thanks for being with us. Really enjoyed Barr's insights on the data quality challenges organizations face and how Monte Carlo solves them. If you enjoyed this episode, make sure to leave a review on iTunes. Our next episode will be with Elad Cohen, VP of Data Science at Riskified, on how data science is being used to fight fraud in e-commerce. I hope it will be useful for you. And we hope to catch you next time on DataFramed.
