The New Toolkit For CDOs with Adrian Estala, VP, Field Chief Data Officer at Starburst

Richie and Adrian explore the modern data stack, agility in data, collaboration between business and data teams, data products and differing ways of building them, data discovery and metadata, data quality, career skills for data practitioners and much more.

Oct 11, 2024

Guest

Adrian Estala

Adrian Estala is VP, Chief Data Officer at Starburst Data, specializing in AI, Data Products, and Enterprise Data Strategy. He is recognized for leading agile, agent-driven pathfinders that deliver rapid, measurable innovation. With over 20 years of experience—including serving as CDO for a Fortune 100 company—Adrian has successfully led transformational programs that elevate business performance and operational excellence. Trusted by executive teams, he excels at aligning strategy with execution to drive lasting enterprise value. A seasoned global leader, Adrian brings dynamic energy and a contagious passion for continuous learning and innovation.

Host

Richie Cotton

Key Quotes

It's about bringing people together to solve data problems. I've got a business team that sit in the room with my engineers or sit in the room, my data scientists, and we're having really smart conversations about how they use data because it's given to them or offered to them in a way that they understand it. How do you get started? How do you build it? I think from this point going forward, you build it with the business and you build it together.

When you bring different technical and non-technical stakeholders together, you're driving innovation, you're driving ideation. You're able to get to a very fast prototype in a couple of hours. Now you've got what you want. Now you can formalize it, describe it, all that really rich metadata and publish it for other people to use. That's when I talk about trying to bring the two sides together, an agile, a sustainable data stack. That's what I'm talking about when we're abstracting all the complexity in the backend.

Key Takeaways

Instead of focusing on the ever-changing “modern” data stack, prioritize creating a system that is agile, efficient, and capable of evolving with business needs while maintaining performance and compliance.

Engage business teams early in the development process to ensure data products meet their needs, fostering closer collaboration between IT and business stakeholders and aligning on outcomes from the start.

Ensure all data products are tagged with business-friendly metadata to improve discovery and usability for non-technical users, facilitating easier access to relevant datasets.

Links From The Show

Starburst

Career Track: Data Engineer in Python

Transcript

Richie Cotton: So to begin with, I'd like to talk a little bit about the modern data stack because it's been around for a while. So can you just talk me through what is this and is it still as modern as it claims to be?

Adrian Estala: You know, I don't know that I like the word, or the phrase, I should say, modern data stack. The word modern, to me, I think it doesn't fit, right? Maybe it's clever marketing. And to be fair, we've all used it.

We're all guilty at one point or another. I think the challenge right now, if I could rephrase it, I think we want an agile data stack. We want a sustainable data stack. Modern gives me the impression that there's always something better coming in. So I go from what's modern yesterday to what's modern today to what will be modern tomorrow.

What I really need is sustainability. What I really need is agility. What I really need is compliance. What I really need is performance. What I really need is efficiency. And so how would I define all that today? I think that's the real question. And how do I make sure that I can maintain whatever that design looks like over the near future?

I have a lot of opinions for what that looks like. I think those opinions are largely based on where you are, right? So your modern data stack, I think, depends on what your today looks like. What can you get to? What do you need to get to? And for some companies, if we're all on prem and we're trying to get to a modern data stack, maybe that means just trying to take the best advantage of our on prem capabilities in a way that allows our business teams to succeed.

For teams that are trying to get to the cloud, that's a different journey. And for teams that are already in the cloud, that's a different journey. But I think for all of us modern is defined differently. And I think for all of us we absolutely would embrace the concept that sustainable and agile need to kind of fit in with that same word modern.

So anyway, long opinion, but you started right off the bat with a key spot for me.

Richie Cotton: So, your data has got to be agile, sustainable, compliant, performant, and efficient. That's a lot of requirements. Maybe we'll start with agile then. So what would it mean for your data stack to be agile?

Adrian Estala: Yeah, I think there's different layers there. Let's start with the business layer. I think on the business side, I don't know that I would use the word data stack, but if I think about an agile data team or an agile data function, for me it's the ability to say, I need data to solve this problem for me today, and then to come back tomorrow and say, I need a different data set to solve the same problem, or I need a different data set to solve a new problem, and then to come back the next day and say, I'm not even sure what data I need, but go find it for me, so to speak.

And the agility for me means that I can always bring new data sets to bear on new business problems. That's the agility, because the market's shifting. I think when I'm thinking about an engineering perspective or an architecture perspective, the agility there for me is being able to respond. It's saying, hey, I've spent a lot of time moving all this data into this one space.

Maybe that's a proprietary environment. And I've spent a lot of time building these really cool models that I think work for these business teams today or these applications today. Agility for me means that I can take that and leverage it for a different solution. Or agility for me means that I can build something new for somebody different.

Agility for me means that I can respond to what the business is asking for in a very efficient way, that I don't have to go rebuild from scratch every single time, that maybe I've got this foundation of building blocks. We'll talk about data products later, but I think maybe we could talk about core data products and derivative data products, some way for me to maintain some core, some foundation, and then just be able to rebuild very quickly on top of it.

That for me is agility. I'll give you one more. If I go way, way, way back into that, right? I'm on the infrastructure side, and I'm having to deal with hardware on prem or in the cloud. Agility for me means that I can make smart decisions for how I manage the infrastructure, that I am not tied down.

I don't want it to be that every time I want to move something, I've got to go three layers up and say, hey, are you sure it's not going to bother your Power BI app? I want some level of abstraction, right? Agility for me means that I can make the right decisions for my infrastructure without impacting every layer in front of me.

I can be smart about what I do and when I do it.

Richie Cotton: So it sounds like from the business point of view, a lot of it's about, well, I don't know what data questions I want to ask tomorrow. So, the data team needs to be able to deal with that uncertainty and still have this sort of time to bring some business value. I'm just wondering how you go about implementing that.

Where do you start?

Adrian Estala: I think it starts with the business. So maybe let's think about something completely different from the way we're used to dealing with this. I think the traditional way would be, let's go see if we can build this architecture, this data stack that can respond to everything, and we'll do it in isolation, and then we'll go talk to the business team and see if we got it right.

I'm not sure that works. What I would argue is we would sit down with the business now, we would get the business teams engaged today, and we would actually partner with the business teams in co-developing that.

And I think what comes out of that conversation isn't this big architecture that IT manages and that the business is a customer of, which is what we look at today. I think what we get to is an environment where you've got the business pulled much further into the IT realm. And not that they're all coding, not that they're all necessarily becoming engineers, but they feel much more empowered, right?

All of a sudden, I've got a business team that's sitting in a room with my engineers or sitting in a room with my data scientists, and we're having really smart conversations about how they use data, because it's given to them or offered to them in a way that they understand it.

And so to your question, how do we get started? How do we build it? I think from this point going forward, we start and we build it with the business and we build it together. And maybe that demarcation line of this is where IT ends and the business starts, I think maybe that shifts left a little bit. Again, not because we want to make them coders, but because we're starting to offer them data sets, data assets, data insight in a way that they're really, really starting to understand, not just what it is, but actually how to use it and how to manipulate it.

Richie Cotton: I'm certainly a big fan of having business teams talk to technical teams. I'm curious as to how this works from an organizational point of view. So who's in charge of building this? Would it still be, I guess, your chief data officer or your chief information officer who's in charge of building out all these agile tools, or would some of the management responsibilities shift to the business?

Adrian Estala: As a former CDO, I'd like to say the CDO is in charge of it. But I think, going back to kind of the business focus, you've got business teams, and they have to be in charge or accountable at some level. And so I think in many senses, they're leading in terms of demand. And I think IT is following in terms of supply, if I want to make it that simple. Let me explain that, right? Let's talk about some very specific examples that somebody could wrap their arms around and say, okay, I think I understand what Adrian's talking about.

I hope anyway. If not, I've messed up your podcast. So let's talk about data products. I think it's a fantastic way for us to think about how we can achieve what I described a second ago. And it's how I do it in workshops every week. I think when I sit down with a business team and we start to describe, let's pick something hard.

Let's pick a use case, Richie. Let's pick an AI use case, right? Somebody wants to do gen AI. There's a business somewhere that says, I really want to use gen AI. They may not really be sure why, and they might just be responding to what they hear in the market in terms of buzz. Damn it, we need to have it, because everybody seems to have it but us.

And so I love bringing these business teams into a room. And I'm a whiteboard type of person. So give me a whiteboard, give me a couple of colors, and we can start to draw. And we start to kind of map out with this business team in the room. Tell me how you use data. Where do you get data from? And sometimes they know, sometimes they don't know.

Tell me what you do with the data. Tell me how you use the data. I pop it in Excel. I pop it in Power BI. I pop it in Tableau. We put it in this application. Maybe we put it in this algorithm that creates a prediction for us. There's a lot of different ways they're currently using data, and they talk about the different teams that currently own those different solutions.

And that team takes the data and they manipulate it over here in this lake and they pop it over here. And that team takes the data and it moves through these hops, they pop it in Power BI, they manipulate it in Power BI and it ends up over here. That team, and so on.

Everybody is taking data, hopefully from the same sources, in some cases different places. They're manipulating that data in a different way, and then they're trying to create solutions that are ultimately driving a lot of the same executive insight, right? So you're an executive looking at reports, Power BI, Tableau, Excel, whatever, and maybe they're telling you different things because there is not a level of consistency in terms of the data you're using.

It might be the right data, it just wasn't put together in the right way. And so back to that, you're in the room with me now, Richie, and we're drawing this up and the business teams are highlighting these issues and they're saying, here's the problem, right? And so here's what they do. Here's what we do.

We're duplicating effort. We're copying data. There might be a data governance person in the room driving me crazy because I cannot chase all the lineage. I cannot chase all the different ways people use data. And then you start to simplify it for me. So hold on a second. So what if we introduce this concept of a data product?

And I tell these business teams, let me give you an example. I said, imagine you walk into a pharmacy, you're looking for medication, you find a headache pill kind of box on the shelf and you pick it up and you look at it and right off the bat, you can tell on the box is like, here's what it is. You know, you know what's inside of it.

You flip the box. Here's how you use it. So now I know what's inside of it. Now I know how to take it. You flip the box and here's who you call if you have any questions. So very quickly, if you're at a pharmacy, you can pick different medications off the shelf and very easily understand what's inside the box and how to use it until you find what you want.

And once you find what you want, you pay for it, go home and you take it, right? You get authorized, so to speak. I want our business teams to kind of think about data, shopping for data in the same way. I've got a headache. I have an insight that I'm looking for. I should be able to shop for data in the same way.

And so when you think about a data product and you're shopping for data insight, you should be able to pick data products off the shelf, read the box, understand exactly what you need, make sure that it is, in fact, going to solve your problem, and then put it back on the shelf or put it in your cart. If we can imagine, or if we can help our business teams imagine that experience, now they're excited.

And then we say, let's build you some of these data products. And so we start in a whiteboard designing data products. What do you want in this one? Well, can you get customer data here? And I want this product data. And I want to have this information for this specific region. And over here for this data product, I really want operational metrics.

I want to understand which customers are buying, which customers aren't buying. Over here in this data product, I want to understand maybe something regulatory. So yeah, we can build different data products for different types of analytics. We can put them on the shelf. We can describe them and give them owners, and then other people can use them.

And so now you've got a team on Power BI over here, Tableau down the hall, an application over there, and they're all using the same data product, right? So instead of having to do that manipulation on their own, that data product's already been curated. You can imagine a nice schema that's exactly what they want, in the format they need.

And they're off to the races. And if somebody else comes in and says, hey, I've been looking through the shelves, I haven't found the headache medication I'm looking for, they're enabled to build their own, right? They can do that, at least the way we do it in Starburst. It's very easy to sit in a room, write a query to a specific system, ideate and say, what if I put it together this way?

That ain't gonna work. Let's change this. Let me write this query. Let me pull it over here. That isn't gonna work. Let me change this. And so now you're driving innovation. You're driving ideation. You're able to get to a very fast prototype in a couple of hours. Now you've got what you want. Now you can formalize it, describe it, add all that really rich metadata and publish it for other people to use.

That's, when I talk about trying to bring the two sides together, an agile, a sustainable data stack, that's what I'm talking about, right? Where we're abstracting all the complexity in the back end. I don't care what cloud it's in. I don't care what environment it's in. I love it if it's in a lake. But if it's somewhere where we can't get into a lake, I'll be able to reach it anyway.

Let's leverage the data where it sits. Let's erase the complexity, so to speak, or at least abstract it. And let's make it really easy for the teams in the front end to use it.

I said, I don't want to make the business teams coders. But if they can look through these data marts or these metastores or these, you know, whatever word you want to use, marketplaces, these libraries of data products, and they can start to pick and choose what they want, now they're solving business problems, and the data part is easy. And today, the data part is harder in some cases than the business problem, and we have to kind of find a way to work around that. So, sorry, Richie, for a long answer, but that's the way I would describe it.

Richie Cotton: A lot covered there. I have to say, when you started talking about different teams that are getting different answers from slightly different versions of data sets, I was like, that's a big can of worms you've opened. But yeah, I like this idea of using data products to solve that.

So, can you go into a bit more detail about what a data product looks like? You mentioned the analogy of, it's going to be like a box of pills for your headache. What's the data version of this? Is it just like a single data set or is it something more complicated than that?

Adrian Estala: Yeah, it could be more complicated than that. I think first, obviously, there's an approach that we take in Starburst with data products, and one of the reasons I'm here is because I love the way that we do it. But that being said, as a CDO, as a person who understands the importance of an enterprise architecture, what I would tell you is that big enterprises are going to have data products from different places. And that's okay.

First and foremost, across the enterprise, you want a fundamental design for data products. One of the things that we do with our customers is help them kind of draw that out.

You need a data product strategy. Here's what I want my data products to look like. You don't want your consumers to get confused. They should not care where the data product came from. They should just be able to use them, so to speak. So first and foremost, there should be a strategy that defines what the data product means to the organization.

When you start to get more technical about data product, now it's where it matters a little bit more how you build them, right? Again, we want to abstract the complexity from the end user. They shouldn't care where it came from. They should just use it. On the back end, there's a couple different ways to build them, right?

Traditionally, back in 2019, 2018, many years ago, yeah, I think it was back there when I started, we were building data products in Azure and really building data products in warehouses. So we're talking about physical data products. Maybe they were fact tables, or whatever format or whatever design you wanted to use.

But we were building physical models that we called data products. And then I had a SharePoint site with a description for every one of those data products. People would go in there and they would pick what they wanted. They would ask for permission. We had a ServiceNow ticket.

They would get permission. They would use it, right? That was years ago. I think a lot of teams still do that today. You build data products and you think about that in terms of a physical model. What we do is a little bit different. Our data products are really just based off a query or code, right?

So I have the ability, I would say I have the agility and the advantage of being able to say, look, my first step isn't to migrate the data, right? Oftentimes, if I'm building a physical data model for a data product somewhere, my first step is, we'll go find the data, migrate the data. Then I'll get a data engineer to come in here and model it for you.

Then they'll test it. Then they'll show it to you. And then nine times out of 10, it's wrong. It's not anybody's fault. It's just the business team isn't always sure what they need. And you get stuck in that iteration. And then, every time you want to change, it's the same thing. And every time somebody else wants something a little bit different, it's the same thing.

That's why it takes so long to build a data pipeline. And sometimes that's okay, sometimes you need to do it that way. The way that we do it is really just based off of code, right? I want to query a couple of data sets, a table here, a table there, assets somewhere else. Maybe I get a row from that system, a column from that system.

I bring that together in the way that it needs to look, right? And so I like to build data product templates where, for certain types of data products, I want my columns to look this way, I want my metadata to look this way, and I can build these repeatable schemas for different teams. The data is different, but the schema is always the same, and it's all code.

Right. It's all a query. And so if I have a data product that generates a data asset that a lot of people are using, I can materialize it. I can cache it, so to speak. Now, everybody wants it. They're not paying for it to get rebuilt every single time. They're just taking it off a cached view. If I have a data product that somebody says I want to refresh every time I execute it, then it's a live query, so to speak.

And if I have somebody that says, I really like what you built there, but I want to add something to it. I want to add a couple more columns. I want to make it a little more complex. I want to filter something out. They can just copy that query and build their own data product. They haven't duplicated data.

They haven't changed the original model. They haven't created new dependencies. I've given them the ability to be really innovative very, very quickly to build a solution that works for them. And so there's different ways to build data products. The approach that I just described, I think it is agile. I think it is sustainable.

I think it is cost effective, but it's one of the approaches you should take if you're building a broader data product strategy.
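To make the idea of a data product that is "all a query" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Starburst's implementation: SQLite stands in for the query engine, the table, view, and column names (`crm_customers`, `erp_orders`, `customer_revenue_live`, and so on) are hypothetical, and the cached version is simulated with a plain table because SQLite has no materialized views.

```python
import sqlite3

# Hypothetical source tables standing in for two separate systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER, name TEXT, region TEXT);
    CREATE TABLE erp_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC');
    INSERT INTO erp_orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# The data product is just a query with a fixed output schema.
CUSTOMER_REVENUE_SQL = """
    SELECT c.customer_id, c.name, c.region, SUM(o.amount) AS total_revenue
    FROM crm_customers AS c
    JOIN erp_orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.region
"""

# Live data product: the query runs against the sources on every read.
conn.execute(f"CREATE VIEW customer_revenue_live AS {CUSTOMER_REVENUE_SQL}")

# Cached data product: the result is stored once and read many times.
conn.execute(f"CREATE TABLE customer_revenue_cached AS {CUSTOMER_REVENUE_SQL}")

# Derived data product: copy the query and add a filter. No source data is
# duplicated and the original definition is left unchanged.
conn.execute(f"""
    CREATE VIEW emea_customer_revenue AS
    SELECT * FROM ({CUSTOMER_REVENUE_SQL}) WHERE region = 'EMEA'
""")

# A new order arrives in the source system after the cache was built.
conn.execute("INSERT INTO erp_orders VALUES (13, 2, 25.0)")

live = dict(conn.execute("SELECT name, total_revenue FROM customer_revenue_live"))
cached = dict(conn.execute("SELECT name, total_revenue FROM customer_revenue_cached"))
emea = conn.execute("SELECT name, total_revenue FROM emea_customer_revenue").fetchall()

print(live)    # the live product reflects the new order
print(cached)  # the cached product still holds the earlier result
print(emea)    # the derived product only contains EMEA customers
```

The trade-off Adrian describes shows up directly: the live view always reflects the sources, while the cached copy is cheaper to read but only as fresh as its last rebuild.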

Richie Cotton: Okay. So it seems like we've gone back to the idea of agile then, because you've got to have this iteration of communication between the technical teams, your data engineers, and then your business team to decide exactly what you want and what data you need from that. Okay. So I guess there's subtlety here then.

So, you said that you need a query to come out of this that people can just edit. So who's going to edit this SQL query? Is that going to be the business person? And do they need SQL skills, and do they know what data is there to edit? How would you provide all that sort of supporting infrastructure to make sure that the business person can then go and change the data product as they wish?

Adrian Estala: Yeah, that's a great question. It's a great question because if you would have asked me this two years ago, I was very ambitious with the approach I was taking from a data literacy perspective. Ambitious in the sense that you might find a recording of me two or three years ago where I'm arguing about teaching business teams to write their own queries.

And thinking it's so easy, right? It's ANSI SQL. Anybody can learn it. It's pretty basic. Having now gone through more than two years of business-facing workshops and having to kind of roll this out for a lot of companies, I think I've changed my opinion. I think here's the reality. When I say self service, that means different things to different people.

There is a team out there where self service for them is all about just finding and reusing data products. Oh my God, I feel so empowered. I went to the marketplace. I found the data product I wanted. I pulled it into my dashboard and I was able to use it. And then I needed something else. I went back and that reuse for them is self service and they love it and they're high fiving.

There's another team that says, yeah, that works, but my space is so unique and my space changes so much. I'm constantly in search of new data and I'm constantly in search of new ways to use that data. Self service for them does mean that they can sit down and write a query, right?

And they learn or they bring an IT person into their team. You've got a hybrid IT business team and you've got somebody who's just writing the queries. And that's empowering as well. Here's what's really interesting, Richie, is that there's something in the middle now that I think is starting to get a little bit easier.

We're starting to use more and more, we're starting to use kind of Gen AI to do that text to SQL conversion. And in some cases, and I'll explain what I mean by in some cases in a second, but in some cases that works really well. And now you've given even the business teams that never want to query, that are scared by a query, now they don't have to write the query.

Now they're just asking English questions. And they're getting answers back. In fact, they never even see the query. They just get the response. The way that we've been able to achieve that is, again, and if you look at my LinkedIn, it says a data product CDO, I've picked this topic to focus on, so everything I talk about is probably data products.

But here's what's interesting about putting a data product in front of a gen AI engine. Oftentimes when we're trying to use something as smart as an LLM to answer a question, and we want to load it with data and then we want it to use its intelligence, so to speak, its pre-trained knowledge, with the new context, the new data we've given it, to answer a question, if we can be finite about the data that we're feeding it, we're giving it a better chance to respond.

Oftentimes what we do is we say, hey, ChatGPT use RAG to go find the answers, and it doesn't work because that RAG system is going out there and looking at existing metadata, trying to figure out how to answer your question. If you're asking a question in business language, you've got metadata in IT language, that RAG system's never going to be able to find the data you want.

It's going to struggle because it's just guessing. I think he wants this, he might want that, you know. Let me come up with an answer that makes sense. And, and there you go, and you don't get a really good answer. Push that aside and say, hold on a second. What if I take a data product and I'm clear about what I put in there.

I've got a really clean schema. I've got really good business metadata. I'm not describing the columns, if you will, in IT terms. I'm describing the asset in business terms. The language I'm going to get the question in is the same language I'm putting the metadata into the data product in. So now I've got a business-metadata-enriched data product.

And then I've got a gen AI engine in front of it. I don't need RAG anymore. I'm just loading the schema and the metadata into the prompt. And then I can have my consumer ask a question. The results are fascinating. And so we've been doing this a lot with a lot of our customers that are just trying to get some simple gen AI solutions out the door.

This works. And on top of that, if you've built that data product with the business in the room the way that I described it to you earlier, they actually trust it. They understand where the data came from. Oftentimes, you put AI in front of a new business team, and they don't trust it. Like, whoa, whoa, whoa, whoa.

Can you tell me where the data came from? I can't, because we've got all this data. I can't really pinpoint exactly where that answer came from. But if you put a curated data product in front of it and you ask a question, you can tell them exactly where that data came from, because the only thing you've exposed the engine to is just that small data set.

And so again, you create more trust, you create excitement, you create this feeling of, you know, a business person walks out of the room, they're calling their friends saying, hey, you know what I did for lunch? I built a gen AI solution. Right? And so you get that excitement. It's not what they're going to need longer term, but it's a great way to get teams started.

It's a really great way to kind of get them to trust and start to really use that tool as a way to interrogate old data. We had a team that interrogated 60 years, six-zero, of Wells data. And they came out of that four hour exercise saying, I learned more about 60 years of Wells data in this four hour workshop than I had in my career, so to speak.

Because it takes so long to analyze such a large dataset. So anyway, Richie, again, you're asking some great questions. You're getting me into some long speeches, so I'm gonna be much more succinct from here on out.
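The step Adrian describes above, loading a data product's schema and business metadata straight into the prompt instead of relying on retrieval, can be sketched in Python as follows. This is only an illustration: the `Column` class, the `build_prompt` helper, the `regional_revenue` data product, and the prompt wording are all hypothetical, and the call to an actual model is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    sql_type: str
    description: str  # written in business language, not IT language


def build_prompt(product_name, product_description, columns, question):
    """Assemble a text-to-SQL prompt from one data product's schema and metadata."""
    schema_lines = [f"- {c.name} ({c.sql_type}): {c.description}" for c in columns]
    return "\n".join(
        [
            f"You answer questions using only the data product '{product_name}'.",
            f"Description: {product_description}",
            "Columns:",
            *schema_lines,
            "Write one SQL query that answers the question below.",
            f"Question: {question}",
        ]
    )


# Hypothetical data product with business-friendly column descriptions.
columns = [
    Column("fiscal_year", "INTEGER", "Financial year the sale was booked in"),
    Column("region", "VARCHAR", "Sales region, for example EMEA or APAC"),
    Column("total_revenue", "DOUBLE", "Total revenue recognized in US dollars"),
]

prompt = build_prompt(
    "regional_revenue",
    "Revenue by region and fiscal year, owned by Finance",
    columns,
    "Show me total revenue for 2023",
)
print(prompt)
```

Because the prompt contains nothing but this one curated schema, any answer can be traced back to a known, small set of columns, which is the trust argument made in the conversation.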

Richie Cotton: You're going for the villain monologue. So, you mentioned metadata and I certainly see how, you know, from a privacy point of view, you don't want to send like the entire schema for your whole database into a gen AI query, but it's maybe okay just to send the name of a table and a few columns that's part of a data mart in there.

So actually on that note, is there any other metadata that would need to be included as part of a data product? Is it just literally the names of the tables or columns that are in your data set, or do you need more information as part of a data product?

Adrian Estala: For the gen AI solution?

Richie Cotton: Well, either. So what goes in your gen AI queries and what is more generally in

Adrian Estala: For the gen AI tool, for the most part, I find that just kind of the descriptions of the columns, so to speak, that basic metadata, is what's most helpful. Sometimes we'll get a bad answer. We'll go back and look at it. You know, somebody says, show me total revenue for 2023. And we'll get a number and we're like, that ain't right.

And then the business customer is now teaching me. They're like, hold on a second, go back and pull that data set up again. We pull up a table and they're like, see, look at that. That's why. Because that doesn't say total revenue. That says something else.

And so we'll change the metadata there. We'll rerun it, ask the same question, and boom, you get the right answer. So that's a simple example. But I think oftentimes for that gen AI solution, it's that simple. Love your question, though. When you think about the big data product as a whole, a standalone data product, here's the way that I like to build them.

And this would be true for any data product. So we build them in Starburst, but if you're building data products in any tool, it should be the same, right? You want to have a description, but just a broad description that's not too long. Again, it's like the front of, I'll show you my vitamin pills, right?

Just something that anybody can pick up and read very quickly in the language of the audience, not technical language, but in the words and terminology the person reading it is going to understand. Number one, I have to have that. Number two, you know, what's super helpful is when you can actually show them who else is using it.

I love that metadata piece, when I can see who else is using this. If I have a use case similar to that finance team or that marketing team or somebody else that I know and trust, I'm going to probably use the same data set, because I would say if they're using it for that, I should use it for my use case as well.

Another thing that's really useful is when you have some type of, it's kind of a vote. In ours, it's like stars, right? But some way for somebody to go in there and actually give their opinion, right? I use a data product. One star, right? The data is not right. The customer field is wrong.

The data is two weeks old, right? Five stars. Yes, this data is always fresh. It's the right data. It solved my problems. So giving teams the ability to provide some feedback in a very simple way to the data product owner. So whoever owns it needs to say, I own a one-star data product, I better get ready to fix it because I want people to use it.

I want people to trust it. And so, some feedback is really important. The standard metadata around the data asset itself, as we described a second ago, is really important. But maybe another thing that's super useful as well, that kind of starts to get us into some of the compliance and regulatory spaces.

If we can go in there and tag data products, right? So we like to build tags. Maybe there's a high-risk data tag. Maybe there's a finance data tag. Maybe there's an external data tag. You can create specific tags for different types of data products. Super helpful when people are searching for data. And they're like, I'm looking for, I don't know, I'm looking for finance data for a specific region, and those tags make it really simple. Now, when you search for data products, sometimes you're searching the data assets themselves, and that's helpful, but you get a lot of stuff back. If you just search the tag, sometimes you get a better response. So I'll pause there, but the concept of tagging data products is really helpful.

The concept of a simple explanation that anybody can understand is super useful. The idea of who else is using it, the idea of internal feedback, whether they like it or don't like it. And then obviously that on top of the standard metadata, I think that's the recipe for success when you roll out data products.
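The ingredients Adrian lists (a short plain-language description, who else is using the product, star ratings for the owner, and tags for search) can be sketched as a small metadata record. This is a minimal illustration, not Starburst's data model: the `DataProduct` class, its field names, `search_by_tags`, and the sample catalog entries are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DataProduct:
    """Hypothetical metadata record for one data product."""
    name: str
    description: str                                  # short, in the audience's language
    owner: str                                        # the person who receives the feedback
    used_by: list[str] = field(default_factory=list)  # teams already using the product
    tags: set[str] = field(default_factory=set)       # e.g. "finance", "high-risk"
    ratings: list[int] = field(default_factory=list)  # one to five stars per review

    def rate(self, stars: int) -> None:
        """Record simple star feedback for the data product owner."""
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.ratings.append(stars)

    def average_rating(self) -> float | None:
        """Average stars, or None when nobody has rated the product yet."""
        return mean(self.ratings) if self.ratings else None


def search_by_tags(catalog: list[DataProduct], required_tags: list[str]) -> list[DataProduct]:
    """Return the products that carry every requested tag."""
    required = set(required_tags)
    return [product for product in catalog if required <= product.tags]


# Hypothetical catalog with two data products.
catalog = [
    DataProduct(
        name="revenue_by_region",
        description="Total booked revenue per sales region, refreshed daily.",
        owner="finance-data-owner",
        used_by=["finance", "marketing"],
        tags={"finance", "emea"},
    ),
    DataProduct(
        name="campaign_clicks",
        description="Clicks per marketing campaign.",
        owner="marketing-data-owner",
        tags={"marketing", "external"},
    ),
]

matches = search_by_tags(catalog, ["finance", "emea"])
catalog[0].rate(5)  # "this data is always fresh"
catalog[0].rate(1)  # "the customer field is wrong"
```

Searching on tags rather than on the underlying assets keeps the result list short, which is the point made above about getting a better response from a tag search.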

Richie Cotton: I like the idea of tags being useful for helping find data because finding data sets is often half the challenge of analyzing things. You know, you spend longer trying to work out which thing you want before you can actually get going doing the analysis. So, is the process of data discovery for data products the same as the traditional way of searching for data sets, which is, go and search your database and cry a lot?

I presume there's an easier way of doing it.

Adrian Estala: Yeah, I like the discovery question. The process for data products should be easier, I hope. If we've done it right, and we've done all the things we've described so far, tags and metadata correctly, then yeah, it should be a lot easier. But we're always going to have that problem of new data.

Right? The idea of new data coming in, that's going to be around for a long time. That's why we want to be agile and sustainable. And so I like to create these discovery labs. Some of the companies, some of the banks we work with, they call them innovation labs. But it's this idea of bringing a small team into a room, a business team, and we allow them to ideate, again, in a secure environment, and we say, well, hold on.

So what are we looking for? Well, recently something happened with a bank where they had 60 data sources, and we're going in there and we're looking across the different data sources. They really weren't sure what was in there, or whether they could use it, or whether they should use it, or whether they should decommission these data sources.

There's a lot of questions, right? So we're just doing some basic discovery. And it was very easy for us to go in there, leveraging some of the existing catalogs where they existed, but helping walk that team through just analyzing the different data and say, okay, I remember that that's from this and we don't need that.

Oh, that's, you know, we can use that. And so very quickly, they were starting to create data products from these different data sources, right? In other words, you know what? Let's put this in a data product. You could almost say, let's package that and put it on the shelf because I know someone's going to need it.

And let's take that over there and let's package that. But so they were just trying to find things that they thought would be helpful for other groups. And then this is our code based data product. So I'm not saying, hey, let's migrate this data somewhere. I'm saying let's tag it, let's query it, and let's put the code on a shelf.

And if somebody else wants it, they can just open that box up, i.e., open up the data product, and they've got the data. Here's what's fascinating, Richie, for discovery, in this case or in any case. When you get to a point where we've gone through the data sets enough and we've built a core set of data products that are easy to understand, that are consumer facing, on the back end, I still have this legacy architecture that I might want to get rid of.

And you know what? Now I can, right? So now if I say, you know what, I'm going to take these 60 data sources, I don't know what they are, and I'm going to combine them into a lake environment, or I'm going to get rid of 20 and I'm going to keep 40, whatever you want to do, you can do that without affecting those data products you've built for the business, because all you need to do on the back end is update the query.

And so, originally, I was taking customer data from this system and product data from that system. Now I've merged that together in one low-cost lake. All I do is update the query. And whoever is using that data product, they'll never even know it changed, right? They don't care that the source changed anymore because of that abstraction layer.

The data product they're using is a lot easier. And so to your question, I do think discovery is much easier. But I think that what really shifts here is two things. One is only one person needs to discover it. And once they discover it, everybody else can reuse it. Today, we spend way too much time with too many people inventing and discovering the same things over and over.

The second thing is, when we discover it, we make it accessible virtually. That's the right way to describe it. So we kind of separate the physical connection. We now point them to a digital connection, which allows us, I would say, to move the data in the back end, from an infrastructure perspective, to where we need to move it.
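The abstraction layer described here, where a data product is a named query and a back-end migration only means updating that query, can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names (`crm_customers`, `erp_orders`, `lake_sales`), the `data_products` registry, and `read_data_product` are hypothetical, and an in-memory SQLite database stands in for the real source systems and lake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two legacy source systems (hypothetical names and data).
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE erp_orders (customer_id INTEGER, product TEXT, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO erp_orders VALUES
        (1, 'widget', 100.0), (2, 'gadget', 250.0), (1, 'gadget', 50.0);
""")

# The data product is just a name plus a query; consumers never see the sources.
data_products = {
    "customer_revenue": """
        SELECT c.customer_name, SUM(o.amount) AS revenue
        FROM crm_customers c
        JOIN erp_orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_name
        ORDER BY c.customer_name
    """
}


def read_data_product(name):
    """What a consumer calls: they know the product name, not the tables."""
    return conn.execute(data_products[name]).fetchall()


before = read_data_product("customer_revenue")

# Back-end change: merge both sources into one lake table and retire the old ones.
conn.executescript("""
    CREATE TABLE lake_sales AS
        SELECT c.customer_name, o.product, o.amount
        FROM crm_customers c
        JOIN erp_orders o ON o.customer_id = c.customer_id;
    DROP TABLE crm_customers;
    DROP TABLE erp_orders;
""")

# The only thing the data product owner updates is the query behind the name.
data_products["customer_revenue"] = """
    SELECT customer_name, SUM(amount) AS revenue
    FROM lake_sales
    GROUP BY customer_name
    ORDER BY customer_name
"""

after = read_data_product("customer_revenue")
```

The consumer calls `read_data_product("customer_revenue")` before and after the migration and gets the same rows, even though the legacy tables no longer exist.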

Richie Cotton: I like that idea of just having that layer of abstraction then. So, if you have to change something inside, like how the query works, it's not going to affect the end version of the data product. So the users are happy. Although I think there's maybe a challenge that this introduces.

So sometimes I have the case where someone's changed a backend query and then you're like, well, okay, the results are different. They're fine for the team who asked for the change, but then a different team is going, suddenly all my results are wrong. They're not what I expected. Is the company about to implode?

And so I guess that means you've then got a communication issue between teams. So how do you deal with different versions of the data when you're changing things? How does maintenance work?

Adrian Estala: It's a challenge. And it's still a challenge, I would say. I don't know that we've solved that problem with technology yet. And as I say that, there's probably, you know, a listener out there saying, oh, there's a tool here that can do that. Why isn't he saying data contracts? I think there's a lot of great ideas, a lot of really good concepts out there.

There are some data lineage tools that we can use to track changes on one end and assess how they impact a solution on the front end. I just don't think we've done that across the enterprise yet. I think we're still struggling with that. When we build data products, when I sit down with a data product owner and we connect them to data source owners, people who own the data at the source versus the person that owns the data product closer to the consumer, we try to create some of those contracts.

And I think today, as much as I would love those to be automated, where somebody makes a change in the back end and automatically it fixes all the way to the front end, I think today we're still very much leaning on some manual agreements and manual paper contracts, I like to say where you've got a process in place to make sure that if someone's changing data on the back end, that there's an understanding that they've communicated that down the line.

I think that's easier with the way that I described than the way it used to be. I remember as a CDO, the way it used to be, I had so many different teams using APIs or building different types of, we used to use the word middleware, or data pipelines back to back-end systems. I'm a ServiceNow fan and love ServiceNow, but back then we were making a lot of ServiceNow changes every time we made a change to the way the API worked or the way data was structured in the back-end environment.

A lot of stuff broke, you know, and on a Monday morning, when you're running reports and stuff starts to break, you're in trouble. It happens all the time. I think if we fast forward that now to today and say, hold on a second, so what changes today? I say, well, remember we used to have all those pipelines kind of getting the same data and delivered to the same place.

Now, I've just got one, right? It's one data product. And that data product is just a query. And so if you can just communicate that change to this team, they can make sure that their query gets updated. And if their query gets updated, other dependent queries will get updated. Now you haven't killed everybody, so to speak.

Now, all of a sudden, you've got a fighting chance at making a smooth change because you've simplified. It used to be a rat's nest, and now there are much simpler pipelines. That's what we're trying to get to. Today, even when we're building data products, it's still going to take some time before the data product completely supplants or replaces all the old legacy pipelines.

We're not there yet. But I think as we get closer and closer to just using data products, reusable data products, I think that change becomes manageable. At the same time, Richie, and you might educate me, I'd love to get your thoughts on this. I do think there are some tools that are getting better out there.

Data lineage tools, data governance tools, that people, you know, AI driven tools that can alert you when something changes. At the very least, tell you that it changed so you can respond. And in some cases, I know back in my old security days, 30 years ago, we used to talk about self healing networks. But there's the idea of a self healing pipeline where the query would update itself based on a change.

I think there's hope there. I personally haven't seen it work at an enterprise level. I don't know if you have, Richie.
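The agreement between a source owner and a data product owner that Adrian describes, and the idea of a tool that at the very least tells you something changed, can be sketched as a simple schema check. This is an illustration only, not a real data contract tool: `check_contract`, the column names, and the schemas are hypothetical, and nothing here heals a pipeline. It only reports the change so that people can talk to each other.

```python
def check_contract(contract, source_schema):
    """Compare the columns a data product relies on with the current source schema.

    Both arguments map column name -> type name. Returns a list of
    human-readable breaking changes; an empty list means the contract holds.
    """
    problems = []
    for column, expected_type in contract.items():
        if column not in source_schema:
            problems.append(f"missing column: {column}")
        elif source_schema[column] != expected_type:
            problems.append(
                f"type changed: {column} {expected_type} -> {source_schema[column]}"
            )
    return problems


# What the data product owner and the source owner agreed on (hypothetical).
contract = {"customer_id": "INTEGER", "total_revenue": "REAL"}

# Source schema before and after a back-end change (hypothetical).
old_schema = {"customer_id": "INTEGER", "total_revenue": "REAL", "region": "TEXT"}
new_schema = {"customer_id": "TEXT", "revenue": "REAL", "region": "TEXT"}

ok = check_contract(contract, old_schema)
broken = check_contract(contract, new_schema)
```

Extra columns in the source (`region` here) are ignored on purpose: the contract only covers what the data product actually depends on.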

Richie Cotton: Seeing it work well? I think sort of vaguely working, maybe. We had Prukalpa Sankar from Atlan. We had Shinji Kim from Select Star on the show, and they've got some great tools for dealing with data lineage and maintenance, but I think you're right that data contracts and software are, I guess, the future, but for now, you probably do need to actually speak to your colleagues occasionally if you want things to run smoothly.

Okay. So, I do like the idea of not having broken data pipelines every week and things running smoothly. While we're worrying about organizational problems, it seems like there are also problems around, like you mentioned, that in order to create data products, you've got to have a lot of metadata tagging, you've got to have descriptions of datasets, you've got to have, yeah, just a lot of information around what's going on.

And this requires some work to be done. Who's in charge of that? Who needs to make sure that the descriptions are correct and things like that?

Adrian Estala: Yeah, I think this is the way that I like to roll it out, and I've seen a lot of different approaches, and we're all still learning. I talk to other colleagues on the broader data mesh side and learn a lot, so I love the broader data product community. I say that only because sometimes you hear people talking about, here's the way to do it, and I don't think we're there yet.

I think we're all learning. That being said, this is the way I do it. I love building a data product COE. My first principle is we need a COE, and I think about how I used to manage microservices or APIs. I used to be an application portfolio manager in the old world. I had 10,000 applications and I remember how we used to manage those, right?

And so when I think about data products, I apply a lot of my old portfolio management principles. I want a COE. I want to be able to manage dependencies. I want to be able to have a very basic sense of consistency across all my data products. I want to draw some fine lines around federated governance, the governance that I manage centrally versus the governance that I allow teams to manage on their own.

And so I have a whole operations model that we walk through, but there are some basic operations principles that I think are really important. We need to have those in place and start to build and mature them before we get to a thousand data products. And so you should start building now. So first and foremost, I like the idea of having a data product COE to start to create that broader management framework, operations framework.

When I get started with data products, I don't like to build this big enterprise solution up front. I know I just said build a COE, but a great way to

Richie Cotton: COE is

Adrian Estala: A center of, I like to call it a center of enablement, or center of excellence. COE. And the next piece here, when you get a COE started, the best way to get a COE started is to get a small win.

And so instead of going out and trying to roll out data products to a whole company or whole enterprise, I really love the idea of building data products for a small team, right? Go work with a team who's willing to roll their sleeves up. I like to do these Pathfinder exercises.

We bring them into a room, we build data products together, they start to use data products. If you're lucky, it works. Sometimes you need to do some tweaking, but you get it to a point where they've solved their problem. And then they start bragging, and then other people start looking over their shoulders like, hey, how come their projects are getting done quicker, right?

How come they're driving greater speed in their innovation? How come their applications are going through faster versioning? Why does it take us so long? Oh, they have easier access to data. Oh, they're reusing data. And then you'll get other people excited. So build a COE, launch that COE with a couple of small wins.

And then I don't think you necessarily, one of the things that I think we're all excited about is this, you mentioned Atlan a second ago, you got Atlan, you got Collibra, you got some great catalog tools out there, big, big data governance tools, Informatica and so forth. We all want to get to a point where I can go to one of those tools and see all my data products.

That's the way it should be, right? And I hope that we'll have a conversation in five years and that's the way it is. But today, we're not there yet, right? And so I think today we need to build solution sets. Go back to the modern data architecture question we started with. I think the data architecture we need today is the one that works today.

And I think today we can start to develop and deploy data products to teams, build a simple marketplace. You know, even if you've got two marketplaces, that's okay. Get people using them, get people excited with them. And I think very quickly, in short order, you're going to have the big catalogs come up with much easier, much better integrated data product catalogs.

We have a data product catalog built into Starburst. It's awesome. It works really well. Great way to get started. All we're doing is buying you time until the bigger catalogs start to get their data product libraries working more consistently, and then they're able to connect to everything in the back end.

And now you've got one central pane for all data products. I think that's coming. We're just not there yet. And so this last piece really is about building the things you need today with a plan for how you get to what you need tomorrow.

Richie Cotton: Okay, actually, I do like the idea of having a center of excellence and then you're making other teams jealous because they're not working as well as you are. You get that colleague rivalry.

Adrian Estala: Jealousy works a long way, man. You get people excited. Hey, how come Bob is done for the year and he's only spent half his budget? You know, maybe I'm being too ambitious. But yeah, without a doubt, I think people get excited when they see other people winning.

Richie Cotton: So the other problem that every organization complains about is data quality. Now, how does the use of data products affect data quality? Is it going to have benefits? Are you going to get a feedback loop where it solves data quality problems, or is it going to make them worse, or what's the deal?

Adrian Estala: Yeah, I think data products as a whole, I usually have my DMBOK book right here, and this is the moment in the call when I pop it forward and I say, here's the way we used to do it. I don't know that there's a shortcut to data quality, so I have to be careful that what I'm about to say doesn't come across as a shortcut.

There's a lot of effort still required to fix data quality at the source, right? I don't think we get away from that. We have to get better data. We have to remove duplicate data. We have to categorize, so to speak, and classify data to make sure we understand high-risk data from low-risk data. All that's really, really important, hard, expensive, and the business is not willing to wait for us to finish that.

And so I think the way that we can go back to winning for the business today is if I build a data product and the person that owns that data product is an expert in that domain. It's a finance data product and there's a finance data product owner. And they look at the data in the data product and they can assess it and say, yep, that's right.

No, it's wrong. Fix this, fix that. Now I've got real ownership. One of the problems that we as CDOs and data leaders have with data quality is there's no ownership. If I find an owner, it's because I told them they're the owner. And in some cases, they don't even have the authority, or frankly, they're an IT person that doesn't understand the business.

They don't really have the knowledge to fix the data, right? They're not tied enough or close enough to the business processes. When I create a data product owner, they're usually in the business, and they know the data, and they're in the domain. And so now, there's some level of, I don't mean as simple as pride, but there's some level of accountability for the data in their data product.

And if somebody has an issue with it, they know who to call. That data product, when it gets rolled out and published, it's quality data. It's good data. It's better than what you used to do before. And now, if somebody else wants the same data, they're just reusing that good data. In the old days, somebody would take data, they would fix it in Excel, and they would use it for their team.

Somebody else needs the same data, they go back to the source, and now they're dealing with bad data again. Now they're fixing it, and every team was seemingly fixing data the way that they saw fit, manipulating data and driving down solutions. And today, I want one person to discover it, one person to own it and fix it, and a lot of people to use it.

And so I think that's how we help data quality. I love this approach, and I know that it works, but I don't want it to sound like I'm saying, don't worry about the source. We still have to. But this is closer to the business. This is what the business is going to actually appreciate. And it's going to make an impact.

Richie Cotton: It sounds like you're almost bringing an economic solution there. So you've got your data mart where people are buying or taking their data products and the ones they don't want. I guess you're going to realize they're not good enough quality and that's going to help you realize where your data quality issues are.

So I like that marketplace solution. Okay. I'd like to talk a little bit about careers. I'm kind of curious because we were talking about how data products are fairly new. We've talked about a lot of newer ideas like data lineage. Does this change the skills that data practitioners need?

Are there any skills that you think are becoming more important?

Adrian Estala: Ah, that's a good question. I don't know, Richie.

Data is such an interesting space because I'm really jealous of some of my data architecture friends, data science friends. They know so much. I listen to like two podcasts a day. One while I'm working out and one before I'm going to sleep. Just trying to keep up with everything AI. It's not enough.

I wish I had time to go back and get a PhD maybe in data science or something. I really envy my colleagues that are so deep into data science, maybe like yourself, Richie. Now, that being said, I bring a different skill set, right? Now, I've worked on the business side for a long time. I was a hacker in the 90s.

Then someone plucked me out of security and said, go work with the business. And that's largely where I spent most of my time, on the IT side, but really closer to business teams. I bring my own kind of CDO lens to data problems. And so the way I look at it, I think more than anything right now, it's trying to bring the right people together to solve data problems.

I feel like this sounds a bit too idealistic, but I think the way we solved problems before is you had an IT team solving something. And they'd hand it over to maybe a business team, and then they would look at it, and then it gets tossed over the fence. I feel like now, more than ever, you've got business teams and IT teams working much more closely together.

So a person like me, who used to be really technical, who lost his way somewhere along the path, and now is really envious but trying to get more technical, I bring my own value to the equation. And I think there's people who are still today super, super technical and super bright, who don't understand the business, who are now getting pulled into the business.

I see this happening much more efficiently and effectively than ever before. And so to your question on skill sets, I think both sides, the IT side and the business side, elevate their skill set towards each other. And maybe a different way to answer your question.

Richie Cotton: So if you're very technical, then you need some business skills. If you're a business person, then you want that level of data literacy in order to be able to collaborate with your colleagues.

Adrian Estala: And I remember at a Mobossum, I said, probably said that 15 years ago, right? I don't think that's a new concept. I do think that today, more than ever, I can point to practical examples where that's actually happening. Whereas before we were just saying that's where someday we're going to get, there, hybrid IT.

And now I think that's the rule, right? Now, when we talk about data products, I can't have a successful data product without having some really, really tight collaboration between the two. So now I think it's a reality.

Richie Cotton: Okay.

Adrian Estala: It also is a need. It's a reality.

Richie Cotton: All right. Learn to talk to your colleagues. Scary stuff. Okay. Just to wrap up, what are you most excited about in the world of data?

Adrian Estala: Ooh. You know, what I'm most excited about is what our kids are going to do with AI. Without saying my age, we've gone through these cycles, these hype cycles of different things that have happened in IT over our careers. You know, probably when I started my career, I remember building a webpage on GeoCities.

Nobody knows what that is anymore, but nobody had a webpage. I had a webpage. I couldn't believe I had one. Nobody even had a webpage, right? There were no ads, there was nothing. I give that example to a lot of the young engineers or young people that I talk to, because we have these points in our career where we see something, we know it's going to be great, we don't know what to do with it, and the next generation takes it and makes it greater. And I think that's where we are right now with AI. We're really starting to understand it. I mean, we're teaching people how to prompt.

We're going to look back in five years and laugh. Remember when we used to build a kernel from scratch to fit a video card? And now, remember back then when we used to teach people how to write a prompt to talk to an engine? That's going to go away fairly quickly. I'm most excited about the next generation here, when it becomes even easier for our kids, for the next generation, so to speak, to really take advantage of it.

I just want to be able to look back and say that I was a part of it. Like, I took it from here to there. I'm excited to be a part of that. And I'm so grateful that this late in my career, I got this last opportunity to make a difference. I'm not that old, but I felt like nothing else was going to happen, and then this hit.

And I'm like, ah, fantastic. So super, super excited to be here. Super excited for what's coming.

Richie Cotton: Having an impact for the next generation. I like it. That's a, that's a good goal for anyone. Excellent. All right. Thank you so much. Loads of great insights there, Adrian. Thank you for

Adrian Estala: Awesome. Thanks for letting me talk so much. I apologize, but it was a lot of fun.

Topics

Data Governance

Data Engineering
