
Self-Service Business Intelligence with Sameer Al-Sakran, CEO at Metabase

Richie and Sameer explore self-serve analytics, the evolution of data tools, GenAI vs AI agents, semantic layers, the problem with data-driven culture, encouraging efficiency in data teams, exciting trends in analytics, and much more.
Nov 18, 2024

Guest
Sameer Al-Sakran

Sameer Al-Sakran is the CEO at Metabase, a low-code self-service analytics company. Sameer has a background in both data science and data engineering so he's got a practitioner's perspective as well as executive insight. Previously, he was CTO at Expa and Blackjet, and the founder of SimpleHadoop and Adopilot.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

It's your choice as a data practitioner or as someone who's running a data team: do you want to have those questions come back to you as ad hoc requests? Or do you want to factor that into how you conceive of the shape of analytics you deliver? So I have a horse in the race. The horse is: your tool should let your users ask questions 2-20 without having to harass a human. That's self-service analytics.

Every time I hear someone say they're data-driven, I assume they don't know what they're doing. Very few people that I know who are data practitioners, ML researchers, good operations people talk that way. "Data-driven" is just one of those cringey words that is really the symbol that you are treating data as this magical thing and that you're arguing in the space of magic as opposed to concrete, tangible things.

Key Takeaways

1

Design analytics tools that allow users to ask iterative questions independently, reducing the load on data teams by making it easier for non-technical users to engage with data without constant support

2

Build data models iteratively, incorporating user feedback to align data structures with real-world use cases, rather than attempting to finalize complex models in one go

3

Decentralize data responsibilities by empowering teams to manage and define metrics relevant to their work. This approach helps align data insights with actual business needs


Transcript

Richie Cotton: Hi Sameer, welcome to the show.

Sameer Al-Sakran: Thank you. Pleasure to be here.

Richie Cotton: Excellent. So, self-service analytics has long been touted as a way to scale business intelligence, but what does that actually mean in practice?

Sameer Al-Sakran: I think it means different things to different people, and a lot of very smart people think it's completely non-existent. And a lot of other people, that maybe are not as smart, like myself, but have opinions, think that it is a different frame to have on how normal humans should consume the product that analysts, data scientists, and data engineers produce.

So, I think it's a perspective shift, something deeper than vendor buzz or branding. I think the core idea is that people tend to not have single questions. So, you have something that is the sales performance of a product by quarter. Most normal people don't look at that and say: cool, I'm done.

There isn't really a moment where you're like: this was a beautiful dashboard, thank you so much for having built it, you deserve a raise. Typically it's: oh, interesting, what happened in this region? Or does this look the same across all sales reps? Or is this sales thing even working? Should we, like, pivot the company?

And typically those questions, which I'll just mischievously call questions 2 through 20, get asked. And it's your choice as a data practitioner, or as someone who's running a data team: do you want to have those questions come back to you as ad hoc requests? Or do you want to factor that into how you conceive of the shape of analytics you deliver?

So I have a horse in the race. The horse is basically: your tool should let your users ask questions 2 through 20 without having to harass a human. So that's self-service analytics.

Richie Cotton: That is very true, that there are very few business problems or business questions that can be answered by just a single question. You actually need that sort of level of iteration. So, go on, talk me through: has it got easier to be able to do that, where you're asking 20 questions in a row, and it's a normal person doing this without, you know, some technical input?

Sameer Al-Sakran: Absolutely. I mean, I think if you zoom out some, let's say from the seventies onwards, there's just been this stair-step: the forward march of progress has brought us much easier-to-use tools. If you were doing this in the 1970s, you're writing, I don't know, ALGOL or COBOL or something horrific, you're using some crazy proto-driver, you're using a pre-SQL API to pull all this data out.

You know, the 80s brought SQL, then you got spreadsheets; at some point you got Tableau, which, for all that they're a competitor, did some amazing things to really reduce the complexity of asking questions. And I think that the average skill level required to productively ask any question in data space has just ticked down decade over decade for the last 50 years, or probably more.

Honestly, if you go back further: for a whole list of reasons, I look at the world as pre-SQL and post-SQL, so my window ends 10 years before SQL was introduced, give or take. But yeah, I think that today people have much more data, much more context, much more understanding of what's going on inside an organization than they did 10 years ago.

And some of that is the proliferation of reporting as a thing in every tool. And some of it is that the tools we're using, both to collect data, to shape it into intelligible forms, and then to pull it out, have improved. And if you fast-forward 6, 18, 24 months, I do think the proliferation of natural language interfaces to data is going to increase, and that increasingly you can just ask the damn computer: hey, what happened in February? You'll probably get a crazy answer today, but in 24 months you'll probably start getting a viable, sophisticated answer that the average human can just roll with, and ask questions 3 through 20 at that point.


Richie Cotton: Okay. So here, the big change is really generative AI then, and I do agree with your point that things are getting a lot... Oh, okay. Not generative AI, go on.

Sameer Al-Sakran: I mean, I'm way more excited about tool-using agents than I am about generative AI, personally. So I think it's still LLMs; we're still discovering all the weird and wacky things you can do with a large language model. I still think that the problem of hallucinations, and of random crap coming out the other side of the generative process, is very real.

I do think that most of the current players in the space are too enamored with demo mode, as opposed to actual real-life question-and-answer mode. But the writing's very, very clearly spray-painted on the wall. And we are definitely going to reach a point, in quarters if not years, probably not years, of: hey, I just want to talk to the computer and have it tell me things.

So I think that we are in the process of, at minimum, a shift similar to the introduction of the mouse, where you had a bunch of software from the pre-mouse era where you'd type things and memorize keyboard shortcuts and keep a little set of notes about your favorite commands over here, and so you memorized them.

And so I think, similar to how the transition from "I have to memorize a bunch of commands" to "I can just click on things" happened, we're going to go from clicking on things, to clicking on things and talking in natural language, or talking in natural language to the computer.

I think that one's baked in. But if I can toss a grenade in the punch bowl, I think this whole thing of "let's generate SQL based on natural language" is not the path to go down.

Richie Cotton: Interesting. Okay. And is this because of the hallucination problem, or are there other issues there?

Sameer Al-Sakran: I just think there's a subtlety to what most people want that is not, strictly speaking, human-derived, but is nonetheless linguistically ambiguous. And to make that less of a torturous statement, I'll use an example: you have a business, the business makes money. There are a lot of terms to describe how businesses make money.

You've got revenue, you've got recognized revenue, you have bookings, you have earnings, you have EBITDA, you've got all kinds of things that I'm too checked out to fully understand in the accounting world. But all these things have meanings, and you can mix and match them. And I think that there is insufficient coverage in the training corpus to really disambiguate all this stuff and to then turn that into: I have an arbitrary schema, let me figure out what the magical joins are, and then what revenue recognition schedule to force on top of that.

That just isn't gonna happen for a while. I suspect if you draw the line out, it'll happen at some point; I don't want to give predictions in number of months or years for that, but we're not there. I do think that: hey, here are some numbers that we know how to describe.

We know that these numbers have certain semantics and certain meanings and certain context that the human in this company cares about. And dear computer, can you pretty please massage these numbers and display them to me a certain way? That's very tractable, and you can make that work today.

So, generating some SQL based on, you know, net revenue retention isn't happening right now. And it's comical every time you try, especially on an arbitrary schema. But: hey, here's a bunch of metrics I defined, here's a semantic layer; based on the semantic layer, dear computer, please tell me what happened in February of this year that made revenue go down. There you can probably cobble together something very, very compelling.

So it's not that large language models are pointless. It's just not going to be that easy.
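To make that distinction concrete, here is a minimal Python sketch of the "defined metrics" approach described above. Everything in it is hypothetical (the Metric class, the table and column names, the compile_question helper); it illustrates the general idea rather than Metabase's implementation. The point is that the natural-language layer only selects a vetted metric and supplies filters; it never invents SQL against an arbitrary schema.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str         # the proper noun the business uses
    description: str  # plain-language meaning, readable by humans and by the model
    sql: str          # vetted SQL, written once by someone who speaks accounting


# Metrics are authored by humans. The natural-language layer never writes SQL;
# it only picks one of these and fills in filters.
METRICS = {
    "revenue": Metric(
        name="revenue",
        description="Recognized revenue per month, per the finance team's schedule.",
        sql=(
            "SELECT month, SUM(recognized_amount) AS revenue "
            "FROM revenue_schedule WHERE {where} GROUP BY month"
        ),
    ),
    "bookings": Metric(
        name="bookings",
        description="Total contract value signed, regardless of recognition.",
        sql=(
            "SELECT month, SUM(contract_value) AS bookings "
            "FROM contracts WHERE {where} GROUP BY month"
        ),
    ),
}


def compile_question(metric_name: str, filters: dict) -> str:
    """Turn a (metric, filters) pair chosen by the NL layer into vetted SQL."""
    metric = METRICS[metric_name]  # unknown metrics fail loudly instead of hallucinating
    where = " AND ".join(f"{col} = '{val}'" for col, val in filters.items()) or "TRUE"
    return metric.sql.format(where=where)


print(compile_question("revenue", {"region": "EMEA"}))
```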

Richie Cotton: Okay. So, this idea of a semantic layer, where you've got some sort of business meaning baked into your code, seems very important then. There's maybe more hand-coded SQL at a lower level, and then you're using the generative AI, the natural language interface, on the semantic layer.

Is that correct? Actually, we probably need to back up and just explain what a semantic layer is as well.

Sameer Al-Sakran: Yeah. So, humans have a different mental model of what's happening in their business or their organization or their massively multiplayer game than the data model that lives on disk.

And so in your mental model, there are certain relationships, certain concepts that are the primary concepts. For most people in a business setting, retention is a capitalized word. It's a very specific thing. There are different kinds of retention; those kinds are important and do different work, different jobs. And the definition of what retention means is not trivial.

So there is a generalized "hey, it means the number of people that stuck around," but once you start having price tiers, you start having reactivations, you start having people that pause their subscription and come back, you start having people that upgrade or downgrade. There are all these things that impact the actual number.

And that has to be defined somewhere. And the semantic layer, effectively, is where that is defined. Is it defined in code, in SQL, in pre-processing? Is it defined in your dashboarding code, where you mark up the data afterwards? At some point, the revenue recognition schedule has to be applied, or the logic for retention has to be applied.

Or what activity means has to be defined: is an active user someone that logged in, or someone that looked at a photo, or someone that watched a full video? Does it include people that are drive-bys that are not logged in? Do you want to combine your logged-in and your not-logged-in sessions somehow?

All that stuff has to be woven in somewhere. And I would say that's not, strictly speaking, just the semantic layer; it could also be pre-semantic-layer transformation as well. But it's that collapsing of whatever correct or right data schema you have for storage, transforming it into something that more closely maps to the end user's cognitive model of what they're looking at. For this conversation, just call that the semantic(-asterisk) layer.
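As a toy illustration of why those definitions are non-trivial, here is a short Python sketch, with entirely hypothetical field names, of one possible "retained" definition. Every branch is a business decision rather than a technical one; the point of the semantic layer, as described above, is that each decision gets made once, explicitly, instead of being re-implemented slightly differently in every dashboard.

```python
from datetime import date


def is_retained(user: dict, period_start: date, period_end: date) -> bool:
    """One company's definition of 'retained'; another company's would differ."""
    canceled = user["canceled_at"]
    reactivated = user["reactivated_at"]

    if canceled and canceled < period_start:
        # Churned before the period: retained only if they reactivated in time.
        return reactivated is not None and reactivated <= period_end
    if user["paused"]:
        # Policy choice: a paused subscription still counts as retained.
        return True
    # Otherwise, "retained" means observed activity in the period.
    return user["active_in_period"]


alice = {
    "canceled_at": date(2024, 1, 10),
    "reactivated_at": date(2024, 2, 5),
    "paused": False,
    "active_in_period": True,
}
# True: Alice canceled in January but reactivated within February.
print(is_retained(alice, date(2024, 2, 1), date(2024, 2, 29)))
```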

Richie Cotton: Yeah, all those points I definitely know very well. It's definitely a pain point at DataCamp: arguing over, well, how is this defined exactly? And it always gets very complicated, because sometimes it involves product details, sometimes it involves different commercial teams.

And then, obviously, you've got the data people, who need to be involved in that as well. And going back to your original point, you framed this as normal people wanting to be able to work with data. How does this semantic layer stuff get defined, then, when you've got so many different teams involved? Who needs to do it?

Sameer Al-Sakran: I mean, who does it is a pretty gnarly and pretty politically charged conversation. So, practically speaking, it'll be some mixture of data engineering, finance, and/or business operations. It'll probably involve the analyst team, or in some places it'll be the data scientists. But functionally, there are people that implement this and people that figure it out.

Currently, typically, the center of gravity is with the people that know how to turn those decisions, that information, into something to be executed. So it's data engineers that are doing this in, you know, dbt; in some organizations, it's the analysts that are kind of rolling this up; in some places, it's in the actual schema at rest.

In other places, the data warehousing team takes care of it. But fundamentally, those decisions come from somewhere. So there's some accountant somewhere in the organization that read something the IRS puts out that says: oh, we need to recognize SaaS revenue this way and services revenue this way.

And so that decision has to flow through to the people that implement it. And then, in whatever platform you have, you implement it. The thing that I'll point out is that that translation is error-prone, tedious, and the source of a lot of heartburn, so anything that you can do to reduce the skill level required to capture those decisions, and to put them into whatever execution format you need, pays off organizationally.

So if you're forcing this to all be done in super complicated Scala modeling somewhere in the chain, then you're forcing someone that speaks Scala to talk to someone who speaks accounting, and you're relying on that transmission path to be error-free. And, plot twist, that's not going to happen.

So the degree to which you can give your accountants tools to capture this, getting as close as humanly possible to the accountants or the finance people, or the product manager that is running retention, or the customer service manager dealing with what user satisfaction means: the ability to federate or decentralize the capturing of this information is in your overall interest.

And I do think that there is a little bit of protectiveness, or defensiveness, or just desire to be part of the data priesthood, where it's like: oh no, the data priesthood must decide how this model happens, and we are the blessed keepers of the dbt repository.

And only we can do this. And the reality is, in my experience, people in the line of business are much more sophisticated in understanding these concepts. So the average marketer has a much tighter understanding of a lot of marketing ops numbers than the average data scientist or analyst or data engineer assigned to marketing.

And so being able to put as much of the load as possible onto the people that have the cognitive model is beneficial, and dumbing down the tools that you're doing all this in is very beneficial. So creating a tool that a normal person who understands their space deeply can use to type in or say the things they need is the best world for this.

I think in reality this compromise is made, so SQL is better than Python is better than Scala. And something that's sort of new and easy to use is better than SQL.

Richie Cotton: Yeah, I love the idea of the data priesthood. I think some people just want to wear robes to work. But yeah, you've definitely got this big gap between the people doing data engineering and the people doing accounting or marketing, and then, yeah, how do people talk to each other?

So I like the idea of bringing people closer together and making it so a wider variety of people can use these things. So maybe we need some motivation here. Have you seen any success stories where this has worked really well, and you have had people who aren't that technical being able to perform analytics?

Sameer Al-Sakran: We have a large customer that has managed to do this pretty well. I think what it looks like for them is, I want to say they have a data team of five-ish people supporting a 2,000-person company. And a lot of what that data team does is create data sets and starting points for people in the business to use themselves.

And so the lens they view themselves through is not "we are here to create things that people will then use"; it's "we're here to facilitate people asking their own questions, and ideally, if people are not able to, we'll try to figure out how to make them able to." And so the perspective that is held is that analytics is not a service desk.

It's not a place you go to to say: hey, Bob, can you please tell me X, Y, and Z? Rather it's: hey, Bob, I'm trying to figure out this number and I can't do it; can you tweak the model such that I can? And so I think that is one of the primary shifts. I do think also, and this is my own personal opinion, that it means establishing some sort of, let's say, institutional "no" that is liberally applied towards bespoke analytics requests, and not letting the data or analytics team become a place where it's: I'm trying to figure out who I want to annoy and how badly I want to annoy them. Let me go into a brief digression, and I promise it'll come back to where it started. All right, so there is this common corporate pathology where, if I'm in a meeting and I don't know what's going on and a decision has to be made, one of the socially acceptable outs for me not knowing what I'm talking about is to demand more data.

And it's a pattern that you see over and over and over and over again. It has a lot of names. It could be "we need to be responsible and look through all the relevant information," or "we've got to be data-driven," or we've got to do X, Y, or Z. But the fundamental thing is: I want to deflect my responsibility, either individually or collectively, for making this decision or figuring this out.

And I want to now put another burden on someone else, which is: hey, dear so-and-so, please get me more information. And it's often done in a very vague way, so it's often some version of "can you do retention analysis for me?" or "can you look into blah blah." And the vaguer it is, and the more work that is farmed out, and the more some poor analyst team has to scurry around at 3 a.m. on a Sunday, really, the better it gets me out of my pickle.

And so, again, taking that digression and coming back in: I think the ability for the analytics department to say no to stuff like that, to say no to the random barrage of ad hoc queries, and to force the department to be enabling, as opposed to just smashing out dashboards, analyses, and charts in response to individual questions, is what lets you scale that team and, critically, foster self-service in the organization.

So an organization where anyone with power always demands bespoke analyses, and everyone else is left to fend for themselves, isn't really doing self-service. What they're doing is starving people of information unless they have power. And for the people that don't have power, it's like: hey, go do this thing yourself whenever you want.

I think the ability to say: this is how we're getting data into your hands. If you want to hire an analyst on your team and you want to make that a thing, that's great, feel free to do that. But fundamentally, we're decentralizing and distributing the load of asking questions, and the data team is there to facilitate and enable.

And to iterate on the data sets until this is possible. Because, digression number two, and I'll bring it back: it's actually a lot of work to create a data set that can be self-serviced. And it's not work that happens magically. You've actually got to go in there: make a schema shape, make a semantic model, show it to people, see all the ways in which it sucks, come to grips with the fact that you're not as smart as you thought you were, take the feedback of your users and weave that in to create something simpler, and let go of things that you think are quote-unquote correct.

And try to map this to the end user's cognitive model. And all this is iterative; it takes work, it takes some amount of humility, it takes a lot of skill. It actually takes a lot more skill to dial in a model that accurately represents the user's cognitive model than it does to just take whatever crap you have in your SQL tables or your Parquet files or whatever you've got going on these days, and be like: I don't know, that's the data model.

And I guess that should let us turn this from an integer into a text field or something, I don't know. And then call that a model. There's actually a lot more to it. So "do the work" is maybe the motivating thing: do the work to let yourself be lazier in the future.

So it's all about front-loading the work you do, and then foisting this all off on your constituents to do themselves.

Richie Cotton: All right. And then hopefully one day life is glorious for everyone, because you've done all the hard work upfront. So, I have to say, I don't think this was your intention, but I'm now thinking this is a great career skill. If you want to put off making a decision, you can just be like: oh yeah, we need to do some more analysis on this.

And you get to delay it. I'm curious, do you use that technique yourself?

Sameer Al-Sakran: I try to catch myself when I'm doing things that future me would want to slap present me for. So there is a certain, sorry.

Richie Cotton: Yeah, that seems like a good rule of thumb. It's like if you're going to be disappointed in yourself later on, then yeah, probably don't do it. But sorry, I interrupted.

Sameer Al-Sakran: Oh no, I was just saying, maybe this segues into a different thing which I do want to cover in this context, because I would like this message to spread a little further, which is: every time I hear someone say they're data-driven, I kind of assume they don't know what they're doing.

Richie Cotton: Ouch.

Sameer Al-Sakran: Very few people that I know who are data practitioners, ML researchers, good operations people talk that way. "Data-driven" is just one of those cringey words that is really the symbol that you don't have proper nouns for what you're doing, that you are treating data as this magical thing.

And that you're arguing in the space of magic as opposed to concrete, tangible things. One of the things I've said over the years is that data is a four-letter word, and it's really a symptom of ignorance. And, you know, real people have real words for stuff they care about and understand. So you have users, you have customers, you have visitors, you have contracts, you have videos, you have advertisers, you have creative pre-rolls.

These are all things that have proper nouns, that have semantic meaning, and they don't live in the world of "data." They live in the world of actual, real concepts. "Data" is just this label I get to use for a bunch of stuff I don't understand, and "data-driven" specifically is usually some version of "well, let the data tell me." And, just to get super dorky for a second, what you're really saying is:

I'm not going to be aware of the model that I'm using to evaluate this data. I'm going to pretend that my naive, messy, sloppy model is somehow capturing reality. And I'm going to then defend my interpretation of a handful of sparse data points on the basis of this hyper-simplistic model, and pretend that the data is telling us what to do.

And it dramatically under-represents how much interpretation of these data points is happening, unless you pretend that your sloppy model is actually just what the data is. And, you know, the functional reality is that all these semantic things we're talking about have specific meaning.

There's a model that defines that meaning in that context, and the awareness of what that model is lets you make good decisions. So, again, to bring it all back in, and not have this be a one-hour rant about me going off, which, you know, we can do some other day: I do think it's important to be cognizant of how you're evaluating data, what it is you're looking for the data to answer, and to fess up to the a priori mental model you have for what the data says. And then, when you're having conversations about making decisions, to do that with full awareness of your simplifications and of how it is you're interpreting it.

And this, for me, mostly shows up in practice when people are discussing findings that are drawn on a view of the underlying system that basically drops all the interesting texture and information in it. And this is a very, very common pathology for junior analysts and junior data scientists, where you just take a bunch of stuff, look at a couple of dimensions, and say: oh, we should just change the conversion funnel, because obviously there's a 20 percent drop here, and 20 percent is bad.

And there's a 2 percent drop everywhere else, so clearly this step is the bad step. And at some point you're like: but dude, that's where we ask for a credit card. Of course there's gonna be a drop. There are certain things, the physics of the system, the actual reality, the things that have a tangible reality, in what it is you're measuring. It's not just a bunch of numbers; there's a thing happening there.

And if you don't understand that thing, you look at this funnel and you say: there's a kink here, that's really strange, we should fix that kink. And it's like: thanks, I would never have thought that asking someone for a credit card might cause people to drop off. So, going back to humility: just understand the system, fess up to the state of your knowledge about the system you're studying, and don't just treat it as a bunch of numbers that you smash together and put on a chart, and then interpret the chart as if it's just this abstract, idealized set of time series.


Richie Cotton: So I definitely agree that if you're not relating your data analysis back to what's really happening in the business, then you're going to come up with nonsense. You're going to get bad ideas. But you mentioned that "data" is often used as in "well, I don't really know what's going on." Isn't it the case that a lot of the time in a business you actually don't know how to proceed, and therefore you need to use data in order to get some insight?

Sameer Al-Sakran: So, I don't think data is what informs you of what's going on. I think data is what you put into your mental model to then interpret. And I think there's, again, the sleight of hand I mentioned, because I think that's what's actually happening: you're assuming a model that may be groundless in reality.

I do like the conversion funnel conversation a little bit, because it's generally applicable to many businesses; there are lots of things that model a business. I do think the awareness of the segments of users going through is critical. So I think that treating a conversion funnel as just a monolithic blob is okay from some perspectives.

But if you're looking at that and you're just saying "this stage has a drop-off to this stage, and I'm going to interpret that," I think you don't understand what that stage is doing, or what's actually happening in the different segments of your user base. Are you flushing out people that should be going through the funnel, or are you disqualifying people that should not be going through it? That's a different conversation.

Or are you disqualifying people that should not? Like, that's a different conversation. And so I think just looking at like, that 70 people, 70 percent of people here down to 50 percent of people here that that's bad is simplistic and kind of naive. And I do think that this is where understanding the physical system that your numbers are capturing, there is an underlying mathematical model that is in your head.

You're interpreting these numbers through that model, and your awareness of that model is critically important to understanding what's happening. It's not necessarily that you keep slicing data until it gives up and you find some subsegment with some metric that looks different. You're like, well, it seems like we have lower conversion in

Eastern Moldova than we do in, like, the mainland United States. It's like, cool, but what's happening and why? And I think this is where domain knowledge, and, you shudder to say it, qualitative research, understanding the world that you're living in, not just the domain: all of that needs to shape the model that you're using to interpret these numbers. And I think there is a core skill here, specifically in data scientists and analysts, more so than data engineers, where understanding the system that you're measuring is what makes your work viable and interesting.

So there's a certain junior analyst disease where you run some numbers, you show something's correlated, and you show up to the person running that business unit: I discovered there's a correlation between asking for credit cards and user drop-offs.

Therefore, we should not ask for credit cards. And it's like, really? Come on. And so, I'm not saying don't look at data; that's exactly not it. I'm saying understand the perspective you're bringing onto the data, and own up to the simplicity, or the fidelity, of the model that you have in your head.

And, in my own attempt to make everything jargony, which is not my goal, but just for the sake of this conversation: I would call it being model-driven as opposed to data-driven. It's the awareness of how it is you're interpreting data: treating the model that's in your head as something that's malleable and explicitly stated, as opposed to something that's implicitly hidden behind the interpretations you make of these random data points in time series and histograms and whatnot.

Richie Cotton: Okay, yeah, I'm definitely with you there: you need to make sure that you're not just acting blindly on "the numbers look like this," and that it matches back to some kind of business model that you either have in your head or written down somewhere. All right. So, I want to go back to a thing you said earlier: you gave an example of a company with five data scientists supporting 2,000 people.

And you said that one of the big changes was that the central data team were just preparing the data to make sure it's high quality, and everyone else working on data was embedded in business functions. Okay. That sounds very cool, but

Sameer Al-Sakran: Not that they're embedded. So it's not that there were embedded analysts; it's that people that would, in other companies, be serviced by analysts were doing this themselves. So the key thing is not that you're embedding analysts in all these teams. Sorry, that was a throwaway line of me making fun of the big, scary VP who's demanding things.

It's more that there was an emphasis on creating data sets that were interpretable and intelligible, and on using tools that were simple enough for the end users in that function to use themselves. Like Metabase.

Richie Cotton: All right, wonderful. Yeah, I like that idea. So everyone else in the company is sort of implicitly an analyst, because they can use the tools themselves. How do you get to that state? It sounds like the dream, but maybe quite tricky to get to. So where do you begin?

Sameer Al-Sakran: The shape of the data set. So the data that is put in front of normal humans has to be interpretable and usable by those normal humans. So, no matter what the correct way to store data in the data warehouse is, you want to present a form of that data to an end user that matches their mental model.

So excessive normalization is a really bad idea. Mixing and mashing foreign keys, embedded arrays, and join tables is a bad idea. If you're exposing through-tables in your data model to end users, you're just doing it wrong. It's about pre-aggregating or shaping the data such that the things end users really care about come out quickly, as opposed to requiring a self-join and a bunch of crazy filtering.

So it's really just pre-chewing the data for your end users, and not stopping at "this is the correct way to represent events; we have an event log; you can do whatever you want." The answer to that is: I can't do stuff with that log. It's: can you pretty please at least sessionize this, or at least pre-aggregate a few things, or just make this closer to what I think of when I think about product analytics?
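For illustration, here is a minimal pandas sketch of that kind of pre-chewing: turning a raw event log into one row per session. The column names and the 30-minute session gap are assumptions for the example, not anything prescribed in the conversation.

```python
import pandas as pd

# A raw event log: one row per click, the shape end users struggle with.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "ts": pd.to_datetime([
        "2024-02-01 09:00", "2024-02-01 09:10",  # same session
        "2024-02-01 14:00",                      # gap > 30 min: new session
        "2024-02-01 09:05",
    ]),
})

events = events.sort_values(["user_id", "ts"])
# A gap of more than 30 minutes between a user's events starts a new session.
new_session = events.groupby("user_id")["ts"].diff() > pd.Timedelta(minutes=30)
events["session_id"] = new_session.groupby(events["user_id"]).cumsum()

# Pre-aggregate to the shape a non-technical user thinks in: one row per session.
sessions = (
    events.groupby(["user_id", "session_id"])
    .agg(started=("ts", "min"), ended=("ts", "max"), n_events=("ts", "size"))
    .reset_index()
)
print(sessions)
```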

So I think that there is a fair amount of work, again high-skill, difficult, and intrinsically error-prone, and hence iterative in nature, that this data team does, and that a hypothetical data team at your company would need to do, to get something in front of people they can use. And there's a whole corpus of knowledge and set of tools around product design and how we build products that are usable.

There are a lot of things that we do casually today that in a different generation would have required training in a specific tool, and that today are just: I know how to click on things, I see a bunch of prompts, and things make sense. And I'm not using weird, crazy keyboard shortcuts in AutoCAD.

I'm just using Figma, and Figma just kind of lets me do things very, very easily. And so I think that all the things we learned on the product design side of the world can be applied, in a metaphorical sense, to data set design. And that we should be treating the design of data sets the same way that we treat the design of user experiences for end users.

Because you're designing data, and the UX of that data is what's in the columns. What are the connections? Are things pre-aggregated? Are they sessionized? Are you pre-computing certain metrics on certain things? Are you forcing users to engage with something very different from their mental model?

And basically, to what degree is the thing that hits them just whatever the right at-rest format is in a SQL database, versus something they're actually able to engage with? And yeah, that's a lot of work, and you're going to get it wrong. And you know, you're really designing for someone whose brain is not like yours.

So you're going to have to take a shot at it, go to them, see what they think, watch them use it, watch them fumble it, take that, go back, redesign it, and put it in front of them again. And you either need to do this in a user-testing sort of way, where you just put it in front of people, ask them to do things, and when they can't do them, ask why. Why are they not able to?

Or you have to instrument everything and make sure you deeply understand: which columns are being used? What custom navigations are people using? Are they having to tap an analyst on the shoulder to get a certain join or a certain multi-stage query done for them, which they then reuse and edit?

But there's a lot of looking at what your users are doing: understanding whether they are able to use the data set that you designed for them, or whether they're just looking at it and going, I don't know what to do here, can you please get me my retention numbers for last quarter? And again, that's a lot of work, but it is front-loaded work that then lets your constituency, or your stakeholders, or the line-of-business folks be responsible for queries 2 through 20, and not you.
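As a sketch of that instrumentation idea, the fragment below counts which columns of a curated data set actually get used. The query-log format is made up for the example, but the signal is the point: columns nobody touches are candidates to cut or redesign.

```python
from collections import Counter

# Hypothetical log of queries run against a curated data set.
query_log = [
    {"user": "ops-1", "columns": ["region", "revenue"]},
    {"user": "ops-2", "columns": ["revenue"]},
    {"user": "ops-1", "columns": ["region", "rep_id"]},
]

usage = Counter(col for q in query_log for col in q["columns"])
print(usage.most_common())  # e.g. [('region', 2), ('revenue', 2), ('rep_id', 1)]
```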

Richie Cotton: Okay, so this sounds like we're getting quite close to the idea of data products, where data is packaged up in a particular way, and it and its associated documentation are designed to be used by the teams without your intervention. Do you want to talk me through what needs to happen to go from data to data product, then?

Sameer Al-Sakran: I would maybe remove documentation, because I think documentation, as in product design, is a crutch. You know, the statement "you don't know how to use this, therefore I'm going to write some words that will inform you of how you should use it" is kind of a crutch for bad design. And so I think the mechanics are some form of: sketch out the entity diagram, or just what the system you're studying in the company is.

What the primary nouns are, what numbers matter to them, what the aggregate metrics are, what the per-entity metrics are; you just kind of sketch that out on a whiteboard or, you know, a notepad or something, with the person who's in the business. You rotate that in your head a bunch of times, and you try to understand how that maps to the underlying stuff on disk.

And you understand what questions they ask. You understand what aggregations are required. You understand what the common filters are, and how they think about those filters. Are there things where you have, in your mental model, nesting or composition, where it's actually a foreign key? Are there lists of things that are being represented as foreign keys?

There's a fundamentally different set of Legos in that cognitive space than in SQL. And so you somehow have to map the separation of concepts and connections that's extra in the SQL-ish data world, and reshape it towards something that more closely matches that entity diagram.

So you're going to do a bunch of pre-joins, you know, pre-compute a few things, you're gonna have enriched users. There are all these things that you do that ideally get you closer and closer to that naive, you know, bar-napkin diagram of how the users think about the world.

And as you get closer, you show it to them. You basically ask them, using whatever tool you're using: hey, can you figure out Moldova last week? You watch them do it, and if they get it wrong, you just take notes, and you go back and try to reshape what you put in front of them.

And you should probably emotionally budget for three to 10 iterations. So if you think you're gonna get it right the first time, you're delusional, and you really should just fess up to it: it's gonna take you a couple of shots. If you're really close to the kind of person that's doing the query, you'll get there in three. If you're very different from the person doing the query, say you're a data engineer building a data set for performance evaluation for a backend team, that's probably three iterations; if you're a data engineer trying to do something for call center reps, it's going to take you 10. Just emotionally budget and time budget for that, go through the process, and don't give up when you put it in front of them and they can't do things.

I think the most common reason people don't believe in self-service analytics is they went through this process with the expectation that they're going to have a user conversation, they're going to get the requirements, they're going to write the right data model for those requirements, they're going to hit submit and publish, and then magically the end users are going to know how to use it. And when they don't, they'll be like: oh, it turns out you have to be part of the priesthood, otherwise you can't really do analytics; aha, I'm special.

And it's like: no, dude, you have nine more cycles; go through them.

Richie Cotton: That sounds amazing, but it also feels like it's very difficult to budget for nine, ten iterations in practice, because someone's gonna say: well, why does it take ten attempts to do this right? Shouldn't you do more planning up front? And so how do you go about building in that culture of iteration, with multiple teams working together?

Certainly, I think the cross-team coming together over and over again, that's generally organizationally tricky.

Sameer Al-Sakran: I think there are two different parts of it. One of those is that there's a difference between folding in the iterations and having the iterations be improvements. So I think if you're able to say: hey, we're going to start small, we're going to get some stuff in front of you that takes care of questions two through five.

We're going to do our very, very best to get you something that answers 20 percent of your questions, and then you build on that incrementally. Whereas if you're trying to do the whole thing at once, you kind of can't do it. And again, I think there are a lot of things we've learned in software engineering that should be applied to data engineering.

But they're not, really. I think people are coming around to version control and testing and observability decades after the software engineering field as a whole did. I think there are things around iteration and de-risking and product discovery that, again, have been part of the playbooks for at least 10-ish, maybe 15-ish years.

Things the data engineering community needs to copy over to their world; and again, copy with modifications, you don't have to do it exactly line for line. But in the same way that you wouldn't build some new social network in one shot, and then when it didn't work be like, well, you know, we don't really have time to do more than one iteration...

At some point you have to get the organization to understand that waterfall software development is really, really tricky and likely to flame out, and that delivering analytics in a responsive, iterative, adaptive way is actually a good thing: the organization is better served by us delivering 10 percent, then another 10 percent, then another 10 percent, in weeks 2, 4, 6, 10, and 14, rather than taking a year and a half to deliver all of it in one big mega-project.

And then probably failing. So I think, again, in software in general, mega-projects don't turn out very well, and a lot of the ways we've found to de-risk is to ship things in smaller increments. And I do think, again, that is a lesson the people who are running data infrastructure and data teams should internalize.

And in some ways, it's the job of the director of data, or the VP of data, or whichever manager is responsible for it, to contextualize this for the broader organization and to show them that they're better off: they're going to get this train of goodies every quarter, and more and more will just show up; and to wean them off the delusion that you're going to get it all right in one shot.

Because, again, software doesn't work that way. And I do think that all the data stuff we're doing is largely software. It's not exactly software development, but it's really, really close in many core ways.

Richie Cotton: Absolutely. And that's really interesting, that there are a lot of skills from outside the world of data that are actually really useful for data professionals. So you mentioned the idea of switching to an agile methodology from software engineering. You mentioned things like user experience skills and design skills.

Are there any other skills that you think data professionals ought to have from outside the field?

Sameer Al-Sakran: I think those were the primary ones. I can dig deeper, but I'd just reinforce the emotional budgeting, which is: very few software engineers today come up thinking they're going to get it all right the first time. Whereas, comparatively, I feel like I talk to a lot of analysts that are going to spend eight weeks on a dashboard, deliver it, and then hope for the best.

So that is probably the biggest one, and it is kind of a soft skill, because the practice of chopping up a bigger set of dreams into a set of milestones is a skill, and it's difficult. And so the ability to do that planning, whether you are an engineering manager or a junior analyst or a data scientist or someone changing careers, that is a thing, and in some ways it is more predictive of your success

than a lot of the hard skills are. So you're great at SQL, but you suck at decomposing the problem, and you keep biting off more than you can chew? Guess what, you're not going to deliver. So there's that ability to deliver, and that ability to look at a big problem, understand where the seams are, and understand how to match

the seams where it's easy for you to code, or easy for you to write, or easy for you to generate or design, with the ones that correspond to what the people you're giving this to actually need from you. Those seams never match perfectly. So it's very, very rare that the engineering or data science milestones correspond to how users perceive value, and the art of knowing where to slice is something that I honestly think is a very significant term in how successful you'll be.

And so get practice at that, because you're not going to do it right the first time. So that's one of the things: look for opportunities to practice, look for ways to work towards it, and, critically, be less defensive about why your initial set of milestones is correct or right. Instead, think about, from their perspective, what they're getting in milestones 1, 2, 3, 4, and 5.

And use that kind of sensibility, of what they're getting from you, as a way to better serve them. So in many companies, in data science or analytics or data engineering, you're successful through other people. You're not building this grand edifice that will be shown in Architectural Digest.

You're helping people with jobs do them better. You're helping people make better decisions. And the degree of success you achieve is largely measured by: do they make better calls? It's not: do you make them feel stupid, and do you get to feel smart about it? It's: are you actually able to get things into their hands and make them better versions of themselves?

And I do think that that viewpoint will go a long way. So that specific viewpoint, of "is my plan for how to ship this giving people incremental wins and making them do better at each step," is a key thing to selfishly learn, because it'll make you look better in turn.

Richie Cotton: Excellent. So, I agree that coming up with milestones within a project is very difficult. It seems like this is part of a larger suite of skills around project management. So do you think data professionals also need some project management skills, or is that something that should be outsourced to a dedicated data project manager?

Sameer Al-Sakran: I would push back a fair amount on that statement. I'd say that problem decomposition is a key determinant of success for pretty much all roles. And if you're relying on someone else to do problem decomposition for you, you're basically relegating yourself to being an executor, and you're effectively putting a really, really severe ceiling on how large and how impactful a project you can run yourself.

So, yes, there will be project and product managers that do this at a different level. But even within any given task, if it takes more than one day to implement, you're doing problem decomposition whether you want to or not. Now, you can be doing problem decomposition where you sit there in your editor and just, you know, hit run every once in a while and see if it works, or you can be intentional about it.

So the statement is: be intentional; be aware of what the consequences of your decomposition are. And the main thing I'm saying is that that skill of decomposing a larger, messier problem into a set of steps is a key thing that everyone should be learning. I'd actually say it is one of the largest determinants of the success of analysts and data engineers that I've seen in my work: if you're able to do that, you're able to tackle bigger and bigger projects.

At a much higher rate of success. If you're relying on someone to give you a bite-sized, pre-chewed project where they've figured all this out for you, then you're kind of relegating yourself to the B league.

Richie Cotton: All right. So, before we wrap up, what are you most excited about in the world of analytics?

Sameer Al-Sakran: I'm excited about analytics kind of going away, and I think there are a couple of forks in the timelines that we're in. I do think that, historically, analytics has been defined by what people sell: you buy a business intelligence tool, or you buy a dashboarding tool, or you buy a GIS system, or you buy a data science platform. And there are all these really dumb, self-limiting things the industry has done that are really just there so that you can be in the right quadrant on the Gartner reports.

And I think that a lot of that is increasingly seen as silly. And I do think that there is a re-cutting of what the domain of a given data product, or team, or whatever it is, should be. To state it more concretely: I don't know why so many analytics systems are read-only. We're doing some stuff that weaves in writes and transforms and modifications.

And this missing stuff, other players in the space are doing: mixing and matching internal tools, reporting, and charting. I think Airtable and Retool are both adjacent to us and doing things that blur the lines. Supabase does things that blur the lines between data exploration and, you know, ORMs and modeling.

And I think that, yeah, I'm very, very unexcited about the future of BI. I'm very excited about the future of: I've got a database with stuff in it, and I've got humans that want to do stuff to information. You know, they want to view it, they want to massage it.

They want to use it. They want to integrate downstream tools. And I think a lot of people building tools are forcing this into a specific container, not respecting the natural desires of the person staring at those pixels, but instead being like: oh no, that's out of scope; we're a dashboarding tool, don't put buttons in there.

And so I do think that a lot of that is slowly being shaken loose, and I'm excited about that next stage. And I do think one of the things that will really, really drive this home is just the world of LLMs and generative AI, where this thing can do so much more than just repackage a time series and vomit it out as a chart.

Are you really going to stop there? And why? What is that boundary you're hitting that you care so much about? Just take that data, throw it to another agent, have that agent do some stuff for you, have it all be packaged up and sent over to a third service, have that service do some magical things for you, and do that whole thing in one playground. Don't stay in your lane quite so much.

So I'm excited about people building tools, not staying in their lanes.

Richie Cotton: All right. So really going beyond just "I'm a consumer of data" to "I can actually go in, play around with it, and, I guess, take action on the data, even though I'm not that technical." That sounds like a bright future. Yeah, I'm looking forward to it. All right. Excellent. Thank you so much for your time, Sameer.

Sameer Al-Sakran: Likewise, been a pleasure. Talk to y'all soon. 
