
The Data to AI Journey with Gerrit Kazmaier, VP & GM of Data Analytics at Google Cloud

Richie and Gerrit explore AI in data tools, the evolution of dashboards, the integration of AI with existing workflows, the challenges and opportunities in SQL code generation, the importance of a unified data platform, and much more.
Nov 21, 2024

Guest
Gerrit Kazmaier

Gerrit Kazmaier is the VP and GM of Data Analytics at Google Cloud. Gerrit leads the development and design of Google Cloud’s data technology, which includes data warehousing and analytics. Gerrit’s mission is to build a unified data platform for all types of data processing as the foundation for the digital enterprise. Before joining Google, Gerrit served as President of the HANA & Analytics team at SAP in Germany and led the global Product, Solution & Engineering teams for Databases, Data Warehousing and Analytics. In 2015, Gerrit served as the Vice President of SAP Analytics Cloud in Vancouver, Canada.


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Key Quotes

I think the key to being successful with data is understanding the essence, the very essence of data-driven value generation, which is, very simply put: the more data signals you can bring together, the more you find hidden or latent, unintuitive patterns in your data; the more you can build services or models using that data, the more customer value or customer services you can create, which generate more data for you, from which you gather more data signals, allowing you to build better services and models, generating more data. That's the flywheel of data, right?

As it turns out, humans are really bad at reasoning about data. So on the one side, there are great books that teach us about all of the biases that each of us carries, which make us want to look for the data we want to find rather than look across all of the patterns. That's one piece. But the other piece is that there's a genuine limitation in our ability to understand complex patterns and reason about them. If you doubt me, then why is chess hard? Why is it hard to multiply four-digit numbers? Our brains are not made for that. This is where AI technology is massively helpful, because it takes all of the best practices of good data analysis, how do I reason about data, how do I look for data, how do I present that, and makes them broadly accessible.

Key Takeaways

1. Utilize AI to transform one-time data explorations into continuous insights, reducing the need for static dashboards and enabling more interactive, conversational data analysis.

2. Incorporate AI into data tools to enhance productivity by automating code generation, allowing data professionals to focus on more complex tasks and improving overall efficiency.

3. Shift from traditional dashboards to more dynamic, AI-driven insights that integrate seamlessly into existing workflows, providing real-time, actionable data directly within the tools users already utilize.

Transcript

Richie Cotton: Hi Gerrit. Thank you for joining me on the show.

Gerrit Kazmaier: Hey, Richie, great to be here.

Richie Cotton: Excellent. So, just to begin with, I want to talk about why you might want to use AI in data tools. Why might you want AI features inside your data tools?

Gerrit Kazmaier: That's a long list of reasons, actually. One is that, as much as AI helps in many other functions, it's a real productivity gain. Code generation is very helpful for writing the many lines of boilerplate code, for remembering all the syntax or the design patterns that you might not have handy right now. And in the data space that equally applies: there are data professionals writing a ton of code in SQL and Python, you name the framework, and having the ability to get high-quality generation of larger pieces of code is immensely useful. I think that's one big category.

And we can translate that to many tasks in the data space. It's always the same idea, right? You make people more productive, or help them do their jobs better. That's been the idea of technology all along: technology is meant to facilitate a certain kind of problem solving, a certain task, and AI is, if you will, one way of doing that, whether that's data governance, data exploration, whatever it may be.

On the data side, though, there is also another angle to it, which I think is the more interesting one. And that's that, as it turns out, humans are really bad at reasoning about data. On the one side, there are great books, like Thinking, Fast and Slow, about all of the biases that each of us carries, which make us want to look for the data we want to find rather than look across all of the patterns. That's one piece. But the other piece is that there's a genuine limitation in our ability to understand complex patterns and reason about them. If you doubt me, then, I don't know, why is chess hard? Why is it hard to multiply four-digit numbers? Our brains are not made for that.

This is where AI technology is massively helpful, because it takes all of the best practices of good data analysis, how do I reason about data, how do I look for data, how do I present that, and makes them broadly accessible. So on the one side, for professionals, it's a great productivity gain. But on the other side, it genuinely helps us work better with data and use it in all sorts of processes, decisions, products and ideas, to make them truly data-driven rather than data-validated.

Richie Cotton: Okay, I like that. So it's a mix of doing technical things, like getting AI to help you write code, but also the maybe slightly less technical stuff, more on the interpretation side of things, making sure that you're not bringing any psychological biases into your analyses. So, these tools, are they purely for data teams, or does it go beyond that? Is there anyone else who's going to benefit from AI in data tools?

Gerrit Kazmaier: Yeah, and it needs to go beyond that. A tool always needs to make sense for the person using it, right? I think that's really important to keep in mind, and tools for data professionals are different from tools for laypeople. I could take a Formula One steering wheel, but I couldn't use it, even though I can drive my car every day. So tools need to fit the context, the mental model and the skill level of the person using them. And it means that with AI, we're also going to have a spectrum.

On the one side, we're going to have technical tools for technically oriented people. If you think about the whole idea of code generation, it's really built on the idea that someone understands the output. By definition, this means that this family of tools is made for highly skilled technical experts.

And on the other side of the spectrum, we have people who don't know code, who don't know SQL, who don't know data structures. In today's world, they basically have two real options. One is they ask someone who does, and that's the usual cycle of people in a business function asking a data-skilled person to do something for them: they explain the problem, that person tries to understand the problem, then tries to do the analysis, then gives them something back. That's one way of doing it. And the other way of doing it is not doing it at all.

The interesting piece, though, is that a very good way of thinking about generative AI is to ask how it can be used to scale scarce resources. And data professionals are scarce, they're not in abundance. What it allows is what I just described: when you have a question, instead of calling one of those few data professionals, you can actually talk to a data agent, which is doing the same steps, understanding what you're looking for, doing the analysis, and giving it back to you in an understandable form. And this is built on the idea that you are not leaving your space of competency. You're not required to switch to a technical domain. You're not required to write code. You're interacting with the AI system like you would interact with an analyst. So that means we will have a broad spectrum of AI in data, depending on who the person is, what their context is, and what their level of expertise is.

Richie Cotton: I like the two options. I think for many people, the idea of just not working with data at all is maybe appealing, but it's not going to have a great benefit. So, back to that first idea: maybe commercial team members find interacting with data teams a pain, because it's slow and there are never enough data people around to answer all your questions. You're suggesting that data people can be replaced by AI. I think this is maybe going to terrify a lot of our audience, the idea that your data scientists are going to be taken over by AI. Is that what you're intending?

Gerrit Kazmaier: No, I don't think so. And I actually think this is not about replacement, because that idea is wholly built on the notion that today we have one way of doing it, and now there's a better way of doing it, so we've got to replace the one with the other.

What both you and I started from is the reality that most people don't have access to data and insights. So there is a lot of, if you will, unmet demand. And what AI now creates for data analysts and data specialists is a whole new way of how they can meet that demand. Because in the past, organizations have tried so many ways: they built up centers of excellence, they built so-called BI people into business teams, but it was always basically built on the same idea: build more skill. Now what data teams can do is say, I'm going to build an AI agent for this. Because someone still needs to define, and in an enterprise context stand for, the correctness of that: defining what data should be used, defining what questions should and shouldn't be asked, defining the semantics. All of that context setting is going to be done by data teams.

We cannot think of this like a full autopilot; that is way too reductionist. That's how a kid would imagine it: suddenly we have this robot coming in and doing all of these things. Actually, the reality is much more sophisticated. We're going to have data professionals, who today are cranking out dashboards, who today have this long request line of people wanting more, being able to create these agent-based analysis spaces, which people can then freely interact in and do a whole bunch of things they would originally have had to come back and ask them for.

And they'll have the possibility to focus on that other category of data problems that you and I talked about, the ones that cannot be described in a system like this and that require a data professional. And in the big picture, it actually gives way more people access. There is way more demand than what we are fulfilling today. I think it's an amplification.

Richie Cotton: Okay, I like that. So it's basically churning out the stuff that the data scientists would never quite get around to on the priority list, or maybe reducing the scope of work for them so they can do things more effectively. Now, you mentioned that a lot of data scientists spend way too much time cranking out dashboards, so I'd love to get into how the world of dashboarding is changing. How does powerful generative AI change what you do with a dashboard?

Gerrit Kazmaier: Dashboards are great. And at the same time, they're horrible. On the one hand, dashboards used to be one of the best ways to surface data information, and there's a whole science around how to visualize data. It was Stephen Few, if I remember correctly, and so many others who worked in that space, on how we make data comprehensible to humans. And we have tools that are smart in how they can compose widgets and all of that. But the reality is, when you look into real enterprises, they have thousands, if not tens of thousands, of dashboards which are unused or hardly ever used.

Many of them are one-time explorations, one-time questions that just got stale. And when you look into real dashboards, they are quite complex; a dashboard becomes a complex system in itself. Then you are largely contingent on the quality of the data person of the day to catch and act on interesting signals, or not. That already tells us how much improvement potential there still is. Because what you would rather want is, like I said earlier, that you should not create a dashboard for a one-time exploration. There should be a way of doing this in a modality that people actually work in. Earlier I said tools need to fit the workflow someone is in. Even my job is not checking dashboards, right? If you're in email, if you're in chat, if you're in any application, information should find its way there, rather than requiring you to hop out to a different tool and then try to translate the result back into your context.

So that's one limitation dashboards have today that will change with AI. We will see fewer dashboards, with more standardized use, if you will, and all of this free-form discovery and one-time exploration will be driven through new modalities, like conversational interfaces, deeply embedded into the work environment that people are actually in.

The other part that is incredibly limiting: dashboards try to be smart about it, because they allow you to do drill-downs and filtering and that sort of thing. But the problem is you only see what you see. That has an inherent challenge: what if the interesting stuff is hidden in big aggregates, in other dimensions, in something which is not visible to you? So what do you do? Create a hundred charts? That's impractical. No one can consume that; no one can hold that state. So you actually do need an assistant which is able to look behind the charts and tell you which are the interesting ones to look at, which are the micro-signals you should be acting on.

And thirdly, which I think is really interesting as well, dashboards are really made for humans to comprehend data. That's the idea: visual data discovery. The big question, though, is whether this is really the end state of data-driven insights. Because think about a system which is doing something in real time, adjusting as data signals change, acting on data streams, triggering action in a real business interaction. For instance, when you call a call center and talk about the service issues you're experiencing, for the service agent, surfacing all of the information about your history, the product, and possible ways of resolving it. If you want to do something like that, dashboards are not the way to go. You need something which is acting on way smaller signals at a way higher frequency, becoming a proactive provider of insights rather than a reactive display of data.

Those three areas tell us that dashboards are great, they got us this far, but the true promise of data is still way bigger, and it will be driven by other modalities.
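The "look behind the charts" idea can be sketched in a few lines: instead of rendering one aggregate chart, scan every sub-segment of a dimension and surface only the ones that deviate meaningfully from the overall picture. This is a toy illustration in Python, not any product's actual algorithm; all names, data, and the threshold are invented.

```python
# Toy "assistant behind the charts": rank sub-segments whose average
# deviates from the overall average by more than a relative threshold.
from statistics import mean

def interesting_segments(rows, dimension, measure, threshold=0.5):
    """rows: list of dicts. Returns (segment, avg, relative deviation)
    for segments deviating from the overall mean, most extreme first."""
    overall = mean(r[measure] for r in rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[dimension], []).append(r[measure])
    findings = []
    for segment, values in groups.items():
        seg_avg = mean(values)
        deviation = (seg_avg - overall) / overall
        if abs(deviation) > threshold:
            findings.append((segment, seg_avg, deviation))
    return sorted(findings, key=lambda f: -abs(f[2]))

sales = [
    {"region": "EMEA", "revenue": 100},
    {"region": "EMEA", "revenue": 110},
    {"region": "APAC", "revenue": 95},
    {"region": "APAC", "revenue": 105},
    {"region": "LATAM", "revenue": 20},  # invisible in the overall total
    {"region": "LATAM", "revenue": 30},
]
print(interesting_segments(sales, "region", "revenue"))
```

On this invented data, only the underperforming LATAM segment is surfaced; the two healthy regions stay out of the way, which is the point of pushing micro-signals instead of a hundred charts.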

Richie Cotton: This is fascinating. It seems like all three of the things you mentioned, the first one about integration with the existing tools you're working with, the second one, I think, about having insights pulled out automatically rather than you having to go and point and click and filter things, and the third one really around having real-time insights, are all about you not having to look for data, but having the insights pushed to you. So I do want to talk about this further. Let's take the one about integration with existing tools. How do you envisage a dashboard being integrated with, I guess, email or Slack or whatever your office tools are? How is that going to work?

Gerrit Kazmaier: Yeah, your question is already interesting, because the question is, how do you integrate a dashboard? And probably the answer is, well, you won't. Why would you? The answer to that question is: what if the data agents that I spoke about earlier have an API, so they can be reached by Slack or Google Chat, and in the context of a group chat that you're having, or an email chain that you're having, you are basically either looking for data, wondering about data, or, like we said, getting supported with additional data through this agent, in the medium that you're working in?

Why wouldn't it be just a chat message? Take my own professional work, operating services and product performance: it's much more natural for me to be in the meeting I'm having, in the chat group I'm having, and say, hey, what did this actually look like last month? How is this actually trending? And actually get a specific, factual answer for the given circumstance. Versus: okay, here is a link to a dashboard, click the link, find all of the right filters, set all of the filters. That already takes way too long for me to actually do, unless it's really, really important and makes it worth all of that effort.

So when I say meeting people where they are, it's bringing data and insights through new interfaces, new APIs, APIs that are able to take questions, hold context, generate visualizations, and provide them back in the work medium that you are actually using.
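The agent API described here, take a question, hold conversation context, answer in the chat medium, might look roughly like this toy Python sketch. Every class, function, and payload shape below is invented for illustration; a real integration would call an LLM and a governed query layer rather than the stubs shown.

```python
# Hypothetical chat-facing data agent: holds per-conversation context
# so follow-up messages can fill in missing details.

def run_query(metric, period):
    # Stand-in for a real semantic/query layer.
    fake_warehouse = {("revenue", "last_month"): 1.2e6}
    return fake_warehouse.get((metric, period))

class DataAgent:
    def __init__(self):
        self.context = {}  # conversation_id -> accumulated state

    def ask(self, conversation_id, question):
        ctx = self.context.setdefault(conversation_id, {})
        q = question.lower()
        # Toy intent parsing; a real agent would use an LLM here.
        if "revenue" in q:
            ctx["metric"] = "revenue"
        if "last month" in q:
            ctx["period"] = "last_month"
        if "metric" in ctx and "period" in ctx:
            value = run_query(ctx["metric"], ctx["period"])
            return {"text": f"{ctx['metric']} for {ctx['period']}: {value:,.0f}"}
        return {"text": "Which metric and period do you mean?"}

agent = DataAgent()
print(agent.ask("chat-42", "How did revenue trend?"))  # asks for the missing period
print(agent.ask("chat-42", "I mean last month"))       # context carries over
```

The second message answers the first question because the agent held the conversation context, which is the "hold context" behavior a Slack or Google Chat integration would rely on.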

Richie Cotton: Okay, that sounds very cool. I like the idea. I suppose, of course, the answer to "how do you integrate stuff" is going to be APIs, but it's interesting that you're almost doing away with the dashboard. Then it's just specific insights that you asked for, put in the place where you're working.

Gerrit Kazmaier: Yeah. And we'll still have some standardized dashboards that hold information which is required to be looked at frequently. There is a whole space of regulation and compliance; there is a whole space of enterprise standardization. In businesses, like in operating a service, you want everyone to be up to date on a certain set of KPIs every day.

But the majority of dashboards out there today are not like that. They're these one-time discoveries, few-time discoveries, a project, I-had-a-question, and that creates that proliferation of thousands, tens of thousands, of dashboards sitting stale. I think the big opportunity is actually transforming them into something which is a part of someone's workflow, and doing away with the dashboard as the vehicle delivering it.

Richie Cotton: I can certainly see how a dashboard is useful for something like, hey, this is what our current company cash flow looks like. Management is going to want to look at that most days, I think. But for exploratory data analysis, or for a simple one-off analysis, you think dashboards are going to disappear.

Gerrit Kazmaier: Yeah. And the interesting part is, even when you see information, what does your mind usually do? Why? Why is it like this, and why is it not like that? You have these questions: why, why, why, or what if, or how would it look in that configuration? Like the cash flow: why is it trending like that? What has changed?

And the opportunity is that, literally, you just type: why is it like that? What has changed? And you get an answer: this is what's different, this is what has changed. It sounds so natural, right? This is why, as kids, we watched Knight Rider and all of these shows, Star Trek. People always had that intuition: it would be so great if I could just talk to a computer, it would understand what I want, it would have all the right context, and it could just help me progress with my task. Because these tools are not an end in themselves, they're a means to an end. And the best way we could facilitate that so far was dashboards and long change requests with analysts. Now AI is giving us this whole new capability set. People can just write in chat, why, why, why, and get all of these insights that are lurking in their data and were never unearthed before.

Richie Cotton: Okay. Yeah. And I mean, Knight Rider was around in the 80s. Like, why don't we have KITT yet?

Gerrit Kazmaier: Well, maybe we will, you know, maybe.

Richie Cotton: Yeah, maybe we will. Okay. So, I'm curious as to whether these changes are having an effect on products. Obviously, at Google, you have Looker. So is Looker changing to accommodate these new ways of thinking about dashboards?

Gerrit Kazmaier: Yes, this is the main area of our work, actually. Our work in Looker falls into exactly the same two categories we started with. On the one side, we're introducing Gemini in Looker to help the data professional working in Looker. We help them with creating content, writing LookML code, reasoning about the system, generating slides, all of the typical tasks and jobs that data professionals have on a daily basis. We think about ways to use AI as another way of making those tasks better. It's no different from how technology develops: it's all about making it simpler, making it faster, making it more accessible, so people can get their jobs done.

And on the other side, the part that I was speaking about last: how can we make it so that Looker allows you to basically talk to your enterprise data? A good test question, in any audience, is how many of them use Google. Usually everyone does. So everyone knows how to use a search engine. That's what I mean: it's so natural that, from children to grandparents, there's no issue in Googling. Every person I know in a professional context knows how to use a search engine. When you ask the same question about a BI tool, you get a very different answer: this is very hard, oh my God, I have to understand data. I remember distinctly, early in my career, we did a survey about why people don't use more data, and one of the biggest points was that people felt genuinely insecure. It seems so technical, this is hard stuff, do I get it wrong, there are so many hurdles in the way. So what if we could make BI as easy as searching a search engine? Why wouldn't it be? Why shouldn't it be?

The interesting piece is that this is technically very, very hard. It sounds so simple, but it's an incredibly tough problem to solve. Our approach, Google's approach, is to take a set of steps to solve the problem, because the key with BI is that once you give someone that tool, it needs to be 100 percent correct. There is no space for "I might get it right 80 percent of the time", because how could you trust it to take any sort of decision? The whole point of data is that you help people reason, come to conclusions and come to decisions. So there is a whole space of building specialized AI models, specialized for the task. There's a whole space of understanding a data system and how it works, basically building a knowledge graph of data to ground it in the factuality of an enterprise context.

And then, because you asked about Looker specifically, the last element is giving data strong semantics, way richer than the semantics you would have in a database schema, way more sophisticated than that. It's semantics for an AI model to reason with. And then exposing this semantic model with the right set of APIs, so multiple agents can start to operate on it. Those are the two threads we have in Looker: one is helping data professionals in their jobs, and the other is giving them this new way of exposing data to virtually anyone in their companies, so people can freely explore it using their own language and their native tools, and stay in their workflows.
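As a rough illustration of "semantics for an AI model to reason with", here is a toy semantic model in Python that maps business vocabulary (including synonyms) to canonical measures and dimensions before any SQL is generated. The structure and all names are invented for illustration; this is not Looker's actual semantic layer or LookML.

```python
# Invented semantic model: business terms -> canonical measures/dimensions.
SEMANTIC_MODEL = {
    "measures": {
        "revenue": {"column": "orders.amount", "agg": "SUM",
                    "synonyms": ["sales", "turnover"]},
    },
    "dimensions": {
        "region": {"column": "orders.region_code",
                   "synonyms": ["geo", "market"]},
    },
}

def resolve_term(term):
    """Map a business word to its semantic-model entry, via synonyms."""
    for kind in ("measures", "dimensions"):
        for name, spec in SEMANTIC_MODEL[kind].items():
            if term == name or term in spec["synonyms"]:
                return kind, name, spec
    return None

def ground_question(question):
    """Toy grounding: pick out known terms, emit a structured query
    referencing canonical names rather than raw columns."""
    query = {"measures": [], "dimensions": []}
    for word in question.lower().replace("?", "").split():
        hit = resolve_term(word)
        if hit:
            kind, name, _ = hit
            query[kind].append(name)
    return query

print(ground_question("What is turnover by market?"))
```

The point of the design is that the model reasons over governed business concepts ("turnover" means the revenue measure) instead of guessing at physical column names, which is where grounding and correctness come from.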

Richie Cotton: Okay. So that's the thing with the two different audiences then. Maybe you're a data analyst and you just want to build your dashboards faster, at least in the near future, and then for everyone else in the organization, you want them to be able to understand the data and consume more of it more easily. I guess the second one is presumably the bigger audience. You mentioned the idea that a lot of people just feel insecure about working with data, about understanding it. Can you have a purely technical solution to that, or are there other things that organizations need to do to help people feel more confident with data?

Gerrit Kazmaier: As always, the cultural change is bigger than the technology change. One thing that helps us, like we joked about KITT and Star Trek, is that I think there is a human intuition that this is how it should work. Everyone knows that idea, and it comes naturally. So I think, in general, people are ready and prepared to interact with a system like this.

And the second piece is that technology develops from the primitive, through the complex, to the simple. I think it was Antoine de Saint-Exupéry who said that originally. This is exactly what we see today. Data used to be primitive: expert systems handled by a few people, and very limited. We are now in the complexity stage, where you have a million tools and dashboards. That was the big contradiction of self-service BI: everyone should self-serve, but it meant everyone should really become a data expert, which of course is not viable. And now, with generative AI, we approach the simple stage, where people can actually experience it, and through the experience they will see the benefits and use it. That will drive the fast adoption cycle we have seen with generative AI all along: through experience and through recognizing the value it generates. There is a big problem that people want to solve, and if they can experience the solution, I don't think there is any barrier other than, if you will, the ability of organizations to shift their focus on how they think about BI, how they think about data, and the role of generative AI in it.

Richie Cotton: It's a very compelling vision. But as you mentioned, the idea of self-service analytics, of self-service BI, has been around for a long time, and we still haven't quite gotten there. So do you have any sense of how you would go about trying to get closer to self-service BI across your organization?

Gerrit Kazmaier: Yes. In Looker, we are providing these new agentic experiences and APIs. Their whole point is that now our data professionals can build a data agent to, in their stead, handle a task for a set of problems. Take my own experience. I am fortunate: we have an analyst team that I can ask questions. But I ask a lot of questions, because I'm very curious; I always want to know more. So what they have done is build that agent for me, and their part was making sure it does the job, that it answers the questions they want it to answer for me. They gave it all of the guardrails, and now I can interact with the system. And the great thing is, I can do it all day long. I can ask a million times why, and I'm not ashamed to say I didn't get that, or maybe, can you show it to me like this instead? That's such a big unlock.

Like you said, the job of getting insights is as old as the first decision systems, maybe as old as business, maybe as old as humankind: the desire to reason better, to understand more. That persists. And I think the opportunity now is for the many people who beforehand didn't have an analyst, or didn't want to send them, not a million, but ten questions rapid-fire, one after the other. And for organizations who didn't have the ability to invest in a data function like that before, with just a few analysts who really want to solve complex problems but are basically drowning in business requests to support dashboards. This is the demand that I think exists and is unfulfilled by self-serve BI. We now have a technical solution that can actually meet the job people want to do in a much better way.

Richie Cotton: Okay. Yeah. So you're cutting out that middleman of the data team, at least for some of the questions, maybe the easier questions, and then you can gradually build out that broader BI self-service capability. As well as talking about dashboards and BI, I'd also like to talk about the more technical side of things, particularly databases, because that's obviously a huge part of any data function in any organization. And SQL code generation is becoming popular in a lot of databases. So can you tell me how this works within the Google databases? And I know you've got a lot of databases.

Gerrit Kazmaier: SQL code generation is extraordinarily hard. and one reason is that, you're not just generating code, you're also mapping to a, to a schema, right? And also, there's like two pieces to the puzzle, if you will. And, schemas are, Usually proprietary, they don't exist, the training data off large language models.

So there are some really interesting challenges in the generation of secret code itself. Also data systems, tend to be very big, right? So, when you talk about context, it's much more complex than, the B I problem, if you will. But on the other hand, there's also a big opportunity, and this is, once you're talking, in the database context, you're actually working with a data professional who understands code, basically persona does change.

In the BI part, the problem challenge really is finding the right modality, for a business user. In the space that we are now in, right, we have a technical savvy person who can understand the output of, cogeneration, for instance, and the challenge much more becomes in how do I actually, make this productive and useful in a, like, in a big theory, tens of thousands of tables, a single instance or single, instance, and single, customer.

How do I actually make it work in that context? And the answer to this is there is a whole new category of how we build data systems. For instance, one part: we all know that what makes generative AI models good is having the right context. The word I mentioned earlier was the knowledge graph of data, right?

You actually have to understand: what does the data landscape look like? What is a person looking at? What are their permissions? What are other people with those permissions looking at? What has been looked at by people like them in the same period of time?

What has been used recently? What has changed recently, right? So basically, you suddenly need to think about your metadata system very differently. All your metadata becomes one huge search index that you're basically querying for context all the time while a data professional is generating code.
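The metadata-as-search-index idea could be sketched like this. Everything below (the `TableMeta` fields, the scoring weights, the table names) is a hypothetical illustration of the principle, not Google's implementation:

```python
# Sketch: treat metadata as a search index queried for context while a
# user writes SQL. Signals blend textual overlap with usage and recency.
from dataclasses import dataclass

@dataclass
class TableMeta:
    name: str
    columns: list
    last_queried_days_ago: int = 999   # recency signal
    queries_by_similar_users: int = 0  # popularity among similar personas

def score(table: TableMeta, keywords: set) -> float:
    """Blend keyword overlap with usage signals; weights are illustrative."""
    overlap = len(keywords & {c.lower() for c in table.columns})
    recency = 1.0 / (1 + table.last_queried_days_ago)
    popularity = table.queries_by_similar_users
    return 3 * overlap + 2 * recency + 0.1 * popularity

def retrieve_context(tables, user_prompt: str, k: int = 2):
    """Return the top-k tables whose metadata should be inlined as context."""
    keywords = set(user_prompt.lower().split())
    return sorted(tables, key=lambda t: score(t, keywords), reverse=True)[:k]
```

In a real system the "index" would also carry permissions and lineage, as Gerrit notes, and retrieval would likely use embeddings rather than keyword overlap; the point is that context is gathered from usage, not typed into a prompt.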

That's actually where a whole lot of the art resides. It's also why, when you look at all of these big benchmarks, like BIRD-SQL or many public benchmarks for generating SQL code, you basically see them topping out at like 60 to 70 percent accuracy, something like that, right?

And a big part of the key is actually setting the right context without someone explicitly specifying it, just from the usage, the metadata, and the knowledge of the system that you have. So that's one big bucket. And the other big bucket is, as I said earlier, it's reductionist to think everything is on autopilot.

It's also reductionist to think there is just one model instance doing all of the jobs, right? So, you know, what we are working on on the Google side is actually building task-specific models that understand domains really well. And within that domain, you actually know what the relevant context is that you're working with. For instance, building data pipelines and wrangling data is different from analyzing data, and the model for doing drill-downs and finding interesting subsets is yet another model again.

Right? So there is also a whole part of actually engineering the system, so you have a multitude of models that are then plugging into the data scientist's or data engineer's or SQL analyst's tool chain. And thirdly, it's also about changing the user experience. So what most data systems look like today is, you know, one SQL console, right?

That's probably what all of us have grown up with, what I grew up with: on the left side, some sort of dataset explorer or schema explorer; on the right side, one white or black window for writing SQL code, or maybe Emacs or vi, but same idea.

That is inherently difficult, right? Because on the one side, all of the information about context and state is in a different pane than where we write our code. We have basically nothing that a gen AI model can now work with, or reason about, and just decode, rewrite, or generate.

So one of the big changes that we are introducing on our side is actually thinking through what a user experience would look like if it were designed AI-first. And one of the experiences we created, just last week actually, is Data Canvas. And Data Canvas is basically saying that, well, if you have an AI agent, you would actually start the flow and the creation process differently, right?

You would work more in data frames, and you would not only work in data frames, you would work in associations of data frames. And if you do that, right, the model understands the structure and the relationships of your train of thought; it's not just reliant on the schema information that you have broadly available in the database catalog.

And you would not interface with it just through a chat side panel. You would basically have it continuously working with you on every single data frame that you build, and it would start to give you more recommendations, more generations, based on the graph that you're building up yourself, essentially, right?
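A toy way to picture the "association of data frames" idea: each step in a canvas records which frames it was derived from, so the derivation graph itself, not just the catalog schema, can be serialized as model context. The `Frame` class and operation names here are invented for illustration:

```python
# Sketch: a canvas step remembers its parent frames, so the lineage of a
# user's train of thought can be rendered as prompt context for a model.
class Frame:
    def __init__(self, name, derived_from=(), operation=""):
        self.name = name
        self.derived_from = list(derived_from)
        self.operation = operation

def lineage_context(frame):
    """Walk the derivation graph and render each step as one context line."""
    lines, stack, seen = [], [frame], set()
    while stack:
        f = stack.pop()
        if f.name in seen:
            continue
        seen.add(f.name)
        if f.derived_from:
            parents = ", ".join(p.name for p in f.derived_from)
            lines.append(f"{f.name} = {f.operation}({parents})")
        stack.extend(f.derived_from)
    return "\n".join(lines)

# Example canvas: raw table -> regional aggregate -> top 10.
orders = Frame("orders")
by_region = Frame("by_region", [orders], "group_by_region")
top10 = Frame("top10", [by_region], "top_k")
```

Feeding `lineage_context(top10)` to the model tells it how the user arrived at the current frame, which is the extra signal a chat side panel alone would not have.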

So in summary, those three things, right? One is understanding a knowledge graph of data and building it. The second one is having a family of models that are highly specialized to plug into the data professional's tasks. And thirdly, rethinking the user interfaces and the experiences that data professionals work in, to make them designed for high-quality AI generation.

Because you can help the AI, we all know this, right? It's so funny: everything is in the prompt, right? Like in the good old days, the secret is asking the right question. You're a podcaster, you know that. And it turns out that we can, of course, make it the user's problem to write the perfect prompt, right?

Use this table, and then do that, and then put it from here, and basically we make the prompt like an essay. Or you actually build experiences that build up the problem continuously, and retrieve the right context continuously as you're working, and basically make it part of the tool.

And that makes the data professional's AI experience really work in the end.

Richie Cotton: Okay, I have to say that last point sounds especially useful, because certainly when these sort of chat tools first came out, it was like, okay, you generate me some SQL, and you're basically writing almost exactly what the SQL should be, like, well, take these particular columns from this table, and, well, you've not actually done much.

I mean, I might as well just write the SQL at that point. So yeah, having the automation done so you can really just stick to simpler language and not have to write an essay in your prompt does sound incredibly useful. I'm curious, because you mentioned context is the important thing to getting this right.

Do organizations need to provide some of this context? Like, what do you need to do so the AI understands about your data?

Gerrit Kazmaier: You're cutting, I think, to the core of the issue, or maybe rather to the core of the opportunity. And I actually think you can't think of your AI system and your data system separately. Because what you have just described, Richie, your experience with using SQL code generation, basically shows the key constraint of a large language model: it's probabilistic, it has no factual knowledge, right? It basically operates on token-by-token probabilities, and it's autoregressive towards a final output. So if it doesn't know your schema, right, there is no probability of it using the right columns, because they didn't exist in the training set.

It won't know if you're using custom syntax or stored procedure functions or anything like that, right? Or libraries, not in SQL but in Python; it won't know that. So a big part is actually understanding that when we build these AI capabilities, they are closely intertwined with the data system.

For instance, they need to have an ability to fact-check themselves against your data schema, more than just a prompt, right? Because the prompt basically says, I would like you to do this, then the model, probabilistically, does something, and then at the end you actually do have to go through and ask, you know, are these the right columns?

If not, which ones would actually be the right ones? Is this the right function? Is there maybe a function that would do this better? So basically, you start thinking in what I called earlier this ensemble of agents, and they all need to work together to actually get to high-quality SQL generation, because in the end, like you said, you just want to get the right data, the right functions, the right syntax.
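One way to picture the schema fact-checking step described here: after generation, verify every identifier the model referenced against the real schema, and propose close matches for hallucinated names. The schema, the naive tokenizer, and the stop-word list below are all hypothetical illustration:

```python
# Sketch: post-generation check of SQL identifiers against a known schema,
# suggesting the closest real column for each hallucinated name.
import difflib
import re

SCHEMA = {"orders": {"order_id", "customer_id", "order_total", "created_at"}}

def check_columns(sql: str, table: str) -> dict:
    """Map each unknown identifier to its closest real column (or None)."""
    known = SCHEMA[table]
    # Naive tokenization; a real system would use a proper SQL parser.
    stop = {"select", "from", "where", "sum", "group", "by", table}
    idents = set(re.findall(r"[a-z_]+", sql.lower())) - stop
    fixes = {}
    for ident in idents - known:
        match = difflib.get_close_matches(ident, known, n=1)
        fixes[ident] = match[0] if match else None
    return fixes
```

So a generated query like `SELECT SUM(total) FROM orders GROUP BY customer` would come back with `total` mapped to `order_total` and `customer` to `customer_id`, which an agent can apply or surface to the user, rather than letting the hallucinated names reach the database.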

So it actually does help you. So that is a huge part: when I say context, it's actually not just generating the right prompt, but thinking of AI as a system that closely interfaces with your data system. And I think that's one of the key issues, actually, that right now everyone tries to AI-ify everything, obviously.

And you see people trying to stitch things together, right? I take my AI model from here, and I take my data system from there, and then I just build a prompt, and it's all going to work. And the answer is no, it won't, because the reality is much more complex than that. So a big part is understanding AI and data as part of one platform, working together.

And the more you can specialize them together, the higher quality the output will be. Now, the problem is, of course, if you are a vendor selling data warehouses and you don't have any roots in AI, you see it playing out in the market, right? You acquire, you hire, you try to close that gap, but you know it's going to take you, what, five years, seven years?

So you want to choose a platform that is data and AI first.

Richie Cotton: Okay, yeah, that's really interesting. Because I think a lot of people think, well, AI is getting smarter, we're going to have like one super powerful AI that does everything, but actually the reality is you end up with lots of smaller AIs working together. You mentioned the term ensemble.

So it's lots of different smaller AIs playing together. So in the SQL example, it's like one thing figuring out, how do I write the SQL, and another one checking, does this match the schema, and things like that. So it's more narrowly focused.

Gerrit Kazmaier: It's tricky territory, because if you look at any of the large benchmarks, large models beat small models, even specialized ones. So the big general models, if you will, right, Gemini, large models, they tend to do well across all sorts of tasks, even better in tasks than smaller specialized models.

So when I say an ensemble of models, I don't necessarily mean not using big models and using many small models instead. What I mean is, when we think about a model acting like an agent, we basically want to give it focus on a specific task, and this is the craft of AI, if you will.

So for instance, SQL generation works better if you ask the model twice rather than once. What is the logical plan? Think about a logical plan. Once you have the logical plan, translate it into SQL. Basically, we ask the same model to play two roles: one is to do the abstract reasoning, and then to do the code translation.
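The two-pass pattern described here (plan first, then translate) can be sketched with a stand-in model call. `call_model` below is a canned stub with hard-coded answers, not a real LLM API; only the prompting structure is the point:

```python
# Sketch: two-pass SQL generation. Pass 1 asks for a logical plan,
# pass 2 asks the same model to translate that plan into SQL.

def call_model(prompt: str) -> str:
    # Placeholder: a real system would call an LLM endpoint here.
    if "logical plan" in prompt:
        return ("1. Scan orders  2. Filter year = 2024  "
                "3. Aggregate SUM(order_total) by customer_id")
    return ("SELECT customer_id, SUM(order_total) FROM orders "
            "WHERE EXTRACT(YEAR FROM created_at) = 2024 GROUP BY customer_id")

def generate_sql(question: str, schema: str) -> str:
    # Pass 1: abstract reasoning, no code yet.
    plan = call_model(
        f"Schema: {schema}\nQuestion: {question}\n"
        f"Write a logical plan, step by step."
    )
    # Pass 2: mechanical translation of the plan into SQL.
    return call_model(
        f"Schema: {schema}\nPlan: {plan}\nTranslate the plan into SQL."
    )
```

Separating reasoning from translation is the divide-and-conquer move Gerrit goes on to describe: each call carries a narrower task, so each is easier for the model to get right.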

And then I think it's natural, like divide and conquer, that principle in computer science, right: decompose a problem into smaller problems. And that's what we're seeing here as well. We are basically building an AI system where multiple agents collaborate with each other in this ensemble, but I think it's still an ensemble of large models, right?

We are not at a state where we have any evidence that super small, highly specialized models can outperform large models.

Richie Cotton: Okay, so it's more task decomposition than necessarily picking lots of different types of AI models.

Gerrit Kazmaier: And maybe, you know, fine-tune them specifically to a task and then align them to a specific task. But, yeah, I certainly don't believe that you can just stitch together a bunch of small models and expect the same performance as you get from large, powerful models like the Gemini model family. We are specializing using the Gemini model family specifically and basically building an ensemble with it.

Richie Cotton: So you mentioned that you shouldn't think of your data platform and your AI platform separately, everything needs to be integrated. But I think the reality is many businesses at this point are using dozens, maybe hundreds of different data and AI tools, and more and more tools come out all the time.

Is there a secret to getting all the different tools you're working with to play nicely together?

Gerrit Kazmaier: I actually think having many tools is fine. A practical example is, you know, BigQuery. People use a large set of tools against it, even within one customer, right? And it works because they're all sharing the same data. The tool is more of a form factor for how someone wants to consume.

It doesn't change the meaning, the semantics, any of that, right? That proliferation is basically making the data more consumable; it's not really a problem by itself. It may be a cost concern, and people would like to just buy one license, eventually. But it is not, I think, the big issue of standardization.

I think the standardization issue comes in when you use multiple platforms. There is a big distinction between platform and tool. The same basically applies with models, right? Is there an issue with using many models? No, definitely not. Does it mean you should use multiple platforms?

Probably not, right? Not a great idea. One of the reasons why in Vertex AI we have that Model Garden, which gives you access to over a hundred models plus Hugging Face models, is because depending on the use case, depending on the scenario, you will need to choose, right? And you have different cost considerations, latency considerations, whatever it may be.

But the platform aspects of governing, building, lifecycle, and deploying models, this is basically what a platform guarantees. In the data space, it would be security, correctness, semantics, lineage, all of these aspects. That will also be part of one governed platform. You know, in the data space, there was this terrific phase when we talked about data meshes.

And, you know, everyone loved that idea of data mesh, right? Because, again, it sounds nice, so promising. What was a bit unfortunate, though, is that many people then basically took all of the issues which they had, all of their many data platforms with duplicated data, inconsistent governance, and so forth.

And just said, great, we'll call it a mesh, we're done here, right? We just modernized our data architecture. But data mesh was specific about one point: the shared infrastructure component on which all of these data products are then being built. And what I call a platform is that shared data foundation. And I think the key to being successful with data is understanding the essence, the very essence of data-driven value generation, which is, very simply put: the more data signals you can bring together, the more you find hidden or latent, unintuitive patterns in your data; the more you can build services or models using that data, the more customer value or customer services you can create, which generate more data for you, for which you gather more data signals, allowing you to build better services and models,

generating more data. That's the flywheel of data, right? That's the basic idea. And that is built on the premise that you can build these not big, but very wide data records, bringing many data signals together, because this is where the interesting patterns that you want to leverage in your data are being exposed.

They're not in the data slices and silos that you have in the applications. For instance, in a business system, right, you want your HR data and your service performance coming together, right? Or you want your supply chain and your order delivery coming together, right?

You want all of this together. And that tells you that, by definition, you need to have a shared data foundation. This is the very foundation on which data-driven value generation is built at scale. And, you know, that was the history of Google, right? Google basically is a data-driven business, maybe the purest definition of one. Google looked at the data technology market and said, hey, we want to build a data system that gives us this property of building these wide data records that allow us to harness all of the data signals, and it couldn't find any solution in the technology market, so Google built its own.

And, as it so happened, now we call it BigQuery. But before BigQuery there was Dremel, and the idea of this query engine, and the storage model behind it, that can hold these immensely big data sets and analyze them, is basically what was designed to build that common data foundation.

And my point would be: if you build that common data foundation, don't sweat the tools as much. Maybe that's something you want to optimize, but it's not the big leverage. The big leverage truly is in having that wide data record foundation that allows you to capture all data signals and bring them into a common platform where you can associate them with each other.

So you do see these latent patterns, meaning the hidden patterns, that your data already has, but that you will only see, right, if you bring it together. This is why I also think the big data age is finally coming to an end, because it's not about big data, it's about wide data. I think that's a way better way of thinking about it.
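The "wide data record" argument can be illustrated with a tiny sketch: join many narrow per-system signals on a shared key, and a pattern appears that no single silo exposes. The sources, keys, and the churn-risk rule below are all invented for illustration:

```python
# Sketch: widening narrow per-system records into one wide record per key.
# Each source knows one slice; only the joined record shows the pattern.
crm = {"c1": {"segment": "enterprise"}, "c2": {"segment": "smb"}}
support = {"c1": {"open_tickets": 7}, "c2": {"open_tickets": 0}}
usage = {"c1": {"logins_30d": 2}, "c2": {"logins_30d": 40}}

def widen(*sources):
    """Outer-join signal sources into one wide record per key."""
    wide = {}
    for source in sources:
        for key, signals in source.items():
            wide.setdefault(key, {}).update(signals)
    return wide

records = widen(crm, support, usage)

# A pattern invisible in any single silo: many open tickets combined with
# falling usage flags a churn risk.
at_risk = [k for k, r in records.items()
           if r.get("open_tickets", 0) > 5 and r.get("logins_30d", 99) < 5]
```

In the CRM alone, `c1` looks like a healthy enterprise account; only the wide record, tickets plus usage together, surfaces the risk, which is the "wide, not big" point.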

Richie Cotton: A lot covered there, but it sounds like, particularly for the lower level, the data infrastructure, you want that to be common across your whole organization, because you don't want different teams to have to worry about data security or things like that. It should just be standard throughout the company. But then I guess as you get closer to the business use cases, that's where you want a bit more tailoring, so you can really focus on, well, this is how we add business value with our models and data products and things like that.

Gerrit Kazmaier: Yeah, and you want this data foundation to grow, right? If someone brings in new data, you don't want it to be an island of its own; you want to associate it with all of your other data. When you have your customer data, your sales data, your service data, the social media data, all of that stuff, and you see that wide data record, that is the true definition of a customer 360, right? It's not doing a CRM query and knowing what they have ordered. It's basically seeing all of that in conjunction, and this allows the creation of new and exciting services, for instance, or making processes much more efficient, because you find these relationships that you couldn't see before.

That emergent record, making it ever growing and bigger, I call it the flywheel of data, right? That only works if you build that common foundation. If everyone builds their own, you have many small flywheels, but you don't get the momentum and compound effect of what your enterprise data actually has.

And organizations need to bring it together to unlock it.

Richie Cotton: Absolutely. I'm sure in every single organization, basically no one knows the full extent of what data is there. I mean, I know we have vast amounts of data at DataCamp. I know quite a lot of it, but I'm sure there are huge pockets of data I have no idea about. And so, yeah, having access to those is going to give you ideas for things you can build that you wouldn't understand otherwise.

So I think this goes beyond tooling though, right? Because you can buy these shiny new tools, but getting to business success, there's another step there. Do you want to talk me through what that looks like?

Gerrit Kazmaier: Yeah, for business success I think there are three very concrete ways. And as I said, right, data is no end in itself. You want to use data to make existing processes more efficient, and that is something that business is naturally really good at. So one space is the efficiency side, right?

How much better am I at managing business metrics like service requests or days sales outstanding? Data is a huge accelerant to that, right, either giving people the right information to complete their tasks, or creating new self-service offerings. That's a huge space. How many service center calls can a company prevent by, for instance, giving customers a chat agent that is able to understand what they're trying to do and help them all along? So the efficiency space is one big one. I think a more interesting one is in the, let's call it, new feature, new capability space.

You know, how can you use your data to augment your existing products? For instance, I'm wearing a Garmin watch here, no advertising. The reason why I'm wearing this watch is because of all of the value of the data signals it's getting me. Garmin basically took me from saying, hey, this is not a great-looking watch, I'm not sure if I want to wear it, to saying I want to wear this every day, because it tells me so much about

my well-being, my sleep, how it's trending, whether I should sleep more, whether I should work out. And these are all new services. Or think about your favorite streaming platform, right, the quality of the recommendations, for instance, or the ability to take usage data and create new content.

This is a way of using data to build new products, and that should result in engagement and revenue. These are really hard, tangible business metrics. And I think there is even a third space of what you can call new companies. You know, how many companies are built on their ability to understand and match data?

So many come to mind, right? From ride sharing to hospitality, so many great examples, where the data platform basically creates the product itself. So I think there are some real, hard business metrics that any company can associate back to their data initiatives.

And then there's the fourth one. It's a bit harder to quantify, because how do you quantify the ability of people to make fast and confident decisions? It's non-trivial to measure, but ultimately, right, what Google Search is doing is helping, I don't know, billions of people get to the answers which they want.

And it's so quick. Imagine not having it; that would create a huge problem for me if I'm looking for something to buy or something to read. And the same applies, you know, if we do BI well, to how people work every day, right? It's not as easily captured in one business metric, because it will make workflows faster, it will make customers happier and employees more engaged and feeling more productive.

But that's much more of an ambient quality. And I think it is an equally important one, though harder to measure.

Richie Cotton: Absolutely. And perhaps that last one's the biggest one. Just making people make decisions faster is probably going to add vast amounts of value for any organization and all the individuals within it. So, we talked about dashboards, we talked about databases, we talked about getting value from your data.

Are there any other data trends that you're most excited about at the moment?

Gerrit Kazmaier: Oh, Richie, it's really the most exciting time in data right now, with the opportunity space that generative AI creates. There is one subtrend, if you will, that I am most excited about. It's interesting: we basically have two big gaps. One gap we talked about, right?

People don't have the access to information that they could have, and we talked about how this is going to change. The other side is there is so much data which is not being used. How much knowledge and wisdom is locked away in all the documents, video, and audio recordings that exist?

Unstructured data is by far the largest data type that exists, and it's the one that's hardly ever used in any enterprise data landscape. But the reason is it's just too darn hard. It's very specialized; building an unstructured data processing pipeline is very difficult.

It's not like a table that you can query this way and that way and join. But it should be. Why can't it be? And generative AI gives us, for the first time, a way to reason about unstructured data in a fairly flexible way. I mean, think about the opportunities that we have by just harnessing that data,

how much richer that will make any company in their ability to capitalize on data. I do think it's going to be maybe the biggest step change in all of this, because the data is there, everyone has it, and now we've basically got the key to the safe, which is called generative AI. And I think it will unlock a whole set of new use cases and improvements that basically everyone can tap into now.

That really excites me.

Richie Cotton: That's a very good point, that there is a vast amount of really useful information just stored away in documents and videos and things that no one really has access to. I'm sure there are many Google Docs in our organization where it's like, I'm sure it says something useful, but I don't know where it is or what it says.

So yeah, being able to access that and get some value out of it sounds incredibly important. All right, super. Thank you so much, Gerrit, for your time.

Gerrit Kazmaier: Thank you, Richie. It was great being here today.
