
[AI and the Modern Data Stack] Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot

Benn and Adel talk about the nature of AI-assisted analytics workflows, the potential for generative AI in assisting problem-solving, how Benn imagines analytics workflows to look in the future, and a lot more. 
Updated Feb 2024

Guest
Benn Stancil

Benn Stancil is the Field CTO at ThoughtSpot. He joined ThoughtSpot in 2023 as part of its acquisition of Mode, where he was a Co-Founder and CTO. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams. He regularly writes about data and technology at benn.substack.com. Prior to founding Mode, Benn worked on analytics teams at Microsoft and Yammer.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp, where he has released various courses and live trainings on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations, and about the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Key Quotes

Once you have some new kind of foundational piece of technology, a bunch of new stuff happens with it that you wouldn't really have been able to imagine even existing before. It's very hard to explain to someone who was pre-iPhone what people are doing with TikTok. That stuff doesn't really even make sense until you imagine, 'oh, you have these phones, you can do all these sorts of crazy things with them, you have all these location services built into them, there's all these other capabilities.' You have to kind of have the phone first before you can even imagine a thing like TikTok or Instagram. AI is, to some extent, whatever happens to it, going to be that. People see it and they're like, 'Oh, great. This is a chatbot. Okay, we're gonna go build a bunch of chatbots.' That's obviously a first thing; when you first build a phone, a bunch of people build the easy extension of what the world was like before. But once you start to actually play with it, you realize, 'Oh, this thing can do all these sort of unexpected and weird things,' and our minds are opened up to a bunch of ideas that we wouldn't have really thought of until we got to this point.

The job of an analyst working with AI in the future will be: your AI tool is going to give you a thousand things that all sound pretty convincing. Which ones are real? What actually is truthful in this? If I ask an LLM to give me a very convincing analysis on why drinking a glass of wine every day is good for you, it'll give me something very convincing. And if I say, give me a very convincing argument on why drinking a glass of wine every day is bad for you, it'll give me a very convincing argument. What do I do? My job is to read these things and actually choose: this one seems better. And that's pretty hard to do. And so I think actually a lot of the job could be: we have a thing that will tell you whatever you want, it'll give you a convincing argument for any point you wanna make. My job is now to try to figure out which one's better. It's to moderate the debate.

Key Takeaways

1

The ability to effectively communicate with and guide AI tools is becoming a critical skill for data professionals, similar to the shift seen with SEO skills after the introduction of search engines and the need to develop effective search strategies.

2

Leverage AI to unlock valuable insights from unstructured data, such as customer feedback, enabling organizations to make more informed decisions based on a deeper understanding of consumer behavior and preferences.

3

Data professionals should use generative AI to help them iterate and improve on their queries, rather than using gen AI to form queries entirely. Data practitioners will always have a much better idea of the business context and what’s needed to get the desired results.

Transcript

Adel Nehme: Benn Stancil, it's great to have you on the show.

Benn Stancil: Good to be here. Thanks for having me.

Adel Nehme: Thanks for coming on. So you are the co-founder and chief technology officer of Mode, which was acquired by ThoughtSpot this year. You also have an incredible Substack that I highly recommend listeners subscribe to, with a lot of thought-provoking takes on the data space. We also have a bit of a shared history in that we both tried our hand at political science. You were a fellow at the Carnegie Endowment for International Peace in DC, and I was a research assistant at the Carnegie Middle East Center in Beirut. So maybe before we deep dive into the meat of today's chat, walk us through the journey from starting at Carnegie to ending up founding Mode and getting into the data space.

Benn Stancil: So I graduated from college in 2009, which is relevant because nobody was hiring for any jobs. 2009 was when everybody was getting fired; people were getting offers and then getting them revoked before they even started. And so the one place that was hiring people was DC. It was basically the only place you could find jobs.

So my background was in economics, and I ended up working at Carnegie. Carnegie is a think tank, for those who aren't familiar. Think tanks are basically half-academic, half-policy organizations where you do research, but it's aimed at affecting policy in more specific ways. It's not academic research so much as it's saying, hey, we should enact this policy, or this is what we think of this bill, or whatever. And so I ended up there partly because it was related to what I was doing, but it was also the job you could find in 2009.

And it's very interesting work, in that you're thinking about big problems, and the way that you do the analysis is interesting. I was basically looking at economic data and trying to make recommendations about what various policymakers should do. However, nobody listens to you. You're a twenty-something-year-old writing policy briefs that are then sent to Congress, and Congress does not care. You are eight steps away from anybody doing anything with this thing. And so it's academically interesting, but it felt like I was just yelling into the void.

And so, as you are familiar, think tanks are dead-end jobs in some ways. Typically they're stepping stones to grad school or some other government job, and then you come back to the think tank when you're on the cusp of retirement. And so I started looking for other things.

I had a couple of friends who had moved out to San Francisco and liked the jobs they had there, and I found a job at a company called Yammer. It was a data analyst job: look at data, make recommendations for the company. In some ways it was very similar to what the think tank job was:

here's an ambiguous problem, go do a bunch of analysis, make a recommendation. Except instead of the recommendation going to Ben Bernanke, the recommendation went to a PM that you sat next to. So less impactful in the sense that you weren't trying to look at policies that could change the world.

But the person you were talking to would actually listen to you. You weren't a free consultant that they never asked for, who just decided to make recommendations; you were someone who was there to try to help them. And so it was the same structure of job, but it felt much more like you were actually doing something that somebody cared about.

Adel Nehme: Yeah, I can attest to that. It's definitely intellectually very stimulating, but the impact is quite limited in the think tank space. And maybe following up on your experience at Yammer, give us the why behind founding Mode. What was the problem you were trying to solve for data teams when you first had the idea for Mode?

Benn Stancil: So I joined this team at Yammer. We were the sort of data team whose job was not just to build dashboards and reports, but we also weren't the kind of data science, do-hard-math-in-a-corner-and-come-out-with-a-model-that-predicts-the-world type of team. Our job was basically: the marketing team would come to us and be like, we're holding another event.

Do we hold it in San Francisco or New York? Help us figure out the right place to do it. And you'd do some analysis: we have customers here, prospects here, this is where we think we should do it. Or the product team would be trying to make a decision: do we ship feature A or B, those sorts of things.

And so we essentially needed a tool that was technical enough for us as a data team to do a bunch of exploratory analysis. We were working with data in Vertica at this point, so it was a database, writing a bunch of SQL queries, doing that kind of stuff. But we also had to collaborate very closely with the people who sat next to us, and they weren't technical.

And so we just needed to be able to share reports and stuff for them to be able to do their work as well. And so we ended up building this internal tool inside of Yammer that was basically a SQL query editor in a browser, where you could run queries and then send them off via URL to other people, who could then just look at charts of the results.

And it ended up being this really impactful tool inside of Yammer, and then inside of Microsoft after the company got bought, because it allowed us to work as the technical data team in the way that we wanted to work. But we could very easily collaborate with the folks we were working with, because we weren't working in a Jupyter or an RStudio or something that we couldn't send to them.

We were working on a thing where there was an easy back and forth, where both of us had a view of the data and the results that fit our respective technical abilities. And so after we built this, we started to realize that, one, this was an internal tool that was pretty successful inside of Yammer.

It was successful inside of Microsoft. And a bunch of other companies around Silicon Valley had actually built similar things for their data teams: Facebook, Airbnb, Spotify, Pinterest, Uber all had these kinds of query-tools-in-a-browser. And so essentially me and the two other folks who started it

were kind of like, hey, if everybody's building this thing, and everybody thinks it's been successful inside of a company like Microsoft, maybe there's a market for it where somebody should just build it and turn it into a product that everybody can have. And so that was basically the impetus of the idea: the success of this internal tool that we had built,

and thinking that, as databases become cheaper and data teams become more of a thing, this is the type of way that they're going to interact with their business stakeholders. And so can we build a tool that leans into that kind of workflow?

Adel Nehme: That's great. And maybe this is a bit of a wide question: your experience at Yammer, which was at the time quite an advanced organization from a data perspective, led to the founding of a tool like Mode. How have you seen the analytics space evolve since that time?

What do you think are the main highlights of the evolution that you've seen in the space?

Benn Stancil: I mean, there's a lot. When we started Mode, one of the biggest objections that we ran into was that it was a cloud tool, a SaaS product. And there were a lot of people who were like, well, data, SaaS stuff, I don't know that that makes any sense. We're not going to buy that.

This was one of the hardest things we had to sell around in the very early days: just general nervousness about the cloud, specifically as it relates to data. Salesforce was obviously already a very big thing at that point, cloud software was a thing, but data in the cloud was still kind of, I don't know about that.

That changed a lot. Redshift was probably the first big, hey, this is a thing that might actually work. And Snowflake, this was much later, but the Snowflake IPO was somewhat of a big moment, because the people who pay attention to things with very big dollar signs in front of them started to realize, oh, cloud data is clearly potentially a very, very big business. So that changed, but it changed gradually over time.

The other thing that was also very different: when we first started pitching Mode, one of the biggest objections we got from VCs, and not that VCs know everything, but VCs are a decent reflection of the conventional wisdom about these sorts of markets, was: who's going to use this thing? This is a tool that writes SQL.

Who writes SQL anymore? Isn't that over? I thought we had Hadoop, so why does SQL matter anymore? There's only a handful of nerdy IT people in the back corner who do it. What is this all about? And I think, one, the Hadoop era didn't really stick,

so we moved away from that. But two, things like the modern data stack, and a lot of the philosophies in it of being a very SQL-heavy thing, tools like dbt, and again the popularity of something like Snowflake, which is obviously a very SQL-heavy database, meant it became much more of an: oh, there are a lot of people whose job it is to build data infrastructure that is primarily SQL-based.

And so there was the notion of this modern data team: people who are writing SQL and trying to answer questions. They're not just building dashboards and reports, and they're not the capital-D data scientists. Their job is to figure out what an organization does with its data and make it useful.

People were skeptical of that idea in 2013, 2014. Now it's become much more of: this is the way that this stuff gets done.

Adel Nehme: If anything, the analyst has become the bread and butter of a successful data team today, driving the data-informed organization, to quote your Substack. Now, we could talk about the modern data stack quite a lot, but what I want to talk to you about today is especially the potential impact of AI on the analytics workflow.

This is something I've seen you write quite a lot about in your Substack, and I find a lot of interesting things here. One thing that we chatted about behind the scenes when we were planning this episode, and that resonated with me a lot, is that you talked about AI in the future being weird.

And I think the word weird does a better job than most other adjectives at describing the potential world we're heading into. So maybe to set the stage: what do you mean when you say the future of AI and data will be weird?

Benn Stancil: Well, I've said two things, I guess. One is, generally to me, and this is not a novel thought necessarily, big technological shifts change how people do things. It's like teleportation in some ways. We don't walk slowly towards it, where we can see what's coming in the distance and get closer and closer.

A big thing comes out and you get teleported to a new place, where you have no idea what's on the other side of that thing until you get there. If you snap your fingers, guess what, you're going to teleport somewhere, you're like, I have no idea where I could end up. I could end up anywhere.

It's impossible to imagine the next five steps when you do that. And I think there's a version of that where these big changes happen, and once you have some new kind of foundational piece of technology, a bunch of new stuff happens with it that you wouldn't really have been able to imagine even existing before.

It's very hard to explain, if you were pre-iPhone, what people are doing with TikTok. That stuff doesn't really even make sense until you imagine: oh, you have these phones, you can do all these sorts of crazy things with them, you have all these location services built into them, there's all these other capabilities. You have to kind of have the phone first before you can even imagine a thing like TikTok or a thing like Instagram or whatever.

And I think AI is, to some extent, that. Whatever happens with it, people see it and they're like, oh, great, this is a chatbot, okay, we're going to go build a bunch of chatbots. And it's like, nah, probably not. That's obviously a first thing; when you first build a phone, a bunch of people build the easy extension of what the world was like before.

But once you start to actually play with it, you realize, oh, this thing can do all these sort of unexpected and weird things, and our minds are opened up to a bunch of ideas that we wouldn't have really thought of until we got to this point. So generally, to me,

it's going to be weird because it's a big foundational shift, and we can't just extrapolate from what we've been doing before. It's a discontinuity. The other thing, to me, is that AI and LLMs specifically are a very weird kind of technology.

They're very human, in that they do a lot of the things that humans do: they are pretty creative, they aren't very good at taking explicit directions and doing exactly what you want them to, they do unexpected things. You can treat them almost like humans, where, consistently to me, the thing that is surprising is that when you want them to do something better, you do the same thing you would with a person: you give it more information, you explain more stuff, you ask it for more specific results, you give it feedback and it learns.

There's a bunch of weird stuff in that. It's not a computer; it's a computer, obviously, but it's not a computer that thinks the way that we have always thought of computers thinking. And so to me, what happens with that, I have no idea. There was this, it was the, uh...

Adel Nehme: Scientist at...

Benn Stancil: Whatever he is currently now at OpenAI. He had a thing about people complaining about hallucination in LLMs, and he's kind of like, look, hallucination is the point. It's not hallucination, it's creativity, and the problem is not the problem. He was comparing LLMs to search, and he was saying the problem with search is that it has no creativity.

You search for something, you're getting the same thing all the time; it is not a creative thing. But LLMs are creative, and that is sort of the benefit of them. And I don't think we've grasped that yet. I think we view them as: how do we make them perfect little robots that do exactly the task we want?

And it's like, ah, they're not really that. They're this weird creative robot, and that's a crazy thing to think about. What happens with that? I don't know. But we're not used to, I think, having a tool that is more creative than we are in a lot of ways, but is not good at following directions.

Adel Nehme: Yeah, and there's a lot to unpack here. At the beginning, you compared large language models and AI to the iPhone, and said that it's hard to explain TikTok pre-iPhone. I completely agree with that notion that AI will usher in an ecosystem of applications and tools that we didn't think possible.

And in a lot of ways, chatbots today are those first waves of apps that we saw after the iPod Touch and the iPhone were released. Do you remember, for example, the beer app? We're seeing that kind of first wave of applications of large language models today.

And in the future we'll see, I think, a much more robust ecosystem of tools emerge. But let's ground this discussion in the present, talking about the capabilities of large language models in coding workflows and analytics workflows. We've actually seen a lot of promise when it comes to these workflows,

things like GitHub Copilot, or ChatGPT's Data Analysis, formerly known as Code Interpreter. So maybe anchoring our conversation first in present capabilities: how do you view the current capabilities of large language models for being useful in coding or data workflows? Where are they effective and where are their limits?

Benn Stancil: I would say two things about that. One is, they are not effective at all as a wrapper around ChatGPT; that does not work. There are the thousand YC startups that are like: it's your data analyst assistant. You ask it a business question and magically you get an answer.

Like, what was my revenue in California last quarter? And it writes a SQL query for you or whatever. This is a pretty common thing; people have built versions of these. They have basically prompt-engineered ChatGPT, fed it a little bit of context about your data, and hoped it writes a good SQL query.

These are the sorts of things where it does a good enough job to convince you that this might work, but it really doesn't. You can do it on top of very toy data sets. You can do it on top of the schema that's got four tables and twelve columns total,

and they're all perfectly named, and you're asking it questions that you can figure out from the schema. And it will do a thing that is convincing enough for you to be like, ah, yes, this might actually work. But then when you start to use it for real, it does not work in an actual business context at all.

And especially when you're trying to get an answer that really matters: the CEO asked, what was our revenue in California? You can't be like, well, here's our best guess. That doesn't really make sense. So I think in those cases, it is not good. You can do a bunch of work around it, though.

If you add a ton of work underneath it, you can coerce it into doing the right thing: if you prompt it with a bunch of the right stuff, if you do a bunch of work to figure out, when somebody asks what is our revenue in California, what does that question really mean? What is the schema that we should be feeding it?

There is a lot of underlying infrastructure there that you can build that improves it a lot. Even there, though, I think it's tough. It's hard to go from question to SQL query no matter how much context you give them. Maybe that'll get better, but currently that's a pretty heavy lift.
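
To make the wrapper pattern Benn describes concrete, here is a minimal sketch of the naive question-to-SQL approach, assuming the openai Python client; the schema, model name, and example question are all hypothetical, not anything Mode or ThoughtSpot ships.

```python
# Minimal sketch of the "prompt-engineered wrapper" pattern: stuff a schema
# into the prompt and hope the model writes good SQL. Schema and question
# are hypothetical. Requires `pip install openai` and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

SCHEMA = """
orders(order_id INT, customer_id INT, state TEXT, amount NUMERIC, ordered_at DATE)
customers(customer_id INT, name TEXT, segment TEXT)
"""

def question_to_sql(question: str) -> str:
    """Ask the model to translate a business question into SQL, given only a schema."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You write PostgreSQL. Reply with a single SQL query "
                        "and nothing else.\nSchema:\n" + SCHEMA},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(question_to_sql("What was our revenue in California last quarter?"))
```

On a two-table toy schema like this, the output looks convincing; on a real warehouse, "revenue," "California," and "last quarter" are all ambiguous, which is exactly the failure mode Benn describes.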

The thing that I think it does a better job of is, so we're building a tool inside of Mode that is basically this, that we're starting to roll out to some customers. The aim is much more of a copilot style of thing, where instead of asking a business question, you are asking for help to do the thing you want to do. It is for the technical person, where you as an analyst

get asked, what is our revenue in California? You roughly know the SQL query you have to write to do that, but you don't want to do all the tedious work of: how exactly do I define this join? What do these various functions in SQL do? Do I have to bucket it by something? I don't want to write the giant case statement that does that. You are essentially using an LLM to go from pseudocode to finished query, as opposed to from business question to query.

Because there's a bunch of things where, if you go from business question to query, it writes queries that are kind of weird. Even if they're right, they're hard to parse. And so you're like, I don't fully trust it; I have to go through and read it and code-review the thing that the machine wrote without understanding the way that it did it.

And again, it does things in ways that are a little bit off-kilter from the way that an analyst would actually do it. It's a very painful thing to be like, all right, great, here's a 50-line query where I have to figure out what this thing is trying to do and whether it's right. It's actually much more effective to be like, let me tell you roughly the query I want to write,

where I just spit out a bunch of: do this, join this, join this, join this, do it, sort of the way I would just yell at you if I wanted you to write it. And then it takes all that and does the tedious work of turning it into a query. That, to me, is a much more effective way these things get used today than the business-question-to-query approach.
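
The copilot flow Benn prefers inverts the interface: the analyst dictates the rough plan and the model does the tedious expansion. Here is a hedged sketch of that pseudocode-to-SQL step, again assuming the openai client and the invented schema above; this illustrates the workflow, not Mode's actual implementation.

```python
# Sketch of the pseudocode-to-query flow: the analyst dictates the shape,
# the LLM handles the tedious expansion (joins, CASE buckets, date math).
# Hypothetical prompt and schema; illustrative only.
from openai import OpenAI

client = OpenAI()

pseudocode = """
take orders, join customers on customer_id,
keep state = 'CA' and ordered_at in the last complete quarter,
bucket amount into small (<100), medium (100-1000), large (>1000) with a CASE,
return revenue per bucket per month
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Expand the analyst's pseudocode into a single PostgreSQL "
                    "query. Follow the stated steps exactly; do not invent "
                    "tables or filters."},
        {"role": "user", "content": pseudocode},
    ],
)
print(response.choices[0].message.content)
```

Because the analyst dictated the joins and definitions, reviewing the output becomes a quick check against stated intent rather than a from-scratch code review of a 50-line query.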

Adel Nehme: And looking at the current state of large language models for coding workflows, I agree with a lot of the limitations that you lay out. I think providing that really deep context about the company's data, and translating from business questions to SQL queries, is going to be much more challenging than we expect it to be.

But if you project a lot of the current capabilities into the future, how do you see these capabilities ushering in the weirdness that we described earlier? What do you think will be weird about analytics workflows in the future?

Benn Stancil: Two things, I think. Though I don't know; these things move pretty quick. In the early days of COVID, I was always like, this thing will be over in two weeks. And then, it'll be over in a few months. I was always basically a month behind what was actually happening.

And I feel the same way about this stuff. I'm like, it definitely can't do this, and then a month later, no, it actually did a pretty good job of that. So I have no idea how to predict the future here, but it seems like they're going to struggle for a while with just writing a really good query out of the box.

And part of that is also, again, a little bit of a misuse of what they're good at. They are creative; they aren't great at taking directions exactly. Using them to do things that are very much taking directions and not being creative is a little bit of a misuse of the technology. I don't think they're going to be great at it:

going from asking complex business questions to generating exactly the perfect query from there. The two things I do think they will be good at, the things that will start to get weird: one is, if they get good enough at understanding a question, I think we can start to build infrastructure that is designed to help them answer those questions.

Right now there are a lot of people saying, let's take the documentation that we wrote for analysts and stuff it into the context of this thing; let's take the queries that analysts wrote and stuff them into the context of this thing. It's all taking things that were designed for today and trying to help the model understand stuff. And we may end up realizing that if we build things specifically for it, it's going to get a lot better.

So part of that, and this is something ThoughtSpot, the company that Mode got acquired by, has been thinking about a lot: they have basically built an underlying semantic model that is designed for an LLM to use, as opposed to designed for a human to use.

That kind of concept, to me, you could extend in a lot of different directions. Instead of the system being all designed for humans, with an LLM stuck on top, it's: these things have some weird properties, some things that are slightly different from humans. What if we cater to those weird properties instead of catering to people?

The second thing I think is potentially weird: the times I've tried to use them to do analysis, they're not great at writing a query, but they're pretty good when you're like, let me describe a situation to you. This number's up, this number's down. What should I look at next?

Come up with some ideas. They give you some pretty good ideas: here are ten hypotheses about why this could be the case. And you're like, how do I explore that? And they give you some questions you could ask. It's not doing the work for you; it's doing what analysts say is the fun part.

It's doing the part of coming up with the creative stuff. And to me, that's a little bit of what this actually could do. A lot of people say, oh, they'll automate the parts of the job we don't like, and we'll get to focus on the fun part. I'm not sure they're not going to just do the fun part.

I'm not sure we won't be the query-writing robots responding to the prompts it gives us, the you-should-try-this-or-try-that, because that's actually what they're a lot better at. And so there is a sort of weirdness there to me, where the job that we want, sitting there thinking about problems and solving puzzles, they might be better at than we are.
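
Benn's describe-the-situation use is easy to try directly. Below is a minimal sketch assuming the openai Python client; the metric movements in the prompt are invented for illustration.

```python
# Sketch of the "describe the situation, get hypotheses" workflow Benn finds
# more promising than query writing. The metric movements are made up.
from openai import OpenAI

client = OpenAI()

situation = (
    "Weekly signups are up 12%, but activation is down 8% and support "
    "tickets about onboarding have doubled. Give me ten hypotheses for "
    "what changed, and for each one, a question I could answer with data "
    "to test it."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": situation}],
)
print(response.choices[0].message.content)
```

The model proposes the directions; the analyst stays responsible for deciding which hypotheses are worth the query time, which is the debate-moderator role Benn describes later in the conversation.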

Adel Nehme: This is very fascinating. You're talking about the potential uses of large language models for the actual creative aspects of the data job. And combining that first element you mentioned, giving the large language model infrastructure that lets it answer better questions about the organization's data, with that ability to come up with creative problem solving could, I think, be radical in the future.

Let's maybe focus here on what that world would look like. In a lot of ways, the current interfaces we have with data, think tables, BI tools, IDEs, are not built for a world where machines can potentially reason about problem solving, code, and different solutions.

How do you see the future of the data interface, then? And what do you think our interaction as analysts with data will look like?

Benn Stancil: I'm not sure; I don't know. They're not good enough at this yet, and it's a little bit of a pain. Honestly, one of the things that's also hard here is that you have to be very particular about what you tell it to do. When you're trying to say, hey, I want you to help with a thing,

you end up having to spell everything out; it's work to tell it what you need it to do. So it's still not a great process. But I think we may end up just spending a lot of time learning this. It's not prompt engineering exactly. The analogy I'd use is back when search engines first came out. I remember, in middle school, people were like, you need to learn how to do Boolean search stuff in Google,

you need to learn how to include this word, minus this word. There was a little bit of a query language in there, and people said the future is this sort of language. And that was right in the sense that Googling was the future, and being able to Google things is a valuable skill. But it wasn't right in the sense that you didn't end up learning a particular language for Googling.

You learned how the system worked. You learned the ways around it and how to make it do the things you wanted it to do. I could imagine the same thing happening here, where prompt engineering, the more structured how-do-I-use-LangChain-or-whatever, became the first version.

But to me, that's like the query language for Google. Really what it is, is you learn the tendencies of the models. You learn how to coerce them into doing the thing you want them to do. And so I can see the analyst job being partly combining the skills that you have, reasoning through a business problem, with coercing the LLM, which is probably better at some of these things than you are,

into helping you do what you want to do. It's figuring out the way to operate the machine: all right, I need to describe the problem to it, I need to have it coach me on some stuff, it'll come up with some ideas. There's this book, it just came out yesterday, that I want to read.

There's a guy, I don't remember his name, Seth Davidson? He wrote a book called Everybody Lies. It was about the things that people admit to Google because they're looking for stuff and they think nobody's...

Adel Nehme: Seth Stephens-Davidowitz, it's...

Benn Stancil: Yeah. Yeah. But he just wrote a book about the NBA.

And the gimmick of the book was that he wrote it in 30 days, all with AI. He did a bunch of analysis and everything on it; he basically ChatGPT'd a book. And I'm interested in reading it, because I think that probably is how this actually works.

He produced a hundred-page book full of analysis in 30 days. Is it good? I don't know. But there is some skill in that, that I'm sure he developed, which isn't prompt engineering and isn't strictly analysis. It's figuring out how to use this thing to be really, really effective and fast at doing analysis.

And I think that's probably what this really looks like: you learn how to do that sort of stuff. What does that mean? I don't know. But how would you have told someone in 1990 that you're going to have to learn how to be a really effective Googler if you want to be a functioning human being in the world? I don't really know what you'd even say, but: look, there's going to be this kind of weird technology.

It's going to do all this crazy stuff, but you're going to have to be really good at it. Otherwise you're going to be incapable of functioning in modern society. And it seems like that's what this is. It's not, go take a bunch of courses on the ins and outs of LLMs. It's kind of like, you're just going to have to learn how to ride the bike.

Adel Nehme: Yeah, it's hard to predict here, especially when we think about how the skill set of the modern data practitioner will evolve. If we take the assumptions that we're discussing, that more and more LLMs will be able to assist in creative problem solving and in actually creating a lot of the code that goes into a solution, what do you think will be the main skill sets data practitioners will need to develop in the future?

What will, let's say, grad school in 2030 or a data degree look like?

Benn Stancil: I mean, I think grad school will be the same. It's going to be a bunch of, learn some MATLAB and some stats, you know? I don't know.

Adel Nehme: Ha, yeah.

Benn Stancil: That's a different question, but I think it'd be the same. I mean, I think the skills that will make people good are honestly probably the skills that make people good now.

Well, okay, actually, I think it's the skills that make people good now, but maybe in a different way. So there's this interesting thing, and this is delving into philosophy that I can pretend to understand, but I don't. One of the things to me that is tricky about LLMs is they're able to make pretty convincing arguments and do pretty convincing analysis.

I imagine this NBA book has convincing charts in it. And logical arguments and data analysis are hard to poke holes in; that's the point. It's sort of like debate club: if you argue with someone who's really good at debate, they're going to defeat you at debate, not because their argument is better or because they're right, but just because they know how to play the game.

And LLMs know how to play the game. They know how to say stuff that's convincing. And I think there is something tricky about this: when you can just say to an LLM, make a good argument for this point, or do analysis on this thing, it will give you something that is relatively convincing, and there is no foundational truth to that thing. If I have to come up with a convincing argument about some analysis, I'm not that smart, and so I don't find good arguments unless they're relatively easy to find.

It's hard for me to go and find the totally obscure thing that proves my point. The reason I was able to make a good argument is because the good argument was fairly close to the surface. But if you have, in effect, all-powerful supercomputers that can always connect the dots into a thing that tells a compelling story, everything will look compelling, and we're not that good at figuring out when it's not.

And so I think, in a lot of ways, the job of an analyst will be what Seth's job probably was in writing this book: this thing is going to give you a thousand things that all sound pretty convincing. Which ones are real? What actually is truthful in this?

To use a different example, if I ask it, give me a very convincing analysis on why drinking a glass of wine every day is good for you, it'll give me something very convincing. And if I say, give me a very convincing argument on why drinking a glass of wine every day is bad for you,

it'll give me a very convincing argument. What do I do? My job is to read these things and actually decide: this one seems better. And that's pretty hard to do. And so I think a lot of the job could be: we have a thing that will tell you whatever you want; it will give you a convincing argument for any point you want to make.

My job is to try to figure out which one's better. It's to moderate the debate. You know, I don't know.

Adel Nehme: Do you think that makes the data job better?

Benn Stancil: I think it makes it not better for some people. There are people who are like, I want to put on my headphones, sit in a room with my three monitors, and write code. I want to be an engineer; I like that solve-the-puzzle mindset. I don't think it's better for that, because I think you become less of a builder.

You become less of the I'm-creating-a-system person and more of a person who has to reason about a bunch of stuff, sit there and think about which one makes sense, and poke holes in it. And some analysts love that. Some analysts love being the person who says, let me debunk every single thing you say, because that's fun.

So in some ways it's that. But if you are the analyst who got into it because you actually have the mindset of an engineer, I think it's kind of less than that.

Adel Nehme: One thing that we haven't yet discussed, on the potential uses or weird avenues large language models open up when working with data, is something that you mentioned in your most recent article, Average Text. You describe a hypothetical world in which a bar owner summarizes user interviews grouped by age group using a simple query, with a fictional SQL function called average text.

This, in my opinion, really well illustrates the potential of large language models for surfacing really detailed, granular insights from unstructured data that will fundamentally change how organizations make decisions. If you can, on the fly, summarize user interviews grouped by demographics, or grouped by product, for example, to try to understand how a particular group feels about your product,

I think that gets you much closer to reality than behavioral exhaust alone. So walk us through how you think AI will fundamentally change how we work with this type of data, and what you think the opportunity here is.

Benn Stancil: To me, a lot of the data industry, and the data industry here is bigger than just data companies, call it the data industrial complex, has gotten built up around this mindset that data is the truth.

I think a lot of that has been built up because data feels scientific. And in some places, if you are trying to make a vaccine for COVID, great, please do the science. But if you're a product manager trying to make a decision about something, you actually just want to understand what customers think,

and data is an artifact of that. It is not what they think, it's...

Adel Nehme: It's a proxy.

Benn Stancil: It is, yeah. And it's a lousy proxy, to be honest. In some ways it works: we see what people do, and we interpret a bunch of intention from that. But really you're still just trying to get at the intention.

You just want to know what's in these people's heads. Unstructured data, customer feedback, interviews, user research: that basically is what's in people's heads. And you can't just take them at face value. You can't just say, this is what this person says, therefore we have to do exactly what they say;

obviously you still have to think about what it is. But there is so much more richness in that than in the kind of exhaust that comes out of their behavior. Yet I think we view the unstructured stuff as the soft, squishy stuff that's not as valuable, because we don't have scientific implements around it.

And so I think that, not that LLMs give you the scientific implements exactly, but they give you ways to look at unstructured data at scale, potentially. It's no longer: great, we did an interview with 10 people, how do we extrapolate from this, while adding all of the caveats that it was 10 people and who knows?

What if you can look at a thousand customers, or ten thousand, and summarize them in a way that scales? It starts to be: you can look at a dataset that is as rich as what you would get in this unstructured stuff, but with some of the properties of looking at things at scale.

Not that statistical significance is exactly the right concept here, but some rough notion of: hey, a bunch of people said this, it's probably real. We don't have to say, well, we talked to 10 people and eight of them said it, so that's probably enough. We can look at the totality of how people feel.

And if you can do that, there's a lot more richness there, I think, than there is in whether or not somebody visited a pricing page. Right now, if we want to understand whether our product is priced correctly, the way we'll do it is look at the people who visit the pricing page, what they click on, do some demographic slicing of who clicked on what, and try to infer a bunch of things from that.

And to some extent, that's not bad. But what would be better is: what if, as everybody leaves the pricing page, you said, can you give us a 15-second audio clip of how you feel about our pricing, and we'll send you a $5 Starbucks gift certificate immediately?

For 15 seconds, I would do that. I'd click the button and say, I think it's too expensive. And if you have all that information, that's going to tell you way more than trying to infer a bunch of weird stuff from user behavior. Does it work? Maybe, maybe not.

But if we start to think about the information in unstructured data as being more accessible, and we start to try to collect more of it, it feels like there are a lot more informed decisions you can make that way than by trying to parse how people feel through the footsteps that they leave behind.
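
The fictional average text aggregate from Benn's article reduces to a simple pattern: group free-text feedback by some attribute, then have an LLM "average" each group into a summary. Here is a rough sketch under stated assumptions, the openai client and a handful of invented feedback rows; the AVG_TEXT query in the final comment is Benn's hypothetical, not a real SQL function.

```python
# Rough approximation of the fictional AVG_TEXT() aggregate: GROUP BY a
# demographic column, then "average" each group's free text with an LLM
# summary. The feedback rows are invented for illustration.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()

feedback = [
    ("25-34", "Love the playlist, but the drinks are pricey."),
    ("25-34", "Cocktails cost too much. Great vibe, though."),
    ("55-64", "The music is far too loud to hold a conversation."),
]

groups = defaultdict(list)
for age_group, text in feedback:
    groups[age_group].append(text)

for age_group, texts in sorted(groups.items()):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Summarize the shared sentiment of these "
                              "comments in one sentence:\n" + "\n".join(texts)}],
    )
    print(age_group, "->", response.choices[0].message.content)

# The moral equivalent of Benn's hypothetical query:
#   SELECT age_group, AVG_TEXT(comment) FROM interviews GROUP BY age_group;
```

Nothing here is statistically rigorous, which is Benn's point: it trades scientific implements for a scalable way to read what is actually in people's heads.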

Adel Nehme: Yeah, that's fascinating. And you could probably use an LLM system of some form to scale the number of interviews you can run in a short period of time as well. It's interesting, once you open up the imagination, how these systems can fit into the data collection workflow too.

Now, we've been talking about a lot of the long-term stuff and what we expect to be weird in the future, but let's also focus on the short term. ThoughtSpot, which you are part of now, and whose semantic model for working with a large language model you mentioned, has, in my opinion, a great, elegant solution for embedding AI into the analytics workflow.

Maybe walk me through how you view the short term of AI in analytics and data.

Benn Stancil: I don't think that much changes. In the short, immediate term, it's people trying to figure out what works: a lot of experimentation, a lot of things that are going to be busts, honestly. ThoughtSpot has been doing this for a long time, and one of the things I appreciated about them when we started talking before the acquisition was that they have been in the trenches of this stuff for a while.

They had been building things that were AI-powered well before the LLM hype, so they had seen the things that work and don't work. They had been through the what-if-a-chatbot-wrote-a-SQL-query phase and come out saying, oh God, this is not going to be good. And so I think there will be a lot of experimentation with chatbots to do that kind of stuff.

And you see that: every data tool has some flavor of product that's like, write a question and it'll imagine the right SQL query. I think we'll get inundated with some bad versions of this, just because it's a two-week project to build a wrapper around OpenAI or Bard or whatever.

It's a nine-month project to make that thing good enough. I think a lot of people will do the two-week version. It won't be that good, and it'll slowly fade, because it doesn't really work and people are frustrated by it. But the thing that I'm hopeful about is that, in doing that, people also experiment with a lot of new stuff.

To your point about the beer app: somebody made the beer app, and I guess the beer app's kind of creative, kind of a cool thing. But the first apps were all kind of the same stuff, some social media thing or whatever. Then people realized there are all these other interesting things you can start to do, ways you can create variants, and some of them actually become pretty powerful and really useful.

And so my hope is that people will experiment. As folks go through this experimental phase, there's a bunch of weird stuff that people are creating. Some of it will be bizarre and awful, but people will uncover ways that really help.

I think Copilot's the first version of that, where it's not an app that does the whole thing; it is a really fancy autocomplete where you can write pseudocode and it completes the code. I think that's a good idea. And are there other things where you can ask it, how do I reason about this problem, in a way that feels very natural, where you're not having to go over to ChatGPT and describe four pages of context? Here's the business problem, and you have this sort of assistant that guides you through how to think analytically, that says, tell me more about that. I think that'd be cool. That sort of stuff, I think, is what we'll start to see more of: people injecting this technology into various points in how they work.

Some of it will be obnoxious and not very good, but some of it will be, oh wow, this actually really makes things better. And then we'll all start to copy the stuff that works.

Adel Nehme: Okay, exciting stuff. Now, Benn, as we wrap up today's episode, do you have any final notes or a message to share with the audience?

Benn Stancil: No, I appreciate you having me on. The main note is, I don't know, I'm just shooting a bunch of things from the hip here, so don't take any of this as me having any idea what I'm talking about. I assume that the future will look entirely different from anything we talked about today.

Adel Nehme: Definitely. Everyone, I'll do the plug instead of you here: check out Benn's Substack, with releases every Friday, correct, Benn? It's really, really illuminating; it's definitely on my weekly reads list as well. We're going to leave it in the show notes too.

Benn Stancil: Awesome. I appreciate it. And again, thanks for having me.

Adel Nehme: Thanks a lot, Benn, for coming on the podcast.
