
Low Code Data Science with Michael Berthold, CEO and co-founder of KNIME

Adel and Michael explore low-code data science, the adoption of low-code data tools, the evolution of data science workflows, integration with AI tools, the future of low-code data tools and much more.
Nov 14, 2024

Guest
Michael Berthold

Michael Berthold is CEO and co-founder at KNIME, an open source data analytics company. He has more than 25 years of experience in data science, working in academia, most recently as a full professor at Konstanz University (Germany) and previously at University of California (Berkeley) and Carnegie Mellon, and in industry at Intel’s Neural Network Group, Utopy, and Tripos. Michael has published extensively on data analytics, machine learning, and artificial intelligence.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Key Quotes

There's this democratizing of data science that everybody keeps talking about. I think low code tools are fundamental for that. There's this nice quote of someone who said, "I don't want to teach my students half a year of programming before they can actually start working with data. I want to make sure they get started doing interesting stuff from the get-go."

A huge strength of low code is that you can get an SQL expert and an R expert and a Python expert and a JavaScript expert, and they can collaborate within the same environment. 

Key Takeaways

1. Low-code tools allow more people to work with data without needing to write code, making data science more accessible and faster to deploy within organizations.

2. Organizations should invest in upskilling initiatives that teach employees how to interpret data insights, helping them understand core concepts like correlation and regression without needing to know how to code them from scratch.

3. Low-code platforms like KNIME facilitate collaboration by allowing experts in SQL, Python, and other languages to work together seamlessly, enhancing team efficiency and project quality.

Transcript

Adel Nehme: Michael Berthold, it's great to have you on the show.

Michael Berthold: Thanks for having me, Adel.

Adel Nehme: So you are the CEO of KNIME, so let's talk about low-code data science. Maybe to set the stage: how would you describe the state of low-code data science right now, and how do data teams leverage it today?

Michael Berthold: So low code data science, I mean, it is used in different ways, and I think we'll dive into that a little bit later as well, but fundamentally the idea is to get data science out of the hands of programmers, or at least make it easier so that people don't have to write lines of code for each and every little thing they want to do with their data.

Adel Nehme: Okay, great. And how do you see it now, the current state? How widely applied is it, how widely adopted is it as a tool within a modern organization today?

Michael Berthold: I would claim that a lot of people are using some bits and pieces of low-code data science for some aspects of their daily work, but fundamentally the question is: is the low-code aspect really only a wizard, an easier way to create code, or is it really a tool in itself?

And I think that makes the big difference. And I think that's also where KNIME differentiates itself from quite a number of other low code tools.

Adel Nehme: Over the past few years, a lot of the data science workflow has shifted. We've seen a much bigger emphasis, for example, on going from experimentation to deployment, right? And given that you're building tooling that really focuses on the data science workflow, from your perspective, how has the data science workflow shifted over the past few years?

And where do you see low-code data science fitting into that new, evolving workflow?

Michael Berthold: Yeah, that's a very good point, right? When we started doing that, data science was essentially done by small teams in the corner of an organization. It was like the geeks. They were sitting in the little cubicles, nobody to talk to them, problems were thrown over the wall. They addressed them, came back with answers.

The answers were usually the wrong ones. Nobody really cared about the answers because they were actually interested in different questions. And what really happened behind the scenes didn't really matter, so people were sitting there and writing different types of code, right? There was SAS Base and R and SQL or whatever was sitting out there.

And low code really has allowed people to start working with data, making sense of data, without having to go through all of these technological complexities. So I think that's really something. I mean, there's this democratizing of data science that everybody keeps talking about, and I think low-code tools are fundamental for that, because there's this nice quote of someone who said, "I don't want to teach my students half a year of programming before they can actually start working with data; I want to make sure they get started doing interesting stuff from the get-go."

And the other piece, I think, is that this field of spreadsheet wrangling, data wrangling, whatever you call it, and this idea of data science, where it's really about machine learning and hardcore statistics, those types of activities have merged a lot more. So you now want to take the people that are actually used to working with data and making sense of data, working with spreadsheets and doing really sophisticated things in Excel or some other spreadsheet.

You also enable them to gradually improve their skill set around the machine learning aspects of it: how can I build a predictive model, those types of things. And I think that's where low code allows people to get in touch with different types of technologies a lot more easily.

Adel Nehme: You mentioned something here on the democratization is something that we really think about as well, like how to make data science accessible to the masses within the organization, right? And I think if you're able to unlock that, you're able to unlock a lot of the value of data science. How do you see this democratization playing out now with low code data science tools, right?

And how has it changed maybe the dynamics and priorities within the data function? You mentioned that merging aspect, right? Like, so maybe expand on that a bit more.

Michael Berthold: So what we see is that KNIME is often now also being used as part of upskilling initiatives in organizations. So it's really this idea that everybody who wants to make sense of data, I mean, they need to be willing to invest a little bit of effort, because at the end of the day they're doing complicated things.

But they have a tool at the right level of complexity that allows them to get started, right? And in a sense, you're used to writing macros in Excel, but you have a really hard time automating that. So low-code data science allows you to automate those types of things more easily. On the other side, you have people that come from the machine learning field; they have a really hard time merging data sets and doing all of the data wrangling because they may not be SQL experts, but they can now use a low-code tool to do that part of their activities as well. And that kind of brings all of these different groups together and allows them to collaborate within the same environment.

I think that's also a huge strength of low code: it's almost like you get the SQL expert and the R expert and the Python expert and the JavaScript expert, and they can collaborate within the same environment. That's one of the powers of low code as well.

Adel Nehme: Yeah, and I love that collaboration angle. And one thing that you also mentioned is that, in the past, with data science workflows, a lot of business teams would come with a request, throw it over the fence, and come back with the wrong answer; you had that siloing effect. Do you see now that the data team's time is unlocked to work on higher-level strategic objectives, rather than doing analysis that no one ends up reading, because the business user is now empowered to fish for themselves and find the answers for themselves?

Michael Berthold: Part of that is true, right? I mean, at the end of the day, to come back to the comparison: if you don't know what a regression line does, I can make it as easy as possible for you to create one, and you will still not be able to interpret the results, right? At the end of the day, you still need to understand what those methods all do, but you don't necessarily need to understand how to make them do it, right?

How is it done? I don't need to know how the coefficients of the regression line were actually found, but I need to know what they mean. So, yes, I think it makes it easier to get that into people's hands, but I wouldn't say everybody is now empowered to run complicated analyses.

I think you still need to train people up so that they understand what they're doing.
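To make that distinction concrete, here is a minimal Python sketch (the data and column names are made up for illustration) of fitting a regression and then reading off the coefficients, which is the part a consumer of the analysis actually needs to understand:

```python
# Minimal sketch: fit a regression and interpret it, without worrying about
# how the coefficients are estimated. Data and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50],        # thousands of dollars
    "discount": [0, 5, 0, 10, 5],            # percent
    "revenue":  [120, 190, 260, 330, 390],   # thousands of dollars
})

model = LinearRegression().fit(df[["ad_spend", "discount"]], df["revenue"])

# What matters to the consumer of the analysis is the meaning of these numbers:
# each coefficient is the expected change in revenue per unit change in that input.
for name, coef in zip(["ad_spend", "discount"], model.coef_):
    print(f"{name}: {coef:+.2f} per unit")
print(f"baseline (intercept): {model.intercept_:.2f}")
```

The estimation details stay hidden inside the library, which is exactly the division of labor described here: understand what the coefficients mean, not how they were computed.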

Adel Nehme: You've seen, you know, definitely successful implementations of KNIME. From an upskilling perspective, what was the formula for success for a lot of those successful implementations of KNIME?

Michael Berthold: So one of the nice aspects of KNIME is that the analytics platform itself, the workbench where people build these workflows in the low-code environment to actually work with data, is free and open source. That makes it a lot easier to be part of an upskilling initiative in an organization, because everybody who wants to try it out can just try it out. You don't need to buy a license or a seat or something. You allow people to get started, and only when they really want to deploy it, when they want to seriously collaborate, when it really becomes part of their daily work or their quarterly work, whatever, that's when they require a seat, and that's when our commercial complement offering comes in and allows them to deploy that to others, to share with others, that type of stuff.

Adel Nehme: When you think about the presence of low-code data science tools within modern data teams today, how do you see those tools coexisting with the more traditional coding tools like Python or R? What value do low-code data science tools provide versus those tools, and where do the more traditional programming tools provide value? When should you use each type of tool, essentially, as a data scientist?

Michael Berthold: I think they coexist very nicely, and I think this is the point where we should talk briefly about the difference between some low-code tools and other low-code tools. Fundamentally, a lot of the low-code environments that are currently out there still generate code underneath the hood, and at some point, when you want to really fine-tune what's going on, you need to reach into that code and do it there. So in a sense, it's a wizard that makes getting started and writing code easier and just takes care of that, but then fundamentally you're still dealing with code.

And then there's the other class of tools that KNIME is a part of where the workflow, this visual environment is really the programming language itself. So, obviously, we're not rewriting everything underneath the hood. It calls out to Python, SQL, whatever, some of these technologies, all of these cool libraries.

But if you don't want to, you never need to reach into code to turn a parameter or something. So you have control over what these methods do, but you don't need to deal with how they do it. However, it's super important that if you do have coding experts in your organization, you allow them to still collaborate with the workflow users. And the way we do this at KNIME is that we allow people to integrate code and wrap it inside a node, one of these modules. And then you can expose that to others. And I think this is really the way to have these people coexist, and actually not just coexist neutrally next to each other, but really actively collaborate, by allowing the people that are writing code to do maybe very customized stuff, like accessing a strange little data source that only your organization has.

KNIME doesn't have a connector for that one, so somebody writes that connector in some code, embeds it, wraps it into a node, and that way everybody else in the organization can use it.
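To make the idea concrete, here is a hedged sketch of what the code embedded inside such a node might look like, assuming KNIME's Python Script node and its knime.scripting.io API; the internal REST endpoint and field names are hypothetical:

```python
# Rough sketch of code embedded in a KNIME Python Script node, wrapping a
# custom in-house connector so non-coders can reuse it as a node.
# Assumes KNIME's Python scripting API (knime.scripting.io); the REST
# endpoint and fields are hypothetical.
import pandas as pd
import requests
import knime.scripting.io as knio

# Pull records from a proprietary internal service KNIME has no connector for.
resp = requests.get("https://internal.example.com/api/orders", timeout=30)
resp.raise_for_status()
df = pd.DataFrame(resp.json())

# Hand the result back to the workflow as a regular KNIME table; downstream
# nodes (joins, models, visualizations) can use it without touching code.
knio.output_tables[0] = knio.Table.from_pandas(df)
```

Once wrapped and shared as a node or component, colleagues can drop it into their own workflows without ever opening the script.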

Adel Nehme: Okay, that's wonderful, and it really speaks to the importance of a great user experience and user interface that combines both of these use cases. Can you walk me through the lessons you learned over time at KNIME as you were building out this functionality and catering to the wider data team and all of these use cases? What kind of lessons did you learn in building user experiences for low-code data science?

Michael Berthold: So the one lesson I keep learning is that this workflow building, which at the end of the day is programming, right, you're visually putting together logic, which is what programming is, you design logic, is so close to actual code-based programming that I have to explain to my developers from time to time: hey, your audience isn't quite you, right?

It's someone who doesn't write code but still creates logic, so it needs to have a slightly different angle to it. The user interface, the user experience, is fundamentally different. That's actually pretty hard to do because the two audiences are so close to each other, right? It's probably easier to design an ATM interface.

There you realize this could be my mom or my dad standing in front of it. But for this one, the audience is very, very similar, because they're also building complicated things. So I keep hammering on that one: we are building a tool not for you, the coding developer, but for someone very similar to you who's interested in making sense of data.

Nudging that in the right direction has obviously been difficult. And then the other piece, of course, is this constant debate about how much granularity we really want to have. Fundamentally, what we could have is one module, like a Swiss army knife, that you can configure any way you want to do anything you want with data, and then you just have a couple of these blocks next to each other.

But you want the granularity to allow you, at the workflow level, to understand what you're doing with the data step by step, without forcing people to add a thousand nodes for a simple task, right? Getting that balance right was always a bit of a struggle, but I think we found a nice compromise there.

Adel Nehme: Yeah, perfect. And maybe can we comment on the speed of learning here, because data science workflows are very, very complex, right? When you learn from how customers are actually learning to use these workflows, walk me through the iterative process of further improving the product and the workflow to accommodate all of these different styles of analytics workflows and how people use data in the real world, because it's a lot messier than just having a workflow that is very neat.

Michael Berthold: Yeah, that's a very good point. I mean, in that sense, it's very similar to code-based programming, right? If you want to, you can build a messy workflow that nobody will ever understand again. So you need to have a bit of self-discipline to make sure your workflow is well designed, so that it actually documents what you're doing and doesn't completely hide the complexity.

At the end of the day, you're absolutely right; some of the workflows that we see in production are super complicated. But the nice thing is, what we typically see is that when we have new users coming from a spreadsheet world, or from somewhere else doing data engineering type activities, it typically takes people a day or two to understand, to have this mental click where they say, ah, this is what a workflow does and how it works. Because then they are able to say, if I want to do this, there ought to be a node for that, right? And then they start finding it and doing that. And what we typically do is come in a couple of weeks later, after people have started working with KNIME workflows, and offer workflow doctor sessions, where we essentially say, just bring the workflow that you built and we'll give you advice on it.

I mean, you know: you did this, and this is one way of solving the problem, but here's a better way of solving the problem, or a different way of solving the problem. Just giving this kind of education, a little bit of polish. And that works extremely well. And as I said, it's super easy for people.

We have these hackathons where we get people started and kind of guide them to building their first workflow, to do a little bit of spreadsheet automation, or something with a bit of AI integration, or a little bit of data wrangling, depending on where they come from. And then once they do the stuff they used to do in a different, more complicated way, typically using KNIME workflows, educating them about other techniques is very easy. You don't need to teach them how to do that in KNIME, they get that anyway, but we can tell them, hey, by the way, there are other types of stuff that you can do with data, and here are the modules in KNIME to do that.

So from an upskilling perspective, it's a very, very smooth journey. And the nice thing is you keep staying in the same environment, right? I mean, we see KNIME workflows that "just", in large air quotes, do regular spreadsheet wrangling, because it saves a lot of time: it just gets automated, and now from two days we're down to two minutes.

On the other side, we see people doing really sophisticated deep learning type workflows for quality control on images.
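As a rough illustration of the kind of recurring spreadsheet wrangling that gets automated, here is a hedged Python/pandas sketch (not a KNIME workflow itself; the file names, sheet names, and columns are hypothetical):

```python
# Hedged sketch of a recurring spreadsheet chore turned into an automated step.
# File names, sheet names, and columns are hypothetical.
import pandas as pd

# Read the monthly exports that used to be merged and cleaned by hand.
orders = pd.read_excel("orders_export.xlsx", sheet_name="Sheet1")
refunds = pd.read_excel("refunds_export.xlsx", sheet_name="Sheet1")

# Clean and join once; rerunning this is the "two minutes" instead of "two days".
orders["order_id"] = orders["order_id"].astype(str).str.strip()
refunds["order_id"] = refunds["order_id"].astype(str).str.strip()
merged = orders.merge(refunds[["order_id", "refund_amount"]], on="order_id", how="left")
merged["refund_amount"] = merged["refund_amount"].fillna(0)
merged["net_revenue"] = merged["amount"] - merged["refund_amount"]

# Write the consumption-layer spreadsheet other people actually open.
merged.to_excel("monthly_net_revenue.xlsx", index=False)
```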

Adel Nehme: And that flexibility is really, really interesting, right? And when you look at these examples you mention, from simple spreadsheet automation to deep learning workflows, I assume that depending on the type of team and the type of organization, you may have different challenges when it comes to adopting low-code data science and switching to it, whether for an existing data team or when expanding it to business users. What are the major challenges associated with adopting a low-code data science tool in your organization?

Michael Berthold: It's often almost religious-type difficulties, when you walk into a room and you realize people sit in there thinking, but I want to keep doing this in whatever tool they're using. That's a very, very hard hurdle to get over. It depends a bit: sometimes you talk to people that are avid coders, and they see the power of continuing to code what they're doing while being able to share it with many more people in the organization.

That's a very easy hook, and they see the value there. Sometimes it's more the manager of the team who says, phew, I have these eight people, they're all doing wild stuff; if one of them leaves, we have no idea how to maintain this going forward. And then says, hey, if this all becomes part of a workflow, maybe with some code embedded within, it's something that people can actually build on top of and keep maintaining. So it depends a lot on the dynamics in the room, almost. But fundamentally, at the end of the day, what we're doing is providing people with a programming language.

It's a visual programming language that's at the right level of abstraction to do something with data. And the moment you get that, it's actually very, very easy.

Adel Nehme: Yeah, and this goes back to the religious aspect you mentioned: a lot of data practitioners, one, fall into the trap of always favoring one set of tools for cultural reasons rather than picking the thing that does the job best. And then second, they fall into what I'd call the same trap of resume-driven development, where the projects they choose depend on the tool and on wanting to try out certain packages or technologies, right? How do you work on that as a data leader, and try to create more cultural flexibility within your team when it comes to the toolkit and the type of projects, and focus on the value?

Michael Berthold: Oh, I think that's a problem not only KNIME has, right? You always have that once people get hooked on a tool. How does the saying go: when you have a hammer, everything looks like a nail? Sometimes I see people building KNIME workflows for stuff where I think, you know, I know KNIME can do it; it's a complete programming language.

So in principle, you can do anything, and people sometimes do anything, and you're like, you should not be using KNIME for this one. I mean, it's kind of cool, right? But really, that part, I mean, sometimes we do these hackathons where we try to get people to just explore it for the fun of it.

But the bigger problem to me is getting the positioning right. Also, when people sometimes ask, so are you going to replace Excel? It's like, no, not at all, right? Excel is good for a lot of things, and it's actually often the consumption layer for a KNIME workflow: for other people to consume, to see what we were building as part of this KNIME workflow.

Whether it's an Excel sheet or a PDF, I don't really care, right? We're in the middle part, doing all of the data wrangling, automating it. If it's a one-shot analysis that you will never ever do anything like again, use Excel for it, right? But if it's something where I say, I need to be able to explain to my boss what I did, or I'm probably going to do this many more times in a similar fashion, then it's probably worth building a workflow for it. It's about getting that mindset into people.

But I think that's a problem with pretty much all tools.

Adel Nehme: I think it's going to be very hard to unseat Excel as at least one of the main tools in any modern professional's toolkit. Maybe coming back to the importance of upskilling and data skills here: you mentioned how the democratization part really rests upon understanding what you're doing rather than how you're doing it per se. Maybe walk us through in a bit more detail the importance of data literacy and data skills when it comes to adoption of these types of tools. And how do you get started as an organization in building that data literacy quotient within your organization?

Michael Berthold: So the second part is complicated, because that really depends a lot on your organization. But at the end of the day, as I said, we are building a tool at the right level of abstraction for data workers. So if you have someone who is willing to make sense of data, they're willing to learn how to build these types of workflows, and they'll very quickly see that they can do pretty interesting things with that data.

And I mean, you now have a lot of requests; we see that. At KNIME, we use KNIME workflows all over the place, to figure out what's going on on our forum, what the download patterns are, where they come from, that type of stuff. And that just allows you to automate so many things and give them to people.

So what we do is provide people either complete data apps, where it's really a web page you log in to and you don't even see the workflow that runs under the hood (you can deploy workflows as data apps), or, in many cases more powerful, we wrap a piece of a workflow into what we call a component, which essentially looks like a node, but inside the node is another workflow.

And that allows you to give people entry points to different views of your in-house data. So think about an organization that has, I don't know, five different CRM systems because they bought companies left and right. They'll never really integrate all of that stuff into one CRM system, but having people analyze it requires them to have a unified view across these different CRM systems, and when they buy the sixth company, you don't want to change everything.

You just want to make sure that the sixth CRM system now becomes part of that, and underneath the hood that's a KNIME workflow. So we see that often: organizations build what I'd call almost a virtual data warehouse. Rather than trying to get everything into this one data warehouse, which you then have to change the next day because something changed, you're just building this one component that dynamically creates these different views.

So what we are doing in-house is that we have a couple of, actually many, different components that give you these views on things that you might care about: the users that we have, what's going on on the technology side, what's going on on the website, that type of stuff. That, I think, is super powerful: these workflows as a documentable way to wrap things into components and say, this is your entry point to get started.
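As a hedged illustration of the idea behind such a unified view (the source files, columns, and mappings below are hypothetical, and in KNIME this logic would live inside a shared component rather than a script):

```python
# Hedged sketch: a "virtual data warehouse" style unified customer view built
# dynamically from several CRM exports. Sources, columns, and mappings are
# hypothetical; in KNIME this would sit inside a reusable component.
import pandas as pd

SOURCES = {
    # source name -> (file, column mapping onto a common schema)
    "crm_a": ("crm_a_customers.csv", {"CustID": "customer_id", "Name": "name", "Region": "region"}),
    "crm_b": ("crm_b_accounts.csv",  {"account_id": "customer_id", "account_name": "name", "territory": "region"}),
}

def unified_customer_view(sources: dict) -> pd.DataFrame:
    frames = []
    for source, (path, mapping) in sources.items():
        df = pd.read_csv(path).rename(columns=mapping)[list(mapping.values())]
        df["source_system"] = source  # keep lineage so analysts know where rows came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# When the company acquires a sixth CRM, only SOURCES changes; every analysis
# built on top of unified_customer_view keeps working.
customers = unified_customer_view(SOURCES)
```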

Adel Nehme: Perfect. So definitely create templates and make it easy for people to become part of the program. And maybe, you mentioned that the second part of my question is harder to solve, but what do you see as good patterns for organizations building up the data skills they need to be able to increase adoption?

Michael Berthold: I would see that as, I mean, building these entry points where you say, this is a good view on the data that we have to get started with. And the other part is, of course, providing a lot of blueprints. So we're providing you with the Lego to build your own data applications, so to speak, or workflows.

We are not the ones that provide you with complete solutions, right, where we say, hey, this is it and it solves all your problems. However, what we do have is the KNIME Community Hub, where we have thousands of workflows. We call them blueprints; they solve similar problems. So if you want to do, I don't know, churn prediction, you find a couple of workflows on the Community Hub that do churn prediction, where you say, ah, this looks similar to the problem I'm trying to solve.

I'm going to use this as a starting point. So many, many of our users don't start from scratch and build workflows; they start with a template, a blueprint, and say, okay, this already solves the problem I have. Okay, my data is not in Oracle, but it sits in some other database, so I need to change the connectivity here.

The column names are a little bit different, but the rest of the workflow works. And I think that's, from an upskilling perspective, also a very nice way to get there. In coding, people do that now as well, a lot more than when I was young, when you actually wrote code from scratch. Now there's a lot of this copy-pasting, right?

Where I say, here's already a solution to my problem. I copy it over. These coding co pilots do that fundamentally as well, right? They give you suggestions. Here's a building block adjusted to your needs.

Adel Nehme: Yeah, I couldn't agree more. We're definitely going to talk about the rise of copilots and what that means for low-code data science, but maybe switching back to the broader low-code data science industry: there are different flavors of low-code data science tools, from tools that are much more focused on ETL and analytics tooling, for example, to tools that are much more focused on automated machine learning. Walk us through the different sub-genres of low-code data science tools and where KNIME fits in the broader ecosystem.

Michael Berthold: So we have tools in the low-code space that, as I said before, are essentially just wizards on top of code. And maybe it's not very nice to say "just wizards on top of code"; they do pretty sophisticated stuff and generate code underneath the hood. Then we have other players that are essentially building workflows, but as you said, they're focused very much on the ETL part of it and don't really support the data science, right?

They don't support the machine learning, the actual analytics part of it, very well. The reason for that is that some of those tools are really proprietary tools, all written by themselves, so to speak. But at KNIME, and that's one of the reasons why the KNIME Analytics Platform is open source, we are of course standing on the shoulders of many, many giants, right?

So on the ETL side, we have integration with Spark and lots of other technologies, in the old days of course also Hadoop. In the middle part, for all of the analytics, we're standing on the shoulders of tons of libraries out there, some in R, some in Java, some, like XGBoost, still written in C, that we just call out to. And then on the visualization side, we are using lots of libraries, most dominantly in our case eCharts. So we're not reinventing all of those wheels. From that perspective, I think KNIME is different in that it covers probably the most complete set of the entire data science life cycle, from ingesting the data all the way to analytics and visualization.

The one thing you also mentioned is AutoML. To me, that's a little bit of a different animal, because that's really trying to automate the model finding and the parameter optimization. We have components in KNIME that allow you to do that as well, but some of this AutoML stuff I'm fairly skeptical about.

If you aren't really transparent about what model you're using, that's a problem, because it's often not just about the last fraction of a percent of accuracy; it's also about how efficient the model is. Am I going to pay a huge computational cost on the production side just to get that last percent? Is that really worth it?

So fundamentally, with a lot of AutoML, I'm totally fine with automatic parameter optimization and model selection, those types of things, but the tool needs to be transparent about what it picks and why. So that you, as someone who still understands what the models actually do, can say: yes, I know I get a little bit of extra accuracy, but the price for that is much bigger volatility, for instance, in prediction accuracy when we retrain.

So let's stick with the simpler model and pay a small price in performance. That's a call you need to make, and you can't make it if you don't understand what's going on underneath.
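To illustrate the kind of call described here, a hedged scikit-learn sketch (synthetic data, and the one-percentage-point tolerance is an illustrative assumption): compare a simple and a more complex model transparently, and keep the simpler one unless the accuracy gain justifies the extra cost.

```python
# Hedged sketch: transparent model selection rather than a black-box AutoML pick.
# Dataset, models, and the 1% tolerance are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

simple = LogisticRegression(max_iter=1000)
complex_model = GradientBoostingClassifier(random_state=0)

simple_scores = cross_val_score(simple, X, y, cv=5)
complex_scores = cross_val_score(complex_model, X, y, cv=5)

print(f"logistic regression: {simple_scores.mean():.3f} +/- {simple_scores.std():.3f}")
print(f"gradient boosting:   {complex_scores.mean():.3f} +/- {complex_scores.std():.3f}")

# The call a human still has to make: is less than one percentage point of extra
# accuracy worth the extra serving cost and retraining volatility of the bigger model?
TOLERANCE = 0.01
if complex_scores.mean() - simple_scores.mean() < TOLERANCE:
    chosen = "logistic regression (simpler, cheaper, more stable)"
else:
    chosen = "gradient boosting (accuracy gain judged worth the cost)"
print("chosen:", chosen)
```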

Adel Nehme: There are a lot of nuances around deploying machine learning that go beyond just accuracy and need to be taken into account with these tools. Couldn't agree more. And then, you know, you mentioned the rise of copilots and generative AI tools that help with building the skeleton of the code you're writing if you're doing coding workflows. How do you see generative AI changing or accelerating the low-code data science tooling space?

Michael Berthold: That's an excellent question, because we've been pondering that for a while as well. As you've probably seen, we have integrations in KNIME, we call it K-AI, the KNIME AI assistant, that helps you build workflows. It also assists with just answering questions: the kind of thing you used to look up, like how do I do this in Excel, what do I do with that?

And KNIME gives you lots of help there. So I think for that it's super useful. Like in many, many other tools, it's a copilot that gets rid of the boring stuff, right, and just takes care of all of that. We also have another element that builds bits and pieces of workflow for you.

I think that fundamentally, for the easier ones, you can probably generate the complete workflow, that's fine, but for more complex data science I think you will use that more as a step by step assistant. Say, okay, let's focus on the data integration first. Okay, fine, now let's start modeling, whatever, and you kind of walk with that and the AI will just take care of a lot of simple things.

Fundamentally, however, I think the huge difference is that a lot of people then think, oh great, I have a solution, all good, and move on. But in many cases, when you're building analytical workflows, you require pretty high reliability, meaning you can't afford that, just because the AI had a weak moment and hallucinated a little bit, your workflow creates completely bogus results.

So I compare that a little bit to a good programmer using an AI copilot: they shouldn't just copy it over and say, oh good, I'm going home and drinking beer now, but should actually verify that the code that was delivered is meaningful and makes sense. Just like a lawyer creating contracts or something using AI: I think it saves a lot of time to do it that way.

But you should still look through it and make sure it's meaningful. In-house, we use AIs to help us fill in RFIs, right? And if you get 200 questions, 180 of those are so obvious that the AI can do the job. But at the end of the day, we still go through it briefly to check the answers.

And here and there, we do make corrections. And from a data science perspective, what does that mean? Do you really want to go in and look at the SQL and Python and R and JavaScript code and validate that it's really doing what you want it to do? And that's where I believe workflows come in super handy, because it's the level of abstraction where you can validate that what the AI built actually makes sense.

That helps you at the beginning, when you're collaborating with the AI to build the workflow, to say, oh yeah, okay, that makes sense, all good. But it also helps afterwards, to document and validate later. So it's almost like, between the AI and the human collaborating with it, the workflow is the common language.

Adel Nehme: That's really wonderful. And it comes back to the discussion we had early on about UI and UX for these types of workflows, because, correct me if I'm wrong, I do feel like we are converging to a point where the act of coding will be more and more taken over by copilot systems, and then, as you mentioned, needing to review lines and lines of code is not a sustainable solution for checking what the output is. And I do see that workflow vision you paint, of checking the output of every single sub-step, being quite important. Do you find that the user interface and user experience of a lot of the coding tools, or analytics tools, whether coding or non-coding, will evolve over time to accommodate the rise of generative AI, and that we're going to have a new paradigm of tools that we work with, to a certain extent?

Michael Berthold: So that's an interesting question. I think, yes. For coders, I think AI will still be the same type of paradigm where you're reviewing code, but probably at a much more abstract level. Workflows, for data type of work, are to me the very natural language to collaborate in. And we see that in KNIME as well, right?

We have an AI assistant for the Python and R and some of the other coding integrations, but we also have an AI assistant for the eCharts extension that creates beautiful visualizations. The eCharts extension is something where I think, in the near future, you will not look at the configuration at all anymore to debug it, because from a human perspective, I'm going to look at the graphic and say, this needs to be red or green, change that, and it does that.

For the coding assistant, if you really want to be able to rely on it, I think you will continue to be forced to look at the code at some point and validate that it's doing what it should. You can then, of course, do the obvious, right, and ask another AI: is what it's doing the correct way of doing it?

Then you have an even higher chance of catching mistakes, but it's never going to be perfect. If you need to be sure, or need some sort of guarantee, that what was created is accurate, you need to look at the result, and you need a joint language to compare the AI's thinking to human thinking, in a sense.

Adel Nehme: If you look now at the low-code data science tooling space, where do you see it heading in the next few years? I hate to put you on the spot to give me some predictions, but what are your predictions for the low-code data science space?

Michael Berthold: I think the low-code space where it's still generating code underneath the hood is something that's going to at least stagnate. I don't think these types of tools will completely disappear, but I don't see a huge potential for growth there, also because a lot of that type of stuff will simply be taken over by AI, rather than putting the simplest stuff together visually and then having that translated to code.

I may as well just chat with an AI and have that taken care of. And you see that: Databricks now has this AI built in that does the data wrangling piece. They also have a little bit of a data wrangling workflow builder. I think that data wrangling workflow builder will fundamentally disappear, because both, underneath the hood, generate code.

So to me, really, in the working-with-data space, the common language will be workflows. And I think that will take over what people do with data.

Adel Nehme: Maybe one final question for me, Michael. For anyone looking to get started with low-code tools within their organization, what advice would you give them to get started?

Michael Berthold: Okay, ideally it's easy: just get started. The nice thing is I can now say you don't even need to download it anymore to get started; you can go to the KNIME webpage, where we have the first version of our online, browser-based editor, which we currently call the Playground. It allows you to experience workflows a little bit, start playing with it, get a feeling for it, and if you really like it, if you think it's the right tool for you, then you download it and use it on your own machine with your own resources.

That's the best way: either start with DataCamp or start on the KNIME site, play a little bit with workflows, then go to the DataCamp site and take the course.

Adel Nehme: Perfect. And then final, final question, Michael: any final call to action or advice before we wrap up today's episode?

Michael Berthold: I think, to me, having some sort of data literacy, understanding what you can do with data and what it means when you see data insights, understanding what a correlation or a regression is, or maybe even more sophisticated models, is going to be a skill that's fundamentally useful, even more in the future than today.

Adel Nehme: Yeah, I couldn't agree more.

Michael Berthold: Thanks for having me.

Related

podcast

Data Storytelling for Kids with Cole Nussbaumer Knaflic, Founder and CEO of Storytelling with Data

Adel and Cole explore Cole’s book Daphne Draws Data, challenging limiting beliefs that can develop during childhood, building a data storytelling culture, the future of data storytelling in the age of AI, and much more.

Adel Nehme

50 min

podcast

Kaggle and the Future of Data Science

Anthony Goldbloom, CEO of Kaggle, speaks with Hugo about Kaggle, data science communities, reproducible data science, machine learning competitions and the future of data science in the cloud.

Hugo Bowne-Anderson

52 min

podcast

No-Code LLMs In Practice with Birago Jones & Karthik Dinakar, CEO & CTO at Pienso

Richie, Birago and Karthik explore why no-code AI apps are becoming more prominent, uses-cases of no-code AI apps, the benefits of small tailored models, how no-code can impact workflows, AI interfaces and the rise of the chat interface, and much more.

Richie Cotton

54 min

podcast

Operationalizing Machine Learning with MLOps

In this episode of DataFramed, Adel speaks with Alessya Visnjic, CEO and co-founder of WhyLabs, an AI Observability company on a mission to build the interface between AI and human operators.

Adel Nehme

35 min

podcast

Data Science, Past, Present and Future

In this episode, Hugo speaks with Hilary Mason about the past, present, and future of data science.

Hugo Bowne-Anderson

59 min

code-along

Low-Code Data Science and Analytics with KNIME

Emilio from the analytics platform company, KNIME, will guide you through the main functionalities of the software and you will build together a first visual workflow to answer some questions with your data.

Emilio Silvestri
