The Database is the Operating System with Mike Stonebraker, CTO & Co-Founder At DBOS
Mike Stonebraker is a distinguished computer scientist known for his foundational work in database systems. His extensive career includes significant contributions through academic prototypes and commercial startups, leading to the creation of several pivotal relational database companies such as Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica, and VoltDB. Stonebraker's role as chief technical officer at Informix and his influential research earned him the prestigious 2014 Turing Award.
Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
One of the environments that we have DBOS running on is the MIT SuperCloud, which is in Holyoke, Massachusetts. It has 32,000 processors. It has several terabytes, that's with a T, of main memory and many terabytes of secondary storage. So the resources that the operating system has to manage have gone up by about six orders of magnitude in the last 40-ish years. So without me saying another word, that makes managing operating system state, which is keeping track of tasks, resources, processes, files, all that stuff, that makes keeping track of operating system state a database problem.
There's a much smaller attack surface with the database in the OS compared to the traditional world. So there's a lot less gates to close. And the current world is you start with Linux, a leaky boat, and you paper over that with a bunch of stuff. And in our opinion, that gets you another leaky boat. And so the way to make security better is to have a much simpler system with a lot less attack surface. And that's exactly what we do. And so there's much less attack surface. You can do much, much better monitoring and you can recover from ransomware attacks easily. So that's our main security story.
Key Takeaways
Utilize cloud migration not just for a simple "lift and shift" but as a chance to fundamentally rework and improve your data management systems. This approach can help correct longstanding inefficiencies and prepare your systems for future demands.
The concept of integrating database functionality directly into operating systems, as seen with the DBOS project, suggests new possibilities for enhancing system efficiency and security.
With the increasing complexity of cyber threats, designing systems with built-in security features, such as those capable of rapid recovery from ransomware attacks, is crucial. This approach not only enhances security but also reduces potential downtime and data loss.
Transcript
Richie Cotton: Welcome to DataFramed. This is Richie. You don't need to be a data scientist to know that databases are everywhere. Well, almost. While the majority of the world's data is stored in a database, there's one final frontier that databases haven't yet conquered. That's the heart of the operating system. At the moment, databases are applications that sit on top of operating systems like Windows, macOS, and Linux.
However, today's guest had an idea to invert this. He asked, what if operating systems were built on top of a database? That guest is none other than Mike Stonebraker, the inventor of five database systems, including Ingres, one of the first relational database systems, and Postgres, one of the world's most popular databases.
That is to say, he's been working with databases since their very beginning, 50 years ago, and he's one of the most influential figures in the field. Since 2001, he's been a lecturer at MIT, and his fundamental work on databases has also given him the highest honors. In 2005, he was awarded the IEEE John von Neumann Medal, and in 2014 he won the ACM Turing Award.
On top of this, he's been the chief technology officer for seven firms, and at the age of 80, he's just formed another startup, DBOS. The company shares a name with its main product, the database-oriented operating system that we are here to discuss. Let's find out why you might want an operating system built on a database and how it can help your business. Hi, Mike. Thank you for joining me on the show.
Mike Stonebraker: Oh, thanks for having me, Richie.
Richie Cotton: Wonderful. So, you're most famous for your work on the PostgreSQL database. So, can you tell me what you think led to its phenomenal success?
Mike Stonebraker: So first of all, its phenomenal success had almost nothing to do with me. It was picked up in 1995 by a pickup team of programmers who have promoted it and shepherded it ever since. And so they deserve most of the credit. And you say, why is it taking over the world? Well, it's a much better database system than MySQL.
And hopefully, you know, cream rises to the top. But also people were kind of afraid when Oracle bought MySQL. And Postgres has remained purely community driven all this time. And I think it's a perfect example of what open source is supposed to be. I'm delighted that the elephants, the cloud elephants, are pretty much standardizing on the Postgres wire protocol.
And so I think it will become a very, very dominant database system. Well, the interface will; there will be lots of implementations.
Richie Cotton: Yeah, I love that. It is really, it's a community that's grown and created something amazing and it's still continuing to evolve sort of decades after its introduction. So, is there anything that you're most excited about in the world of SQL databases?
Mike Stonebraker: Andy Pavlo and I wrote a paper. Well, I wrote a paper in 2007 saying what goes around comes around, and there aren't any new data models. Andy Pavlo and I wrote a paper that's going to appear in SIGMOD Record that's 15 years later. And here's a summary of what the paper says.
There aren't any new data models that are going to get traction, in our opinion. And all the interesting ideas are in either hardware stuff or in new applications. And I think, for example, everyone is moving everything they can to the cloud as quickly as they can. And that's for all kinds of good reasons.
And so, it seems to me the most exciting thing that I see is that if you're doing one of these cloud migrations, you have a once in a generation opportunity to try and fix the sins of your predecessor. And so you can either do a lift and shift, at which point your successor will inherit the sins of your predecessor.
Or you can refactor, rewrite, intelligently move to the cloud. And I think that's the most exciting macro trend. Other things are, when you move to the cloud, the cloud basically forces you to have disaggregated storage. And that's forcing all the database vendors to completely rewrite their stuff.
And the reason for that is that networking has gotten a whole lot faster than it used to be. And that enables you to do disaggregated storage. The second thing is, big cloud vendors make it financially very attractive to have software as a service, function as a service, serverless computing.
And that will encourage all application writers to rewrite their stuff, and also will encourage database systems to adopt that model. So there's lots of changes driven by the cloud guys. In terms of new applications, I think, how are machine learning and large language models going to be supported by database systems?
Where is that going to go? I think it's an exciting thing to watch. I think genomic databases are going to become much more prevalent, and how are those going to be supported? And then I think the topic that we're supposed to talk about is, can the operating system really become a database system?
That's basically a new application area for databases. So I think new application areas are fascinating. And I think the cloud slash hardware changes are really fascinating. I'm not expecting anything to happen in data models. I think the relational model is the answer, and I don't see that changing.
Richie Cotton: Lots to unpack there. I like your first point about how you always inherit the sins of your predecessor. Like, I think this is familiar to anyone who's worked with data, that there's some sort of horrendous blob of data there that you're like, well, yeah, I'm not sure I want to touch that.
So, the idea that you've got to maintain things and improve them as you go along, avoiding the technical debt, that seems very important. And then on your last point about databases in the operating system, it does seem like databases are pretty much everywhere, and the operating system is one of the sort of last holdouts.
So since your new project is around creating an operating system with a database at its core, can you tell me why do operating systems need databases?
Mike Stonebraker: So, I have lots of gray hair, so I was a very early user of Unix in 1974, on a PDP-11/40: one processor, 48 K of main memory (not M or G, K), and 20 megabytes of secondary storage. So one of the environments that we have DBOS running on is the MIT SuperCloud, which is in Holyoke, Massachusetts.
It has 32,000 processors. It has several terabytes, that's with a T, of main memory. And many terabytes of secondary storage. So the resources that the operating system has to manage have gone up by about six orders of magnitude in the last 40-ish years. So without me saying another word, that makes managing operating system state, which is keeping track of tasks, resources, processes, files, all that stuff.
That makes keeping track of operating system state, that makes it a database problem without me saying another word. So you want to apply database technology to what's a database problem. So that's number one. Number two, Linux is now 40-ish years old. Unix is 50-ish years old. And that makes them legacy software that's been maintained, patched, extended over a long period of time.
And the Linux community is having a very hard time making forward progress. So, for example, there is no multi-node version of Linux. So everybody who is running a multi-node system, which includes most everybody, has to run a multi-node orchestrator, something like Kubernetes.
Also, Linux is well known to be a leaky security boat. So people layer all kinds of security stuff on top of it. And so what you have is a patchwork of operating system software that is a management nightmare and is still a leaky security boat. So the fact that Linux is legacy means that it's time to send it to the home for tired software.
So scale and legacy are the reasons to start with a clean slate and start anew.
Richie Cotton: Okay, that multi-node idea in particular sounds interesting. So, I know, like, earlier you mentioned that this is a solved problem with databases as of now, because databases are in the cloud, they have to work on multiple servers at once. And so, I think I can see where this is going in terms of having the operating system have that capability as well. I seem to remember that this has been tried before. So maybe 20 years or so ago, Microsoft tried to put a database at the heart of the operating system with WinFS, and then they abandoned that project. So what's different this time?
Mike Stonebraker: So the project was called Longhorn internally, and I sort of sleuthed around a bunch as to what happened when we started building DBOS. And the general consensus inside Microsoft was that Longhorn was a good idea. Nobody disputed the ideas. And everybody internally said that its problems were bad management and feature creep.
It got more and more and more ambitious before it ever worked. And so I think technically it's a very good idea, and was then, and it suffered from internal politics, internal management issues, and especially feature creep. Because I've been involved with or watched a whole bunch of startups, and the worst thing in the world that any startup can do is engage in feature creep.
You want to get something running, and once you get it running, then you can worry about extending it. So Microsoft didn't pay attention to that lesson.
Richie Cotton: It's definitely a big problem being tempted to add lots of features, because when you're just building something from scratch there's like so many exciting things you want to add to it, but I can see how that can become dangerous if you've not shipped anything. So one thing you said you were excited about in the world of databases is that there are lots of application-specific databases for different use cases.
And it feels like the same thing is also true of operating systems. There are different operating systems for different use cases. So what is your intended use case for DBOS?
Mike Stonebraker: DBOS started as an academic research project in 2020, jointly with MIT and Stanford. And so we had a running system that we decided to commercialize. And like any startup, the mantra is get a product out as quickly as you can, and then, in the vernacular, see if the dogs are going to eat the dog food.
And if they're not, they will tell you instantly why they don't like it. So our goal is to get a product out as quickly as possible. And we did that. So we've shipped a commercial version of DBOS, and we had the advantage of having a person named Michael Coden be part of the research project and also part of the commercial venture.
And until recently, he was the managing partner of the cybersecurity practice for Boston Consulting Group. And so through him, we got to talk to lots of enterprise folks, big and small, in a whole bunch of different areas. And so, here's who saluted in those conversations. First of all, the three-letter agencies and the defense industry, who are really focused on security.
The second place where people really saluted was in financial services, dealing with moving money around. And so one thing I learned from interviewing a large regional bank here in the Northeast was, they talked with us and they said, wow, you solve the once-and-only-once problem.
And that was the first I'd ever heard of that. But here's what he actually meant, which is, if I'm going to move $10 from my account to your account, Richie, then we're almost certainly in different systems. The way the transaction should work is you debit my account, send a message to your account, increment your account, and then send a return message, and then commit the transaction.
So this is basically a distributed commit problem. And in the banking world, most of the systems that this regional bank used don't have XA support or any distributed commit support. So the bank was forced to do this themselves. And so they figure that somewhere between a third and half of their application logic is dealing with once-and-only-once.
And it's brittle, problem-prone, hard to get right. Distributed commit is not for the faint of heart. And so they would love to get rid of all that code. And we do it automatically because the network system is in the database. The database is in the database. And so we control all pieces of that interaction.
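Editor's sketch: the once-and-only-once transfer described above can be illustrated with a single local database. This is a minimal example using Python's built-in sqlite3 module, not DBOS and not a real distributed commit. The `account` and `transfer` tables, the `transfer_id` idempotency key, and the helper names are all assumptions made for illustration. It shows the two properties the bank wants: the debit and credit happen together or not at all, and replaying the same request does not move the money twice.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create a toy in-memory bank (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE account (
            owner   TEXT PRIMARY KEY,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        -- One row per transfer; the primary key rejects replays.
        CREATE TABLE transfer (
            transfer_id TEXT PRIMARY KEY,
            src TEXT, dst TEXT, amount INTEGER
        );
        INSERT INTO account VALUES ('mike', 100), ('richie', 0);
        """
    )
    return conn


def transfer(conn, transfer_id, src, dst, amount) -> bool:
    """Debit src and credit dst exactly once; return False if nothing changed."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO transfer VALUES (?, ?, ?, ?)",
                (transfer_id, src, dst, amount),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate transfer_id or overdraft: the whole transaction is undone.
        return False


def balance(conn, owner) -> int:
    row = conn.execute(
        "SELECT balance FROM account WHERE owner = ?", (owner,)
    ).fetchone()
    return row[0]
```

Calling `transfer(conn, "t1", "mike", "richie", 10)` twice moves the money only once, because the second insert into `transfer` violates the primary key and rolls the whole transaction back. In a real multi-system setting the hard part is coordinating the two sides, which this single-database sketch deliberately sidesteps.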
And so we solve the once-and-only-once problem automatically. And he was very excited about that. So financial services worries about distributed commit and also worries about security. The third place people got very interested was what I'll call scuff-shoe enterprises, which are enterprises that bend metal and do real things.
One particular enterprise, who I won't name, described their current security system. They have a hundred thousand endpoints. They're a very big conglomerate. And so, they have one particular security vendor who does event extraction off of 100,000 endpoints.
That's a seven-digit-a-year subscription in U.S. dollars. And they then send those events through their enterprise's custom workflow system that enriches the events with enterprise-specific data. They then pass these enriched events to another security product, another seven-digit-a-year subscription.
And they and the vendor have written several hundred monitoring rules. And so, in production, if a monitoring rule fires, they have a human analyst look at it to make sure it's not a false positive. And if it's not a false positive, then they take action. The elapsed time between when a bad guy knocks at the door and them taking action is measured in multiple hours.
And they are just terrified of ransomware attacks. And so they have known many of their peers to have succumbed to ransomware attacks. It tends to take all production down for multiple days at a time and costs a billion dollars if you're a big enterprise. So they are terrified about security.
And the thing they love about DBOS is DBOS recovers automatically from ransomware attacks. And the reason we can do that is we have everything in the database, everything. All the operating system's state is in the database. And we keep a log of that state historically. We spool it into a data warehouse.
And so if you want to back up the operating system 18 minutes, you just do it. And if this is fast enough for the operating system, it's certainly fast enough for your application. So if your application puts all state in the database, then when we back up 18 minutes, we back up everything 18 minutes.
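Editor's sketch: the "back up 18 minutes" idea rests on keeping a historical log of state that can be queried as of an earlier time. The following is a minimal illustration using Python's built-in sqlite3 module. The `state_log` table, its columns, and the use of plain integer minutes as timestamps are assumptions made for the example and do not describe how DBOS actually stores its log.

```python
import sqlite3


def open_state_log() -> sqlite3.Connection:
    """Append-only log of (time, key, value) changes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state_log ("
        " seq INTEGER PRIMARY KEY,"   # insertion order
        " ts INTEGER NOT NULL,"       # minutes since some start point
        " key TEXT NOT NULL,"
        " value TEXT)"
    )
    return conn


def record(conn, ts, key, value) -> None:
    """Append one state change; earlier entries are never overwritten."""
    with conn:
        conn.execute(
            "INSERT INTO state_log (ts, key, value) VALUES (?, ?, ?)",
            (ts, key, value),
        )


def state_as_of(conn, ts) -> dict:
    """Reconstruct every key's latest value at or before time ts."""
    rows = conn.execute(
        """
        SELECT key, value
        FROM state_log AS l
        WHERE seq = (SELECT MAX(seq)
                     FROM state_log
                     WHERE key = l.key AND ts <= ?)
        """,
        (ts,),
    ).fetchall()
    return dict(rows)
```

If the current time is minute 60 and an attacker overwrote a file at minute 43 (17 minutes ago), `state_as_of(conn, 42)` returns the state from 18 minutes ago, before the overwrite was logged.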
And so if you had a ransomware attack 17 minutes ago, you just back up 18 minutes, single step around the bad guy and let the system go. So they're very excited about Our, our security story, their problem is that they are dragging a huge legacy code base around. And so to take advantage of DeepWise, you would have to restructure, refactor, rewrite a bunch of it.
And that's, a project for the decades. And so that will be a very, very slow market on the uptake. So with that said, What we're aiming for the initial commercial D Boss users are the three letter agencies financial services, specialty startups and adventuresome enterprises who are willing to refactor stuff on the way to the cloud.
So that's, that was a long answer to your short question.
Richie Cotton: Yeah. So I have to say, I was thinking that swapping out the operating system is probably going to result in a fairly small productivity boost, but the things you're talking about there, these are pretty dramatic. Like the idea that just solving this distributed transaction problem is going to enable banks to cut out half of their code base.
That's going to be a huge productivity boost from the maintenance side. And then the manufacturing example of just saying, okay, we've got better security, this is going to protect us from ransomware. This is pretty amazing stuff.
Mike Stonebraker: The thing I found really, really exciting is that most everyone we talk to about the idea thinks it's conceptually fantastic. And so getting to an implementation is a small matter of legacy code. So anyway, I'm really excited about the possibilities.
Richie Cotton: This does sound pretty cool. It sounds like the use case of this is going to be around infrastructure for cloud applications, so for your software as a service stuff. And so, can you just talk me through, in general, what does the infrastructure for cloud applications currently look like?
Mike Stonebraker: Well, to start with, if you move to the cloud, then as I said earlier, you're highly encouraged to take a software as a service model. And you get disaggregated storage, stuff like S3. And so, our point of view is that if the world is moving from on prem to the cloud, we should go to where the market's going to be, not where it was.
And so, right now, DBOS is a cloud-only service. It runs software as a service, so that you only are using resources when you are actually running, and if you're idle, you're not using any. And the other thing to note is that transactional databases have gotten wildly faster in the last decade or two.
So it's fast enough to put a database system at the bottom. That's what we do. So we run on the bare metal. Well, we run on a microkernel. And at some point, we may well write our own microkernel so that we're really running on the bare metal. And so Linux is nowhere in sight. Right now, we are running on AWS.
We're running on one of their microkernels called Firecracker. And the database system is the only thing running on top of Firecracker. And on top of the database system, we've written a file system, a messaging system, a bunch of schedulers, and your application runs on top of that stuff.
So just so everybody gets really clear what we have in mind, let me tell you how the messaging system works. It is not TCP/IP at all. It is not a heavyweight thing. And it's all written in SQL. And what do I mean by that? Well, there's a message table with a sender, a receiver, and a payload.
To send a message, you do an insert into that table. That's one line of SQL. We're running on top of a partitioned, multi-node, highly available DBMS. So that tuple ends up at the home site of the receiver. And so to read a message, you just do a SQL query. Another one line of SQL. So that's the message system.
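Editor's sketch: the message table described here can be mocked up in a few lines. This is a minimal illustration using Python's built-in sqlite3 module on a single node, not DBOS itself. The only details taken from the conversation are the sender, receiver, and payload columns, and that sending is one INSERT and reading is one SELECT. The table name `message`, the `id` column, and the helper functions are assumptions made for the example.

```python
import sqlite3


def open_message_store() -> sqlite3.Connection:
    """Create an in-memory message table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        " id INTEGER PRIMARY KEY,"    # insertion order, added for the sketch
        " sender TEXT NOT NULL,"
        " receiver TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def send(conn, sender, receiver, payload) -> None:
    """Sending a message is a single INSERT."""
    with conn:
        conn.execute(
            "INSERT INTO message (sender, receiver, payload) VALUES (?, ?, ?)",
            (sender, receiver, payload),
        )


def receive(conn, receiver) -> list:
    """Reading messages is a single SELECT, oldest first."""
    return conn.execute(
        "SELECT sender, payload FROM message WHERE receiver = ? ORDER BY id",
        (receiver,),
    ).fetchall()
```

For example, after `send(conn, "mike", "richie", "hello")`, calling `receive(conn, "richie")` returns the row for that message. What this sketch cannot show is the partitioning that places each row at the receiver's home site in a multi-node DBMS.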
Once you go outside of our environment, we have a gateway that goes out onto TCP/IP and the rest of the world. But inside us, it's all just the database. So everything is just the database. And so the database system is the only thing running besides a small microkernel. And the way software as a service works is you have to structure your application as a graph of workflow steps.
You can call them micro-ops, operations. They're just pieces of code. So right now, well, we used to run on JavaScript and Java, and now we've moved to TypeScript because it seems to be more popular. So you write a collection of operations in TypeScript, and you tell us the graph of those operations.
So that's the way you have to structure your application to get software as a service to work. We accept that graph and those operations, we store them in the database, and we have a tiny orchestrator that wakes up any given operation when its inputs are available and it produces an output.
All of that's in the database. So you just have to write a graph of TypeScript. And we take care of everything else. So we run it for you. If it seems to be running slowly, we give you more resources and so forth.
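Editor's sketch: the "graph of operations plus a tiny orchestrator" idea can be illustrated generically. The example below is written in Python rather than TypeScript, keeps everything in memory rather than in a database, and uses the standard library's `graphlib.TopologicalSorter` to decide when each operation's inputs are available. The `run_workflow` function and the `ops`/`deps` structures are hypothetical names for illustration and are not the DBOS programming interface.

```python
from graphlib import TopologicalSorter


def run_workflow(ops: dict, deps: dict) -> dict:
    """Run each operation once all of its input operations have produced output.

    ops  maps an operation name to a function.
    deps maps an operation name to the list of operations whose outputs it
         takes as positional arguments, in order.
    """
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = ops[name](*inputs)
    return results


# A three-step graph: load -> double -> total
ops = {
    "load": lambda: [1, 2, 3],
    "double": lambda xs: [2 * x for x in xs],
    "total": lambda xs: sum(xs),
}
deps = {"load": [], "double": ["load"], "total": ["double"]}

results = run_workflow(ops, deps)
```

Here `results["total"]` is the sum of the doubled list. A production orchestrator would also persist each step's output so that a crashed workflow can resume, which this sketch omits.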
Richie Cotton: Okay, so I have to say, I love the idea that the messaging system is just a table in a database, so you can write SQL queries against it. I'd love to get into that a little bit more later, but just for now, what are the implications for anyone who wants to develop applications on top of DBOS?
Mike Stonebraker: All you have to do, well, we run on the cloud. At some point, we will probably support an on-prem deployment, but on prem comes with just a ton of idiosyncratic behavior on the part of whatever your shop is doing. But we run on AWS. We will run in the near future on Azure and on GCP.
And so, all you have to do is produce this graph of TypeScript. You have to be using TypeScript. And I expect in very short order, we will support 10 languages, because TypeScript is popular, but it's certainly not universal. We'll probably support Java, we'll probably support JavaScript, we'll probably support Python.
We'll support Go if there's enough, I mean, we'll support languages as there's interest in us supporting them. And in short order, we will run on the popular clouds in a variety of programming languages. And what you get is that every one of your operations is a transaction. One of the big problems people have with software as a service applications is that they're broken up into a whole bunch of steps that are running in parallel.
And so if there's race conditions between the parallel operations, those are fiendishly hard to debug. But in our system, they're transactions and we sort it out with the concurrency control system. So there are basically few to no race conditions. And so if you know the word Heisenbug, it was a term coined by Jim Gray, the late Jim Gray.
We avoid almost all Heisenbugs. We also give you a debugger. Because if we can back up the operating system, we can also back up your application. So if you're in debug mode and something bad happens, you can simply back up three minutes, single-step forward, change the code, change the data.
So we give you a really nifty debugging environment. But we give you transactions for everything. If you want to ask questions about your application, the state of your application is in the same data warehouse as the operating system stuff. So you can just use SQL to ask questions about what's happening.
You can use SQL to ask, for example, if you think that I'm possibly a bad actor, you want to know who I've sent messages to, who they've sent messages to, the transitive closure of who I've ever talked to. And that's just SQL. So right now, asking questions about what's going on is really, really difficult.
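Editor's sketch: the transitive closure query mentioned here maps naturally onto a recursive common table expression. The example below uses Python's built-in sqlite3 module with a hypothetical `message` table of sender and receiver pairs; the table layout and helper names are assumptions for illustration, not the DBOS schema.

```python
import sqlite3


def open_message_log(edges) -> sqlite3.Connection:
    """Load (sender, receiver) pairs into an in-memory message table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (sender TEXT, receiver TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO message (sender, receiver, payload) VALUES (?, ?, '')",
        edges,
    )
    conn.commit()
    return conn


def everyone_reached(conn, start) -> list:
    """Everyone reachable from `start` by following sender -> receiver links."""
    rows = conn.execute(
        """
        WITH RECURSIVE reached(person) AS (
            SELECT receiver FROM message WHERE sender = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT m.receiver
            FROM message AS m
            JOIN reached AS r ON m.sender = r.person
        )
        SELECT person FROM reached ORDER BY person
        """,
        (start,),
    ).fetchall()
    return [person for (person,) in rows]
```

With messages mike to ann, ann to bob, and bob back to mike, `everyone_reached(conn, "mike")` includes ann and bob, and also mike himself because the chain loops back.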
And it's very easy to do monitoring. So right now you have to sort of move what amounts to the event log into some proprietary system that talks a proprietary language. In our world, it all goes into a data warehouse. You can just query it in SQL. So you get much easier monitoring.
You get a fancy debugger. You get super security. You get multi-node support. You don't have to run Kubernetes. And you're not running Linux. And so you get a much simpler environment to maintain, a lot less moving parts, a lot more security. And you get a next-generation programming environment.
So it's very attractive to application developers. We've talked to a whole bunch of them, and most of them think, wow, this is really neat. And then they say, yeah, but you don't support Go, or pick an objection. And so, the real question is, we could support all of POSIX, which is sort of yesterday's standard.
I'm really reluctant to do that because most people on the cloud don't care about POSIX at all. They're focused on workflow standards, stuff like that. So we'd like to be supportive of the standards coming rather than yesterday's news. But the question is, how big an application surface do we have to support
in order to get traction? And we'll find out. We are finding out as we speak.
Richie Cotton: Okay. I think your point about Heisenbugs is really interesting, because certainly as a user of web applications, I've often had the experience where something's gone wrong, I report the bug, and then the response back is, well, I can't reproduce this, maybe it was a temporary glitch. And then from the engineer's side, they're like, well, how do I fix this if I can't reproduce this?
So the idea is that those categories of temporary problems are going to largely go away because you've got that state. That just seemed like a huge step forward.
Mike Stonebraker: Well, database systems have this problem in spades. I mean, I worked for a bunch of different database companies. If you have a Heisenbug, the user sends you the bug, and you can't reproduce it. If the customer is important enough, you put engineers on airplanes and go to the customer's site and put a print statement everywhere in sight.
And so they're fiendishly difficult and fiendishly expensive. And if the customer is important enough, you're likely to land on the front page of the New York Times. So this is serious business. And so we make things a lot better.
Richie Cotton: Going back to what you said about how, because everything's sort of SQL internally, you can start doing queries on it. What sort of queries might I want to run against an operating system?
Mike Stonebraker: Who in my environment is using more than 100 gigabytes of space, counting only those files bigger than one gigabyte? You can't ask that now. And it's just SQL in our system. Which three users are chewing up the most resources? And is there anybody
who has copied more than 20 files in the last 12 hours? Just et cetera, et cetera, et cetera.
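Editor's sketch: the first two questions above translate directly into GROUP BY queries once file metadata lives in a table. The example uses Python's built-in sqlite3 module with a hypothetical `file(owner, path, size_bytes)` table, and treats "resources" as total bytes stored purely for the sake of illustration. None of these names come from DBOS.

```python
import sqlite3

GB = 1024 ** 3


def open_file_catalog(files) -> sqlite3.Connection:
    """Load (owner, path, size_bytes) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (owner TEXT, path TEXT, size_bytes INTEGER)")
    conn.executemany("INSERT INTO file VALUES (?, ?, ?)", files)
    conn.commit()
    return conn


def heavy_users(conn, min_file=GB, min_total=100 * GB) -> list:
    """Owners using more than min_total bytes, counting only files over min_file."""
    return conn.execute(
        """
        SELECT owner, SUM(size_bytes) AS total
        FROM file
        WHERE size_bytes > ?
        GROUP BY owner
        HAVING SUM(size_bytes) > ?
        ORDER BY total DESC
        """,
        (min_file, min_total),
    ).fetchall()


def top_users(conn, n=3) -> list:
    """The n owners with the most bytes stored overall."""
    rows = conn.execute(
        """
        SELECT owner
        FROM file
        GROUP BY owner
        ORDER BY SUM(size_bytes) DESC
        LIMIT ?
        """,
        (n,),
    ).fetchall()
    return [owner for (owner,) in rows]
```

The WHERE clause filters individual files before aggregation and the HAVING clause filters owners after it, which is exactly the "counting only those files bigger than one gigabyte" distinction in the question.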
Richie Cotton: From the application developer point of view can you also do debugging by writing SQL queries?
Mike Stonebraker: Yeah, of course. But we also give you this time travel debugger. But sure, you can write SQL against what amounts to the event log in the data warehouse. And so, yeah, that works great.
Richie Cotton: Okay, so it sounds like if you've got a few SQL skills, then it's going to make a lot of what's happening within the operating system much more accessible. I'd also like to talk a bit about cyber security. So you mentioned this before, that security is one of the main features, and that you can prevent these ransomware attacks.
Are there any other security benefits from running DBOS as opposed to another operating system?
Mike Stonebraker: Well, there's a much smaller attack surface compared to the traditional world. So there's a lot less gates to close. And the current world is you start with Linux, a leaky boat, and you paper over that with a bunch of stuff. And in our opinion, that gets you another leaky boat.
And so the way to make security better is to have a much simpler system with a lot less attack surface. And that's exactly what we do. And so there's much less attack surface. You can do much, much better monitoring. And you can recover from ransomware attacks easily. That's our main security story.
Richie Cotton: Okay, just having a simpler system means less things can go wrong. That seems useful to know. I'd also like to ask you a bit about like what it's like to create a startup because often creating a startup is very much seen as a young person's game. And well, you've been around for a while now. So just tell me like what inspired you to create another startup?
Mike Stonebraker: So my career has been in academia, and most people in academia want to get famous and write papers, and whether or not they do anything meaningful to the real world is irrelevant. And somehow, early on, I decided that it was important to try and make change. And I learned a long while ago that the very big companies don't invest in new stuff.
They, by and large, buy startups after they're somewhat successful. So the way to make change happen is to do startups. And so whenever I've had an idea that looked like it was commercializable, to make a difference, you do a startup. And after a while, they get easier and easier for someone like me to do.
And nobody has yet complained about my age, which everybody should complain about since I'm old, but I'm going to keep doing this as long as I can make a difference.
Richie Cotton: That's wonderful. It's very inspiring that you just keep plugging away at this and keep coming up with ideas and creating stuff. All right, can you tell me a bit about what you're working on with DBOS right now? What's coming soon?
Mike Stonebraker: Okay, so we've talked about sort of what the commercial guys are thinking about, which is getting the first lighthouse customers, making them happy, and doing whatever they need to be successful. So that's exactly what you would expect. The academic research on DBOS goes on.
So that hasn't stopped. The thing that I'm most interested in is that if you look at database transactions, the high pole in the tent as to what consumes CPU time looks like it's pretty much moved to being the networking system. And so to go faster, you've got to redo the networking system.
And so we're looking at all kinds of ways to go fast, to send messages faster than inserting them into a database table and then reading them back out again. So we're working on high poles in the current transactional database stack. That's something that I'm very, very interested in.
And then another thing is, everybody on the planet is dabbling in large language models, as am I, and the question of the day is what can large language models do for structured data in database systems. So I'm plugging away at that. So that's what I'm focused on in an academic context.
Richie Cotton: Okay. Faster networking and large language models. It's exciting stuff. All right, do you have any final advice for people who are interested in using DBOS?
Mike Stonebraker: Sure. Get at it. It works, and works well, and it has a ton of advantages. So, go to our website and kick the tires, and you can sign up to freely use the cloud version, the software as a service version. So we're excited to try and get feedback.
And it costs nothing except small amounts of your time. So have at it and tell us what you think.
Richie Cotton: All right, super. Everyone in the audience, you've got a call to action. All right. Thank you very much for your time, Mike. That was great.
Mike Stonebraker: Okay, thank you, Richie.