50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

Richie and Don explore the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++ and much more.

Apr 22, 2024

Guest

Don Chamberlin

Don Chamberlin is renowned as the co-inventor of SQL (Structured Query Language), the predominant database language globally, which he developed with Raymond Boyce in the mid-1970s. Chamberlin's professional career began at IBM Research in Yorktown Heights, New York, following a summer internship there during his academic years. His work on IBM's System R project led to the first SQL implementation and significantly advanced IBM’s relational database technology. His contributions were recognized when he was made an IBM Fellow in 2003 and later a Fellow of the Computer History Museum in 2009 for his pioneering work on SQL and database architectures. Chamberlin also contributed to the development of XQuery, an XML query language, as part of the W3C, which became a W3C Recommendation in January 2007. Additionally, he holds fellowships with ACM and IEEE and is a member of the National Academy of Engineering.

Host

Richie Cotton

Key Quotes

The database revolution has just been unfolding rapidly over the last half century, and I was really privileged to take a part in it. SQL didn't cause this revolution, it was caused by economics. It's been a wild ride and I'm very grateful for the opportunities that have come my way over the years.

SQL has been more successful than we ever dreamed it could be. But I'd have to say that the language met the goals that we had defined for ourselves only in part. Remember, Ray and I thought that SQL would be used by what we call casual users who were not computer programmers. Well, it turned out we were wrong about that. The actual users of SQL turned out to be mostly programmers building database applications. Well, I think SQL has made the work of these programmers easier and more productive, so that's a good thing. The casual users, I think, are still out there, but they're not using SQL. They're using Google and increasingly, I think they're starting to use AI systems like ChatGPT.

Key Takeaways

The development of SQL was driven by the need for a non-procedural, readable, and accessible way to query databases. Data professionals should focus on designing systems and languages that simplify complex processes, making data accessible to a broader range of users.

The success stories of MySQL, PostgreSQL, and SQLite highlight the impact of open-source resources on learning and practical implementation in the data space. Utilizing and contributing to open source can accelerate development and foster a community of learning.

The development of SQL++ and the ongoing evolution of database technologies suggest that continuous innovation is crucial to meet the changing demands and opportunities presented by new data-intensive applications.

Links From The Show

The first-ever journal paper on SQL. SEQUEL: A Structured English Query Language

Don’s Book: SQL++ for SQL Users: A Tutorial

System R: Relational approach to database management

SQL Courses

SQL Articles, Tutorials and Code-Alongs

Transcript

Richie Cotton: Welcome to DataFramed. This is Richie. We're celebrating two anniversaries today. First of all, it's episode 200 of DataFramed. Thank you for listening. I'm really proud to be your host and I hope I've helped you learn some useful things about data and AI. Secondly, it's 50 years since Don Chamberlin and Ray Boyce published the first ever paper on the SQL programming language.

I'm sure I don't need to tell you that this changed the world. Even now, every data practitioner needs some SQL skills because that's how we access our data. We only bring you the best on this show, and today's guest is Don Chamberlin himself. He's here to talk about the early years of SQL, how the language went from an idea about easier ways of accessing data to world domination.

We spend a lot of time on this show thinking about the future, but I think it's also important to occasionally step backwards and appreciate how far we've come. Let's hear Don's story.

Hi, Don. Welcome to the show.

Don Chamberlin: Hi, Richie. Thank you. Pleasure to be here.

Richie Cotton: I'd love to start by just finding out how you first became interested in databases.

Don Chamberlin: Well, I'll start from the beginning. In 1970, I was finishing my graduate studies at Stanford, and I took my first professional job with IBM at the Watson Research Center in Yorktown Heights, New York. I moved from California to New York in the winter, which was not a move I'd recommend if you enjoy warm weather.

A few months later, my friend Ray Boyce also completed his graduate work at Purdue, and he joined IBM at the same location where I was. Well, Yorktown was the central research facility of IBM, and the mission of IBM Research is to study technologies that might influence IBM's future products. And in 1970 there was kind of a revolution going on.

The cost of computing was coming down very quickly, and lots of companies were putting data online for the first time. And this seemed like a business opportunity. So the group that Ray and I were in was assigned to study the state of the art in database management with an eye to influencing IBM's future products.

Richie Cotton: Okay. And what was this first database that you got interested in?

Don Chamberlin: Well, we studied something called the DBTG report. So let me tell you where that came from and why we were interested in it. In the early 1970s, the most respected person in the database industry was a guy named Charles Bachman of General Electric, known to his friends as Charlie. And Charlie had actually invented the concept of a database management system.

He was the first one to call for a separate software layer to manage data that was shared by multiple applications. And this was a pretty important concept for inventing the database management system, basically. Charlie received the ACM Turing Award, which is the most prestigious award in computer science.

Well, Charlie had actually built a database management system that was called IDS, which stood for Integrated Data Store. And IDS stored information in the form of records and connections between records. You could think of them as pointers. In IDS, a program could navigate through what Charlie called data space, moving from one record to another by following these pointers to find an answer to a question.

And in fact, when Charlie gave his Turing Award lecture, he titled it The Programmer as Navigator. Well, one of the most popular business programming languages at the time was COBOL. And there was a movement to add database management functions to COBOL. And a committee was formed for this purpose, called the Database Task Group, abbreviated DBTG.

Charlie was a member of DBTG. And the group published a report in 1971, defining a set of commands for navigating in data space based on Charlie's ideas. Well, the DBTG report was pretty important at the time. Ray and I spent some time studying it. It was wonderfully complicated. It had currency indicators and set occurrence selection rules.

It had a find command with seven different versions. It could do a pretty good job of answering questions that were anticipated in the database design. For unanticipated questions, sometimes you were out of luck. Ray and I wrote a review of the DBTG report. And we suggested some incremental improvements.

We thought if we could manage to understand something as complicated as DBTG, our careers would be off to a good start.

Richie Cotton: That's funny. I love the analogy there of working with data as being navigating. I think this phrase is not often used, but you do spend so much time trying to work out how different bits of data are connected together. And it sounds like this idea of data being connected is leading towards the idea of relationships between data and relational databases.

So can you talk me through how relational databases came about and how you got interested in them?

Don Chamberlin: Well, sure. All this came about because of a paper written by Ted Codd. Ted was a scientist at IBM's Research Laboratory in San Jose, California. And in June of 1970, he published what became a very famous paper called A Relational Model of Data for Large Shared Data Banks. The basic point of Ted's paper was that Charlie Bachman had gotten it all wrong, and that navigating through data space was a bad idea.

Ted thought that database queries should not look like programs that tell the computer what to do. He wanted to express queries in a high level, non procedural language. He liked to say, tell me what you want, not how to find it. Well, I read Ted's paper as part of my learning process and getting up to speed on the state of the art in databases.

And on first reading, I wasn't much impressed with this paper. Ted was basically a mathematician and his paper contained a lot of mathematical jargon. It defined a relation as a subset of the Cartesian product of a set of domains. And it introduced concepts like data independence and normalization and operators like permutation and projection and join.

And my impression of all this was that Codd's paper was interesting from a theoretical point of view, but I couldn't see that it was really grounded in practical engineering. But I kept on hearing more about this relational data model. I heard about a symposium that was going to be held in Miami Beach in December of 1972, and it was going to feature a tutorial on relational databases.

Well, traveling to Miami in the winter had a certain appeal, so I got permission to attend this symposium. It was called the COINS-72 Symposium, Conference on Information Systems. And I actually met Ted Codd for the first time on the beach at the Fontainebleau Hotel. I attended this tutorial, which was taught by Chris Date, and I have to describe it as a conversion experience.

For the first time, I began to understand the simplicity and power and elegance of Ted's relational approach. Queries that took a whole page of code in DBTG could often be expressed in a single line in a relational approach. So when I returned to New York, I wasn't interested in DBTG anymore. I was taking up a new interest in relational query languages.
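The contrast Don describes between navigating through data and stating what you want can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee and department records are invented, the first function imitates pointer-following in plain Python (it is not DBTG syntax), and the second asks the same question declaratively through the standard-library `sqlite3` module.

```python
import sqlite3

# Hypothetical data: each employee record holds a "pointer" (a key) to a department record.
departments = {10: {"name": "Research"}, 20: {"name": "Sales"}}
employees = [
    {"name": "Alice", "dept": 10},
    {"name": "Bob", "dept": 20},
    {"name": "Carol", "dept": 10},
]


def research_staff_navigational():
    """Navigational style: the program spells out HOW to find the answer."""
    found = []
    for emp in employees:                 # visit records one at a time
        dept = departments[emp["dept"]]   # follow the pointer to the department
        if dept["name"] == "Research":
            found.append(emp["name"])
    return sorted(found)


# Load the same data into relational tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (name TEXT, dept INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(key, rec["name"]) for key, rec in departments.items()],
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(e["name"], e["dept"]) for e in employees],
)


def research_staff_declarative():
    """Declarative style: the query says WHAT is wanted; the system picks the access path."""
    rows = conn.execute(
        "SELECT emp.name FROM emp JOIN dept ON emp.dept = dept.id "
        "WHERE dept.name = 'Research' ORDER BY emp.name"
    )
    return [name for (name,) in rows]


print(research_staff_navigational())
print(research_staff_declarative())
```

Both functions return the same names; the difference is that only the first one hard-codes the order in which records are visited.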

Richie Cotton: That's fascinating. And I love that even though it was a hugely influential paper, your first reaction to Ted's work was like, Oh, it's just mathematical nonsense, all theoretical, no practical applications. But once you sort of see it in action, it is actually incredibly powerful. So I love how it just translates into something more practical and more real.

I think the first sort of real implementation of this was the System R project that you worked on. Is that right?

Don Chamberlin: Well, there were actually several things going on more or less at the same time in different places in IBM and also at other companies. There was the Ingres project, for example, at Berkeley. But I'm gonna talk mainly about System R because that was the project that I was associated with.

There were a lot of people who saw the power and simplicity of Codd's approach. But the whole idea depended on a high level query language with an optimizing compiler that could turn it into efficient code. And the question was, that sounds like a good idea, but was it just science fiction, or was it really ready for prime time?

And in 1973, IBM decided to answer this question by building an industrial strength relational system, just to prove it could be done. And this was done at IBM Research. They created a project for this purpose and called it System R. Well, System R was located in San Jose because that's where Ted Codd was.

And there were about 14 people, including Ray and myself, who were gathered from several IBM sites all over the country to come together and work on this System R project. Well, I wasn't very happy about moving to New York at that time. I'm sorry, about moving from New York to San Jose at that time. I had just bought a house, and my wife had a good job teaching high school.

I felt disrupted, but I made the move and I went to San Jose to join the System R project, and that turned out to be the best decision I ever made. Working on IBM's first relational database system really turned out to be the opportunity of a lifetime.

Richie Cotton: It's interesting how these cross country moves have been problematic each time. But, yeah,

Don Chamberlin: At IBM, I used to say, "I've been moved."

Richie Cotton: And then from System R, it seems like this is leaning towards the really important paper that you wrote around the SQL programming language. So, where did that idea come from?

Don Chamberlin: Well, to tell you the truth, Ray and I liked some parts of Codd's ideas better than others. We really liked this idea of a non-procedural query with the slogan, tell me what you want, not how to find it. What we didn't like was the mathematical jargon in Ted's papers. We wanted to design a language for a new class of user.

We call them casual users. We thought a casual user is a professional who needs access to data, but he doesn't want to be a computer programmer, and he doesn't even want to rely on a computer programmer. He might be an urban planner, or a financial analyst, or an insurance company executive, and he might have questions that vary from day to day.

And he might want his results pretty quickly. Well, the database systems of the 1970s just didn't meet these requirements. So, to serve this casual user, Ray and I wanted to design a new language, and we set certain goals for it. Number one, we wanted to use the term tables instead of relations. Everybody knows what a table is.

Number two, we wanted to base the language on ordinary English words like select. And goal number three, the language should have no special symbols and it should be easy to type on a keyboard. And goal number four, which is maybe the most challenging one, we wanted it to have something that we called the walk up and read property.

Meaning, in simple cases, a user with no special training should be able to understand a query just by reading it. Well, those were the goals that we set for ourselves. And we called this new language SEQUEL, which was an acronym for Structured English Query Language.
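As a rough illustration of the walk up and read property, here is a small query run through Python's standard-library `sqlite3` module. The `employees` table and its rows are invented for this sketch, and the syntax is present-day SQL as SQLite accepts it, not the 1974 SEQUEL syntax from the original paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this sketch.
conn.execute("CREATE TABLE employees (name TEXT, job TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Jones", "clerk", 30000),
        ("Smith", "analyst", 60000),
        ("Lee", "analyst", 75000),
    ],
)

# The query uses tables, ordinary English keywords, and few special symbols,
# so a reader can guess its meaning without knowing how the data is stored.
query = """
SELECT name, salary
FROM employees
WHERE job = 'analyst'
ORDER BY salary
"""

analysts = conn.execute(query).fetchall()
print(analysts)
```

Read aloud, the query is close to an English sentence: select the name and salary from employees where the job is analyst, ordered by salary.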

Richie Cotton: It's amazing how the things you were worrying about back then, 50 years ago, are things that we're still worrying about now. So, for example, there's a big push at the moment to make data more accessible to everyone, regardless of whether you have a technical background or not. And I find it fascinating that this is something you worried about when you were first designing the SQL language, that people who didn't have this strong mathematical background could still make use of the technology.

So, you mentioned the idea of walk up and read. So, people just walk up, look at the code and it makes sense to them. It sounds like a difficult thing to measure. How do you know if you've been successful at that?

Don Chamberlin: That's a good question. It's a hard thing to do, and it's a hard thing to measure. And I'll never know really how successful we were, but we had a psychologist on the staff named Phyllis Reisner. And Phyllis conducted an experiment at San Jose State University teaching SQL to college students who had no programming experience at all, and recording their progress and the kinds of errors that they made.

It turned out that these college students could become proficient in SQL after a few hours of instruction. Their most common error was something funny. They would forget to put quotes around strings. For example, if a query contained the phrase name equals Fred, you had to put quotes around Fred to indicate that it's a constant string, somebody's name, rather than the name of a column.

Well, that's an important distinction, but a lot of students never understood it and tended not to put quotes around anything.
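The quoting distinction is easy to reproduce today. The sketch below uses Python's standard-library `sqlite3` module with an invented `people` table; the behavior shown is SQLite's, and other SQL systems may report the unquoted case differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this sketch.
conn.execute("CREATE TABLE people (name TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Fred", "Freddie"), ("Wilma", "Wilma")],
)

# Quoted: 'Fred' is a constant string, so rows are compared against that value.
quoted = conn.execute("SELECT name FROM people WHERE name = 'Fred'").fetchall()

# Unquoted: nickname is read as a column name, so two columns are compared row by row.
same_as_nickname = conn.execute(
    "SELECT name FROM people WHERE name = nickname"
).fetchall()

# Unquoted Fred: SQLite looks for a column called Fred, finds none, and raises an error.
try:
    conn.execute("SELECT name FROM people WHERE name = Fred")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(quoted)
print(same_as_nickname)
print(unquoted_error)
```

The middle query shows why the rule exists: without quotes, the system has no way to tell a value from a column name.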

Richie Cotton: I can confirm that that is still a problem in every programming language 50 years later, forgetting to quote your strings and putting bits of syntax in the wrong place. So after you've sort of designed this language, I think it was initially used just within IBM. How did it travel outside that organization?

Don Chamberlin: Well, Ray and I published the first SQL paper at a conference called SIGFIDET in Ann Arbor, Michigan in June of 1974. SIGFIDET has since changed its name to SIGMOD, the Special Interest Group on Management of Data. And it's now probably the most prestigious annual database conference. This conference in 1974 was very interesting because it featured a panel discussion between Ted Codd and Charlie Bachman.

This was called a panel discussion, but everybody knew it was a debate. And in my view, Ted Codd was the winner of this debate. I think after this conference in 1974, Ted's relational approach was considered to be the new mainstream in database management. So that's why I consider that this year, 1974, starts the clock on what I've called 50 years of relational databases.

And since the first SQL paper appeared in this conference, it also starts the clock on 50 years of SQL.

Richie Cotton: I have a question for you on this because in this paper on SQL, it's spelled S-E-Q-U-E-L, and it's now been shortened. Even today, there's a lot of confusion about, do I call it S-Q-L? Do I call it sequel? I'd really love to have an official answer on this. Which do you prefer?

Don Chamberlin: Well, at some point after publishing our paper, we got a letter from somebody's lawyer that said we couldn't use the name SEQUEL anymore. It was somebody's registered trademark. So we had to officially shorten the name to SQL, which stood for Structured Query Language. So the official name of the language is now SQL. Well, but "sequel" is a lot easier to say than "S-Q-L," so I usually just pronounce the name "sequel" and hope I won't get in any trouble for doing that.

Richie Cotton: All right, we have an official answer there. That's pretty good. I like that both ways are possible. One's better for writing and one's better for speaking. So, you were talking about the SEQUEL paper. Can you tell me what happened once the paper was published?

Don Chamberlin: Actually, the next thing that happened was a tragedy. That SQL paper was the last thing that Ray Boyce and I did together. Less than a month after the SIGFIDET conference, my friend Ray died suddenly and unexpectedly of a brain aneurysm.

Richie Cotton: It's a very sad event, and even 50 years later, it feels like a real tragedy. So, I'm sure it must have been a shock to you. Could you maybe tell me a bit about what it was like working with Ray?

Don Chamberlin: Yeah, Ray was my best friend. We moved from New York to California together. We lived near each other, we carpooled to work. I drove Ray to work at IBM on the day he had his aneurysm attack and he was taken away in an ambulance. Ray and I used to play something we called the query game. We were experimenting with different query language designs.

We'd take turns dreaming up queries and challenging each other to express them. We explored a lot of ideas in those days, and at the end of the day, we couldn't remember which one of us was responsible for any given idea. Working with Ray was the best part of my job.

Richie Cotton: That's wonderful that you have fond memories of working with him. And again, yeah, I'm sorry that such a tragedy happened to him, your friend. I'm wondering, after you had this sort of revolutionary paper and things were starting to get popular, who were the first people that took on this idea?

Who was using SQL to begin with?

Don Chamberlin: Well, you have to remember that System R was a research prototype. It was not an IBM product. So you couldn't just go somewhere and buy it. As a research group, we wanted to gain some visibility inside IBM. To do that, we needed to have some users. So we distributed System R to about a dozen internal IBM sites.

And also on a joint study basis to three frontline IBM customers: Boeing, Pratt & Whitney, and Upjohn. And we had quarterly meetings with all of our users to learn about their experiences and respond to their suggestions. It was during this period that we had to shorten the name to SQL.

Richie Cotton: Okay, so it was a lawsuit that intervened there. Those darn lawyers. Oh, all right, so, to begin with, it was a research project. And then, when was SQL first commercialized?

Don Chamberlin: Well, since SQL was invented at IBM Research, you might expect that IBM would be the first to bring it to market, but that's not actually the way it turned out. Interestingly enough, IBM in those days had another database product called IMS and they weren't in a hurry to introduce a competitor to their successful product.

But they did allow the System R group to publish their results in the open technical literature. That was generous and that's how we published the SQL paper and lots of other papers about the details of the System R work. Well, there was a small startup company called Relational Software Incorporated, abbreviated RSI, that took an interest in these papers.

The founders of RSI guessed correctly that IBM would eventually release a SQL product on mainframe computers, and they saw an opportunity there. They decided to build a product that was compatible with SQL on less expensive hardware platforms and to bring it to the market quickly. And they executed this plan very successfully.

In fact, in 1979, they released a SQL product called Oracle running on a minicomputer, a PDP-11. And this product was immediately successful, so much so that RSI changed its name to the Oracle Company. And Oracle was actually the first commercial implementation of SQL.

Richie Cotton: That's fascinating, because I've not heard of RSI, but obviously Oracle is a huge brand name, so I hadn't really realized about the name change.

Don Chamberlin: IBM itself didn't release a SQL product until 1981 on some of its smaller computers. That was two years after Oracle, and their strategic mainframe product called DB2 came out in 1983. That was four years after Oracle. And by this time, well, Oracle had pretty much established a commanding lead in the database market.

Richie Cotton: So it seems like that was the main competition then, was between Oracle and IBM in the early days. Were there any other players?

Don Chamberlin: Yeah, there were. I've been talking about System R, a research project to prove the concept of a commercial relational system. But there was another project very much like that, also going on at the same time, at UC Berkeley. Their project was called Ingres, and it was led by two professors, Mike Stonebraker and Gene Wong.

Well, Ingres had its own high level query language called QUEL. And, much like System R, Ingres was distributed for free to experimental users, which were mainly universities. And it became widely used as a teaching tool at universities. Ingres spun off a commercial company, also called Ingres, in 1980.

And in the early 1980s, Ingres and Oracle were the market leaders in relational databases. They ran neck and neck. They both ran on DEC VAX computers. And Ingres implemented the QUEL language and Oracle implemented SQL. And the QUEL language was well liked by its users, but I think I'd give the edge to the Oracle marketing divisions.

They marketed their SQL product very aggressively. And in 1984, Ingres decided that they had to begin supporting SQL in order to compete with Oracle.

Richie Cotton: Okay, so I hadn't realized that there were all these sort of alternate languages then for accessing relational databases. But it seems within a few years, things have become standardized because you had the advent of a standard for the SQL language. Can you tell me how that came about and what your involvement was in this?

Don Chamberlin: I think that's an interesting story. The American National Standards Institute, ANSI, created a committee in the late 1970s to define a standard database language. They kept changing the name of this committee, but it usually had H2 in its name somewhere. So I'm gonna call it the H2 committee.

At first, this standard was supposed to be based on DBTG, but in 1982, they decided to extend the mission to define a relational standard also. They wound up with two different standards, one based on DBTG and also a relational one. And when they got into the relational business, the two companies that were in the marketplace for relational systems were Oracle and Ingres, and they were both marketing SQL.

And so the H2 committee decided that they would base their relational standard on some version of SQL. And they went ahead and created a standard, which became an ANSI standard and also an international standard with ISO. These were named Database Language SQL, S-Q-L, and they were released in 1986.

So that was going on in ANSI and ISO, which were voluntary associations of commercial entities, companies, but actually the standards work that had the most impact in my opinion was something that was going on somewhere else. It was at the National Institute of Standards and Technology, N-I-S-T, sometimes pronounced "nist," and unlike ANSI, NIST is actually a branch of the federal government.

And in 1992, NIST created something called a Federal Information Processing Standard. This one was called FIPS 127, that happened to be identical to the ANSI SQL standard. And even more important, they provided a test suite and a validation service for conformance to this standard. And companies whose database product passed the validation test received a license to sell their products to the federal government.

Well, several companies did this and this gave a big boost to the commercial presence of the SQL language because you could sell it to the government. Well, the SQL standard has evolved a lot over the last 50 years. It started off pretty simple and it just kept growing. A lot of new features have been added, date and time data types, outer joins, recursive queries, the list goes on and on.

A new revision has come out about every five years. The latest one came out in 2023. I think the standardization process has had several good effects on the industry. Number one, it gave customers confidence that they had multiple sources where they could buy their database software. Number two, it gave vendors a way to evolve their products while maintaining compatibility with each other.

And number three, it brought some really smart people together to evaluate requirements and make proposals. This H2 SQL standards committee has been meeting on a regular basis for a long time now, many years.
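Two of the later additions Don mentions, outer joins and recursive queries, can be tried with Python's standard-library `sqlite3` module. This is a sketch under stated assumptions: the `staff` and `phones` tables are invented, and the syntax is the form SQLite accepts, which may differ in detail from the text of any particular revision of the standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and data, invented for this sketch.
conn.execute("CREATE TABLE staff (name TEXT PRIMARY KEY, boss TEXT)")
conn.execute("CREATE TABLE phones (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ann", None), ("Ben", "Ann"), ("Cat", "Ben")],
)
conn.execute("INSERT INTO phones VALUES ('Ann', '555-0100')")

# Outer join: keep every staff row, filling in NULL where no phone row matches.
with_phones = conn.execute(
    "SELECT s.name, p.phone FROM staff s "
    "LEFT OUTER JOIN phones p ON p.name = s.name "
    "ORDER BY s.name"
).fetchall()

# Recursive query: everyone who reports to Ann, directly or through other people.
reports_to_ann = conn.execute(
    """
    WITH RECURSIVE chain(name) AS (
        SELECT name FROM staff WHERE boss = 'Ann'
        UNION ALL
        SELECT s.name FROM staff s JOIN chain c ON s.boss = c.name
    )
    SELECT name FROM chain ORDER BY name
    """
).fetchall()

print(with_phones)
print(reports_to_ann)
```

An inner join would drop the staff rows that have no phone; the outer join keeps them, and the recursive query follows the reporting chain to any depth without a loop in the calling program.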

Richie Cotton: I love that the fact that the language became standardized helped increase adoption because it gives people trust that this is an official thing and that you know what you're getting. So that just seemed like a pretty important milestone. I'm wondering, are there any other important milestones in the early history of SQL that you think are important?

Don Chamberlin: Sure. Before we leave the standards subject, I want to give a disclaimer here. During the decade of the 80s, when a lot of this standards work was being done, I actually took a leave from the database world and got involved in desktop publishing. That seemed to me to be the exciting thing that was happening in the 1980s.

But IBM finally decided not to go into that business, so I returned to the database world around 1990. But by that time, a lot of the standards work had already been done. So, the credit for that belongs to other people. Well, during the 1980s, the revolution in data management really hit full stride.

The cost of computing and storage kept on coming down, and the volume of data generated by businesses just expanded enormously. Almost every business needed to acquire a system to manage their data. Oracle, of course, continued to prosper, but lots of other new relational products entered the market.

There was Db2, and Informix, and Sybase, and Tandem, and Microsoft SQL Server. They all offered implementations of the SQL language. There seemed to be room in the market for everybody. In fact, so many products were claiming to be relational that, in 1985, Ted Codd published a series of 12 rules that define an authentic relational database.

And you can find these rules in Wikipedia. Just search for Codd's 12 rules. Starting in the 1990s, there were some truly game changing developments. Three very high quality, open source SQL implementations became available. Their names were MySQL and PostgreSQL and SQLite. And all three of these were fully featured, reliable, high performance systems with large user communities.

They all had free versions, and also they had additional services that you could buy for a fee. Well, web based applications were proliferating in the 1990s. That was the dot com days, then. And many of these apps used one of these open source systems for data management. SQLite in particular is interesting because it's embedded invisibly all over the place.

It's in most smartphones and browsers and many popular applications. So these three open source SQL systems are now among the most widely used database systems in the world.

Richie Cotton: Absolutely, they are incredibly popular, all three of them. And these are things that we still teach now on DataCamp. If you want to learn to use databases, use Postgres or one of these other ones. So, yeah, it's had a huge impact and actually, even at 50 years old, SQL is still one of the most popular programming languages.

So, on things like the TIOBE index, the IEEE Spectrum index, like most popular programming languages, SQL is just, it's always in the top 10. So I'm wondering how do you account for its longevity?

Don Chamberlin: I can think of several reasons for that. The first and most important reason is, Ted Codd got it right. The relational model is simple and powerful and flexible and elegant, and really that made everything else possible. But second, I think it helped a lot that the early research by both System R and the Ingres project at Cal was published openly.

So there were basically no impediments to commercialization of this technology. That research was given away for free. Third, I think the ANSI standard provided a well defined language specification and a way for the language to evolve, to meet new requirements. And that kept it alive and well, as new requirements came along over a period of decades.

And fourth, and this is really very important, are those high quality open source SQL implementations, available for free. Well, what's not to like about that?

Richie Cotton: Free stuff is always good, for sure. So, looking back on this, do you think that SQL has lived up to what you and Ray envisaged for it back in 1974?

Don Chamberlin: Well, that's a good question. All right. In some ways SQL has been more successful than we ever dreamed it could be. But I'd have to say that the language met the goals that we had defined for ourselves only in part. Remember, Ray and I thought that SQL would be used by what we called casual users who were not computer programmers.

Well, it turned out we were wrong about that. The actual users of SQL turned out to be mostly programmers building database applications. Well, I think SQL has made the work of these programmers easier and more productive, so that's a good thing. The casual users, I think, are still out there, but they're not using SQL.

They're using Google and, increasingly, I think they're starting to use AI systems like ChatGPT.

Richie Cotton: Absolutely. That's been a huge change in the last year. It's just people can have their SQL code written for them quite easily. And so it's made it even more accessible. So, yeah, maybe that'll bring SQL to even more people. All right. Is there anything in the world of databases that you are currently excited about?

Don Chamberlin: I'm retired now, but I keep hearing the term NoSQL a lot. The NoSQL movement, I think, is inspired by web applications that need massively scalable databases. And that's an important requirement. To get the scalability, these systems usually relax one or more of the constraints of traditional relational databases.

So here are some examples of that. Number one, relational databases usually have rigid schemas that say exactly what the tables in that system look like. NoSQL systems sometimes relax this requirement. They might have a partial schema or maybe no schema at all, so they're more flexible.

Number two, relational databases are limited to the relational data model. They're made out of flat, homogeneous tables. In each table, all the rows look the same. Well, NoSQL systems sometimes relax that requirement and have a different data model. Some of them are just really simple, like key-value stores.

Others might allow tables to be nested, or they might be based on some document format like XML or JSON. They're all over the place. Well, third, relational systems usually offer some transactional guarantees, like the well-known ACID properties that keep data in a consistent state. NoSQL systems sometimes relax these guarantees a little bit.

They'll often replicate data across many nodes, and they might tolerate what they call eventual consistency, meaning, well, we'll be patient. We'll allow the nodes a little while to catch up. So NoSQL is an exciting new direction. I think it's a broad name for several promising new directions in database research.

And that's a good thing. Sometimes I think scalability is what we want. But scalability isn't necessarily incompatible with a high-level language. So I've been hearing about a new language development called SQL++. That's a clever name, I think. SQL++ is a backward-compatible extension of SQL that originated at UC San Diego with a professor named Yannis Papakonstantinou, and SQL++ has been implemented.

It's available in open source form from the AsterixDB project at UC Irvine, led by Professor Mike Carey. So you can get it on GitHub. And also, there are some commercial versions of SQL++ coming out. They're being marketed by Couchbase and by Amazon Web Services. The Amazon version goes by a different name, PartiQL, but it's basically SQL++. SQL++ is one of those schema-optional languages.

And for its data model, it operates on collections of JSON documents, which you can also view as nested tables. The correspondence between a JSON document and a nested table is the thing that makes SQL++ compatible with earlier versions of SQL that operated on tables. So, if you're interested in this, you can get more information by just Googling SQL++.
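Editor's sketch: the correspondence Don describes between a collection of JSON documents and nested tables can be illustrated in a few lines of Python. This is only an illustration under stated assumptions, not how SQL++ or any product is implemented. The `customers` collection, its field names, and the `unnest_orders` helper are all hypothetical.

```python
import json

# Hypothetical collection of JSON documents; names and fields are illustrative only.
customers_json = """
[
  {"name": "Ada", "city": "London",
   "orders": [{"item": "disk", "qty": 2}, {"item": "tape", "qty": 1}]},
  {"name": "Ray",
   "orders": [{"item": "terminal", "qty": 1}]},
  {"name": "Ted", "city": "San Jose", "orders": []}
]
"""


def unnest_orders(customers):
    """Flatten each customer's nested 'orders' array into one flat row per order."""
    rows = []
    for c in customers:
        for o in c.get("orders", []):
            rows.append({
                "name": c["name"],
                "city": c.get("city"),  # field is optional: missing values become None
                "item": o["item"],
                "qty": o["qty"],
            })
    return rows


customers = json.loads(customers_json)
flat_rows = unnest_orders(customers)
for row in flat_rows:
    print(row)
```

Each nested order becomes one row of a flat table. The document without a `city` field still yields a row, with `city` left empty, and the customer with no orders contributes no rows, much as an inner join would behave.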

Richie Cotton: That's interesting, because there have been some updates to the SQL language, but not that many. And so having a new language that's similar to SQL and backward compatible does seem like a pretty good innovation. All right. So, just to wrap up, do you have any final advice for fans of SQL?

Don Chamberlin: When I look back over my own career, I think of it as a case of being in the right place at the right time. The database revolution has just been unfolding rapidly over the last half century, and I was really privileged to take a part in it. SQL didn't cause this revolution. It was caused by economics.

Hardware was getting faster and cheaper at an exponential rate. And these advances in hardware made three things possible. The first was a clean, elegant data model, like Codd's relational model. And the second thing was a high-level, non-procedural language, which turned out to be SQL. And the third thing is the optimizing compiler that brought these things together and made them commercially viable.

Well, these three items, the data model, the query language, and the optimizing compiler, all support each other, sort of like a three-legged stool. And that's what's made today's database systems possible, I think. So to wrap up, in my career, I've had some lucky breaks. I've been privileged to work with some brilliant people.

Ted Codd, Ray Boyce, Jim Gray, Pat Selinger, the whole System R team, Mike Stonebraker. I'm in debt to all of these people. It's been a wild ride and I'm very grateful for the opportunities that have come my way over the years.

Richie Cotton: Wonderful. I mean, it's such a fascinating story, and your achievement is being used by so many millions of people. So it's very impressive stuff. All right. Thank you for joining me on the show, Don.

Don Chamberlin: Oh, thank you, Richie. It's been a pleasure talking to you.

Topics

Data Science

Data Engineering

SQL
