No More NoSQL? How AI is Changing the Database with Sahir Azam, Chief Product Officer at MongoDB
Sahir Azam is the Chief Product Officer at MongoDB. He has been with MongoDB since 2016, where he launched the industry’s first developer data platform, MongoDB Atlas, and scaled the company’s thriving cloud business from the ground up. He also serves on the boards of Temporal and Observe, Inc, a cloud data observability startup. Sahir joined MongoDB from Sumo Logic, where he managed platform, pricing, packaging, and technology partnerships. Before Sumo Logic, he launched VMware's first organically developed SaaS management product and grew their management tools business to $1B+ in revenue. Earlier in his career, Sahir also held technical and sales-focused roles at DynamicOps, BMC Software, and BladeLogic.
Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
The constraint of hardware being the expensive component is no longer the case. It's really more how do you architect systems properly and how do you make developers able to be efficient and productive so that you can ship software at high quality and build those compelling experiences.
There's a lot of concern around how AI will change various job roles in our industry, in the economy, and the world economy. It presents the risk of being disruptive, changing things fundamentally. But I also believe at the same time that at an aggregate level, we're going to be more productive overall as a species. It will ultimately lift people because this productivity and intelligence gets smarter and more capable.
Key Takeaways
Prepare for the integration of generative AI by modernizing legacy systems and ensuring data is organized and accessible, enabling the use of AI to unlock insights from unstructured data.
Consider integrating multiple database functionalities, such as search and time series, into a single platform to reduce complexity and data silos, thereby streamlining operations and improving developer experience.
Embrace the document model in databases like MongoDB to enhance developer productivity by aligning data storage with object-oriented programming, reducing cognitive load and increasing efficiency.
Transcript
Richie Cotton: Welcome to the show.
Sahir Azam: Hey Richie, thanks for having me.
Richie Cotton: Cool. So I rather strongly associated MongoDB with the NoSQL movement, but I noticed that's gone from the branding. Does that mean NoSQL isn't a thing anymore?
Sahir Azam: You know, I think many people in our industry, our ecosystem, still think of us in the category of NoSQL or non-relational databases. From our perspective, we consider ourselves quite a bit different than some of the other technologies in that space, and so when we describe our technology we tend to use a broader framing of a more modern, general purpose database, because we think it applies, and does apply in our customers' environments, to a much broader set of use cases than what the typical developer or technologist may think a NoSQL database is capable of.
Richie Cotton: Okay. So perhaps we can get into more depth on like what MongoDB is in a moment. But it seems like for a long time, you had SQL databases and NoSQL databases, and now there are lots of different types of databases for pretty much every use case. Can you just give me a quick overview of like, what are the sort of main types of database people need to know about?
Sahir Azam: I think, you know, I would step back, and I think the reason why you're seeing a variety of different database types is fundamentally driven by the fact that software, and the experiences that developers are building, have changed dramatically.
And the traditional relational SQL database model was invented 40 years ago, at a time when the majority of software was back office software, you know, maybe for some accountants or for some back office bookkeeping in an organization. It was really optimized not only for those types of applications, which are very different than the average application today, but also for a world where, back then, hardware and storage were really expensive. Developer time and productivity was sort of an afterthought. But, you know, MongoDB and a lot of these new modern technologies were started in a world in which we have the benefits of cheap hardware, with prices constantly dropping, distributed horizontal compute, and all of that.
So that's not really the constraint anymore. And yet everyone is really thinking about how to make their developers more productive so they can spend more of their time building those delightful experiences I mentioned. So the whole technology landscape, the software landscape, has changed drastically in those last few decades, and I think various database technologies have evolved to serve those needs in a much more efficient way.
Richie Cotton: That's interesting that the infrastructure problems I guess you're saying have basically been solved. So no matter how much data you've got, there's a database that can handle it. And so you have to focus on developer productivity a lot of the time. So, can you talk me through what do you do to make developers more productive when they're working with databases?
Sahir Azam: You know, I'm not trying to say by any definition that cost is not a concern for customers, or that scalability isn't something a lot of technologies still have to build and continue to increase. But I think in general, the constraint of hardware being the expensive component is no longer the case.
It's really more how do you architect systems properly, and how do you make developers able to be efficient and productive, so that you can ship software at high quality and build those compelling experiences. You know, I think a lot of what you're asking goes back to the founding of MongoDB.
You know, our founders were developers and the CTO at DoubleClick, at the time one of the largest, highest-performance pieces of software in the world, running that dual-sided ad network. It eventually got acquired by Google and, I think, powers the majority of Google's revenue to this day.
And they really faced two types of problems, both of which I would characterize as fundamentally about scale. One: as their platform became larger and larger, serving more ads at faster and faster performance requirements, it became quite expensive to throw hardware at a traditional relational, monolithic architecture built on the typical technologies of the time. So that was a problem of cost scaling and performance scaling.
But then there was the more subtle point of scale, which is really around productivity. You know, with a small development team, you can build on any stack, leverage any database, and still be relatively quick. But what they observed is that as they went from dozens of developers to hundreds of developers to thousands of developers, they didn't get a similar return of productivity back from those teams; they felt like their teams were getting slower as they scaled.
And one of the reasons, certainly not the only one, but one of the significant reasons for that, was just the rigidity of the traditional database model at the time. You know, heavyweight relational schemas, and the change management around those, where every change was, you know, a big program in its own right.
And that was one of the factors that led them to feel like, okay, developer productivity was going to be really important. And so when they released MongoDB, they solved for scale and cost by leveraging a distributed systems architecture. MongoDB is, you know, a distributed database: you get things like high availability, failover, and the ability to scale effectively endlessly, horizontally, which makes things much more cost efficient. But then they chose a different data model than relational to build the database itself. And that's what we call the document model, which in their eyes, and certainly in what we've seen in the 15 years since the founding of the company, is a much more natural way, especially for developers building in object-oriented programming, to think about and reason with data, and therefore makes them more efficient and more scalable as they build more and more capability over time.
Richie Cotton: That reminds me of the sort of classic business sense about the mythical man month where you try and scale the number of people and then your productivity doesn't scale linearly with that as well. So it's interesting how you've got to think about architecting your processes and your talent as well as just the technology.
Sahir Azam: Absolutely. And, you know, I think the idea of a database being built from the modern developer's mindset and needs first was a new thing. And that certainly led to the extreme popularity of MongoDB once it was open sourced, but not only MongoDB; you mentioned there are other NoSQL or non-relational databases, all with different flavors and caveats.
But that, I think, showed that there was demand in the developer ecosystem for a different way of working with operational data.
Richie Cotton: You mentioned this idea of the document model, so a document database rather than a relational database. What's the difference, and why would you want a document model?
Sahir Azam: Sure, yeah. In a relational database, if you're modeling, for example, a customer or a product, it's typical that that is represented not in a single row and table, but in a handful of rows and tables that you have to manage the relationships between, with a rigid schema around them, which makes it harder to make changes quickly over time.
And it creates a cognitive burden, because when an application developer is reasoning about the business objects in their code, whether that be a customer, a product, whatever it might be, they don't want to shred it into a bunch of rows and tables and then have to reconstruct it every time they want to access that data pattern.
The more elegant way is to persist the objects that are retrieved and stored by the application all at once. And because hardware is cheaper, it's more cost effective to store all that information together. For a database company started in the 2000s and 2010s, versus 40 years ago, that constraint was removed.
So that was the linkage to the point I was making earlier. And that makes it just a lot faster for developers to cognitively reason about what they're building, because the database feels like it's integrated into their core development workflow and the business objects that they're coding against, as opposed to having to think about the business objects in the application code, then remember how that's modeled in a schema, in a different language, aka SQL, to interact with that data, and then have to meld those two things together.
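To make that contrast concrete, here is a minimal sketch, not taken from the conversation, of persisting and reading back a customer as a single document using the PyMongo driver. A local MongoDB instance is assumed, and the database, collection, and field names are illustrative only.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (assumed to be running on the default port).
client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# In a relational schema, this customer would typically be spread across
# customers, addresses, and orders tables joined by foreign keys.
# As a document, the whole business object is persisted and read back at once.
customer = {
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "addresses": [
        {"type": "billing", "city": "London", "postcode": "N1 9GU"},
        {"type": "shipping", "city": "Leeds", "postcode": "LS1 4AP"},
    ],
    "orders": [
        {"order_id": 1001, "total": 42.50, "items": ["keyboard", "mouse"]},
    ],
}

db.customers.insert_one(customer)

# The application reads the object back in the same shape it reasons about in code.
doc = db.customers.find_one({"email": "ada@example.com"})
print(doc["name"], len(doc["orders"]))
```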
Richie Cotton: Okay, that certainly seems to make sense. So if you're writing object-oriented code, as I guess most application developers do, you want something in the database to look very similar to that object-oriented code rather than have a...
Sahir Azam: ...the access patterns and just the business objects that you're managing in code. And certainly, you know, that fundamental insight, and building a database with that mindset, is what really distinguishes MongoDB. The richness of handling all those objects, with all the sophistication we have, is what distinguishes MongoDB versus, you know, other NoSQL or non-relational players, but also obviously the corpus of relational databases, whether they be newer offerings or the more traditional ones that we...
Richie Cotton: Okay, so having this document model seems to be one way in which you're helping developers to be more productive. Are there any other ways that you would think about making developers more productive and having a database for developers?
Sahir Azam: Yeah, absolutely. I think there are two macro points I'd add beyond the core data model itself. One that's closely linked to it is, you know, if you're building an application in 2024, it has all sorts of different requirements. It's very rare that you're going to have only a SQL relational database powering that application.
We typically see that you also need a separate search database; you know, if you're doing an IoT system, you need a separate time series database to be performant for that type of information. Or these days, with AI-driven applications, you see a vector database get introduced for certain use cases.
So what we find is that, combined with how easy the cloud providers have made it for any developer to spin up a new database, this has actually led to a lot of complex sprawl in the average application's database architecture, because it's all these different things creating duplication and serving narrow needs, as opposed to the past, where largely one database could serve your entire application's needs.
And so we spent a lot of time at MongoDB saying, okay, we have this really rich data model, the document model, but we're still seeing all these other pockets of things having to be bolted around a traditional relational database. How can we simplify that? It takes its form in two ways. One, how do we make sure our database can be an effective and superior alternative to relational databases for the use cases, systems of record in particular, that relational databases are suited for?
So we're not just another bolt-on; we're actually a fundamental replacement for it. That led to years of R&D adding schema governance, enterprise security, transaction guarantees, all the things that people typically associate with the relational camp but not with the NoSQL camp. And we brought that forward into MongoDB so that we could serve the breadth of use cases that relational databases have been known for over the decades. And then we said, okay, what are the other areas where we most commonly see developers have to bolt on some more niche solution?
And so we, you know, we see search as a very critical one. So four and a half, five years ago, we integrated search into the document model, into the database, so it feels like it's just native and not a separate system. And we manage all that synchronization, so you don't have to worry about it operationally.
For AI, we added vector capabilities natively into the document model in a really elegant way. Time series data came from our customers saying, hey, I don't want to have to stand up a separate time series database; I use MongoDB for my core transactional data, can you make it much more performant for time series so I don't need to bring in another tool? And so we've simplified that.
We call this idea a developer data platform, but it's really just about looking from the modern application's needs backwards and saying, how do we deliver that in a way that's elegant and seamless for developers? So they spend less of their time bolting together four or five different technologies, and less on the operational cost and burden of managing that over time.
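As an illustration of the "search built into the database" idea, here is a hedged sketch of a full-text query run as an ordinary aggregation stage with PyMongo. It assumes a MongoDB Atlas cluster with an Atlas Search index named "default" on a hypothetical products collection; the connection string and field names are placeholders, not anything from the conversation.

```python
from pymongo import MongoClient

# Assumes an Atlas cluster with an Atlas Search index named "default"
# defined on the "products" collection; all names here are illustrative.
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
db = client["shop"]

# Full-text search runs as an aggregation stage against the same collection
# that holds the operational data, so there is no separate search engine to sync.
pipeline = [
    {
        "$search": {
            "index": "default",
            "text": {"query": "wireless keyboard", "path": ["name", "description"]},
        }
    },
    {"$limit": 5},
    {"$project": {"name": 1, "description": 1, "score": {"$meta": "searchScore"}}},
]

for product in db.products.aggregate(pipeline):
    print(product["name"], product["score"])
```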
Richie Cotton: That's such a common problem; we talk about this a lot on the show, the idea of data silos, where your data is just stuck in different places. And certainly if you've got four or five different databases, then you've by definition got data silos. I'm curious, though: a lot of these databases, like time series databases, tend to be quite specialized.
How do you manage performance, then, if you're trying to do everything in the same database?
Sahir Azam: Yeah, and I want to be clear: I don't think the days of an organization having one database as a standard for everything are ever coming back. There's a lot of preference that developers express in choosing the right technology for the job, and our goal is not to be the one database to rule them all or anything. It's to be general purpose enough that for, say, 70 to 80 percent of the common operational use cases, especially in large organizations, they shouldn't have to go reach for a highly specialized solution. Now, there will be certain edge cases or use cases that are so deep in time series, or so deep in graph traversals and graph capabilities, that a general purpose platform like MongoDB may not be the right choice.
But we think that the average of what these get used for is actually not that complicated. And so it just comes down to us optimizing the core database engine, the storage, the indexing for these different use cases. And we pick the ones that we think are best suited for the document model and that we think are pervasive enough that it warrants investment from us to go solve that problem.
That's how we come to those decisions. But it means an effort in performance tuning, optimization, and indexing to make sure that we can capture at least 50, 60, 70-plus percent of the workloads that ever need those features, in a performant way and in a much more simple, integrated fashion than having to always reach for the specialist solutions.
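For the time series case specifically, here is a small sketch, under the assumption of MongoDB 5.0 or later, of creating a time series collection alongside ordinary collections so measurements stay in the same database; the collection and field names are made up for illustration.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["iot"]

# Create a time series collection (MongoDB 5.0+); names are illustrative.
# The engine optimizes storage and indexing for time-ordered measurements,
# so many IoT workloads don't need a separate specialist database.
db.create_collection(
    "sensor_readings",
    timeseries={
        "timeField": "ts",        # required: timestamp of each measurement
        "metaField": "sensor",    # optional: per-series metadata (device id, location)
        "granularity": "seconds",
    },
)

# Measurements are inserted like documents in any other collection.
db.sensor_readings.insert_one(
    {"ts": datetime.now(timezone.utc), "sensor": {"id": "pump-7"}, "temp_c": 71.3}
)
```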
Richie Cotton: Okay, so just cover the main use cases all at once; that seems sensible. I'm curious as to whether changes in database platforms, or data platforms more generally, have changed how software development is done and how data teams work as well. What's the impact on these other teams that are working with data?
Sahir Azam: I mean, one big shift over the last decade, certainly, is just the level of comfort that developers, and especially large enterprises, now have with cloud data platforms. Even a decade ago, sure, fast-moving, less risk-averse organizations were like, okay, I don't have the time to manage my database infrastructure.
I'm just going to go use a service from AWS or whomever else. Now it's almost the default in many organizations. Yes, there are workloads on premises, some of which will always stay there. But when a workload is migrated or built in the cloud, I think the entire industry has almost leaned into the fact that, okay, managed services, cloud services are the way to consume this, because it's a much more effective experience and your dollars are better spent building applications than managing data infrastructure.
So I think that big shift was started initially by the hyperscalers, but then look at MongoDB or Snowflake or Databricks and the other parts of the database market: there are now a handful of really mission-critical-scale cloud data platforms that people trust. And I don't think that was the case a decade ago.
So I think that step change is one. And then there's this idea of simplification: you know, not needing to have five niche databases that you have to bolt together, connect the dots, and do data duplication with, but just simplify. Frankly, it makes it easier for organizations to adopt. A lot of our larger customers, the large enterprises, are saying, you know, I want two or three standard database offerings.
I don't want 25 niche technologies. And so how do we then govern that? How do we get skills reinforcement? A lot of times it's not whether the tech can do something; it's can they train all of their developers on a set of skills that are then repeatable and reusable across many jobs to be done, so to speak, in the organization. These are the real concerns customers face day in, day out in the real world. And the idea of having a vendor they can trust, that can solve a lot of problems and can develop this sort of skills inertia in their organization, is definitely a benefit of the simplification of these things into a unified and elegant way of building.
Richie Cotton: Okay, I definitely like that idea of the simple tech stack: it makes it easier on procurement, easier on governance, and easier on upskilling as well.
Sahir Azam: Exactly.
Richie Cotton: So I don't think we could get away without talking about generative AI. Since it's working its way into everything, I'm sure it must be changing databases in some way.
Can you talk me through what those changes are?
Sahir Azam: At a macro level, the thing that's exciting about generative AI is that I think there's going to be more sophisticated software created in the world, right? Because one of the most powerful use cases for generative AI is code assistance, or, now, agents that can do more sophisticated software development tasks.
And, you know, I think that's only going to increase as the tools get better, as the models jump with each new generation. I think the idea of creating software is going to become easier and easier, whether that's a 10x developer now being a 100x developer, or, for a simpler application, somebody who's less technical being able to generate software themselves. Both of those trends are happening already.
And even though I'd argue it's still early days, it's a very clear line to see how that just gets better and better. And so more applications means there's going to be more data necessary to serve those applications. So we are absolutely excited by this as the next level change in the software industry around application development. That's the macro framing. I think in terms of data infrastructure itself, or data itself, one of the things that's interesting is that the majority of the information we interact with in our mobile phone applications, or web applications that we use in our personal or business lives, is largely structured or semi-structured data powering those applications.
And yet 70 percent plus of the world's information, I think I read from one of the analysts, is truly unstructured: audio, free-form text, video content, and the like. And that type of information, besides maybe the most basic uses, hasn't operationally been powering applications in the way that generative AI and these models will allow, because now you can start to use GenAI models to run similarity search, infer meaning from all of this unstructured data, and make it usable for real-time applications in a way that just wasn't really possible.
That unlocks a whole corpus of value and knowledge, human information, that software can now leverage and build on top of, which really wasn't possible before except for displaying images or streaming video, those types of use cases. At a macro level, I think that's a very powerful concept. I was looking at a presentation from, I think, one of our investors, and I still think it's early days. Just like the first versions of the iPhone came out and the apps were pretty simplistic, you know, the flashlight app and things of that nature, the idea of what's possible in applications with generative AI is still in its infancy.
As models get faster and more accurate, and as the cost comes down, we're going to see new business models and experiences created that we just can't even conceptualize today. Just like we couldn't necessarily foresee an Uber or an Airbnb or the other business models that came up in the mobile era, I think more powerful generative AI will create new types of software and application interfaces that we haven't even thought of yet.
So I think it's early days, but I'm really excited, because I think it's the kind of primitive that any type of application, in some way, shape, or form, can benefit from over the long term. And I think it'll create experiences that we just haven't seen before.
Richie Cotton: Yeah, definitely exciting times. And it seems like it's a mix there: some bits about helping developers be more productive, like the idea of AI code assistance, and some bits about things that could impact end users directly, for example all those new applications that are going to be created.
You also mentioned that all these unstructured data types, so text and images and audio and things like that, are now kind of data, and you can do cool stuff with them using AI. Can you make it a bit more concrete and just talk me through some of these use cases?
Sahir Azam: There's a whole startup ecosystem, many of whom build on MongoDB, doing awesome stuff building net new businesses with generative AI at the heart of the application. But we're actually also seeing a lot of experimentation, and now even production work, happening in traditional enterprises. Two of the ones that stuck out: one, we work with a very large European automaker, and one of the problems they went after with audio models in particular was car diagnosis.
So what they did is they created vector representations of the common sounds that their models of cars make when they have certain issues. And, you know, anyone who's driven a car knows you sometimes just know something's wrong; you hear a certain rattle of various sorts. Well, they were able to basically catalog those and turn them, via a generative AI model, into vector representations.
So now, when a car shows up in one of their shops, they can record the issue that particular car is having and use it to do diagnosis really fast against a known set of issues. And that's a similarity search using an AI model that basically generated vectors off these audio files. So that shrinks down the amount of hours and time spent diagnosing a particular problem massively, especially when you extrapolate it across the thousands of different dealer or third-party sites that they have globally. If you can shrink something that takes a skilled technician hours into something that can be done by a less skilled technician in minutes, just by doing this similarity search match, that's millions and millions of hours of savings to that organization.
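A rough sketch of the similarity-search step described above, assuming the catalogued fault sounds have already been turned into embeddings and indexed with Atlas Vector Search. Here, embed_audio is a hypothetical stand-in for whatever embedding model is used, and the index, collection, and field names are invented for illustration.

```python
from pymongo import MongoClient

def embed_audio(path: str) -> list[float]:
    """Hypothetical placeholder: replace with a real audio embedding model."""
    return [0.0] * 1024  # dummy vector matching the index's dimension

# Placeholder Atlas connection string.
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
db = client["diagnostics"]

# Embed the recording of the noise the customer's car is making.
query_vector = embed_audio("recordings/front_end_rattle.wav")

pipeline = [
    {
        "$vectorSearch": {
            "index": "fault_sounds_index",   # assumed Atlas Vector Search index
            "path": "embedding",             # field holding the catalogued sound vectors
            "queryVector": query_vector,
            "numCandidates": 100,            # candidates considered before ranking
            "limit": 3,                      # top matches returned
        }
    },
    {"$project": {"fault_code": 1, "description": 1,
                  "score": {"$meta": "vectorSearchScore"}}},
]

# Each match points at a known fault and its repair guidance.
for match in db.fault_sounds.aggregate(pipeline):
    print(match["fault_code"], match["score"])
```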
That's the diagnosis part. The next part is then, okay, how do you actually fix the problem? Well, right now, for most of these issues, you find your diagnosis code and then you have a manual that you go through for the steps and parts required to fix it. Well, they also layered a chatbot on top of all their repair manuals.
So now the technician can just say, okay, this audio diagnosed this problem, what are the steps? And it gives a nice summary, without having to trawl through the PDFs of the physical manuals to get that answer, kind of closing that loop. A very pragmatic, applicable use case that I think we can all intuitively understand. Another one: I was just on a panel with the team that built this at Novo Nordisk, the pharma company. They have, like any pharmaceutical company, a pretty heavy paper process they have to follow to submit new drugs for approval. It's called a clinical study report, and that typically takes a lot of people a lot of time to write and review manually. They ended up training a model on all their submissions, both draft submissions and the ones they submitted to the various regulatory bodies.
And now they have a model that auto-generates, based on the raw input data from the clinical studies, the first draft of that CSR, which used to take weeks and now takes 15 minutes to produce a reasonable-quality first draft. Now, it's not like they're submitting that directly. They still need to review it and pass it around; obviously the stakes are high for these types of things.
They still have a lot of manual review in the process, but it shaved weeks of work down to minutes to get a much higher quality initial draft than having that done manually and training people to do that and get them up to speed on a particular domain. So these are two relatively recent use cases where we were very lucky to be part of the ideation, build-out, and proving out of these proofs of concept, but I'm sure we're going to see even more over the coming year or two.
Richie Cotton: Some very cool examples there. I'd say on the first one, about the car making noises: I drive a 20-year-old car, so it has a lot of squeaks and funny noises, so that sounded incredibly useful. But also the other business use cases, from simply working with documents to more sophisticated examples; it feels like there are lots of opportunities just to make use of this new technology.
Sahir Azam: What's interesting about Gen AI is, you know, it's non-deterministic, and, like it's named, it helps you generate knowledge and information in a way that just wasn't really possible before with classic software. And so the types of applications, I think, will really go after services industries that were very human-intensive and that software was never really a great fit for.
So I think it's less about disrupting existing software, though some of that will happen, and more about bringing software to industries that it just was never really a great fit for before.
Richie Cotton: So I think a lot of businesses are now thinking, well, how do we make use of all these other data types and file types that we've got lying around? Does that require a change to your fundamental data infrastructure to take advantage of this? What do you need to do?
Sahir Azam: Yeah, there's a lot of project work happening in the industry, and certainly the consulting world is benefiting from some of this right now, in terms of getting your data house in order. So how do you properly catalog, tag, and label data so it can be organized and leveraged? I think that's one area of work. In most organizations, especially large ones, the data is really siloed.
As you mentioned earlier, it's hard to organize or understand. So there's a whole lot of work organizations are undergoing to try to get their hands around this, because they recognize it's going to be valuable if they're going to benefit from AI in the future. The other is, you know, a lot of the systems of record that house this data.
Critical information for a business is locked up in 20-, 30-year-old legacy systems. You know, in the past it was sort of, okay, we can deal with just maintaining this, because it's too expensive to actually move off these old database systems. But now, I think it's pretty clear no one's going to add generative AI capabilities to a 30-year-old application on a 30-year-old database. And if that data is the information that's going to differentiate your business by powering AI models and integrating into foundational models, then there's a pull now to get it into a much more modern posture, architecturally for the application and obviously from a database perspective, onto a more modern technology standard. Obviously MongoDB is a beneficiary of that, but we're also leveraging AI as a tool to make the process of migrating these old applications and modernizing them to something new much less risky and more cost effective than ever before.
It's kind of a circular thing here. It's interesting.
Richie Cotton: Yeah, I can certainly imagine how if you've got 30-year-old data on some legacy system, then it's going to be very difficult to hook that up to some sort of generative AI application. Interestingly, this is one of the issues where DataFramed guests seem to be completely divided. Some people are very much like, you must store all your data for all time, whatever it is, everything needs to be archived, and others are like, well, just keep the data that you care about.
So, is that 30-year-old data really going to make a difference to your generative AI applications?
Sahir Azam: It may not even be the age of the data. When I'm referencing this, it's not necessarily that the data itself is 30 years old. Now, maybe there's some value in that data; what models are great at, whether it's classical machine learning or, you know, training generative AI models, is finding relationships between information that humans wouldn't necessarily be able to understand.
So I suspect there's some value in some of those corpuses of data that we don't even understand yet, that the models will be able to infer, but that's not even what I'm referring to. A lot of these applications are serving the business and customers today; they just happen to be 30-year-old software systems.
So now, how is this 30-year-old software system, this old application, this old database, suddenly going to be something that's flexible and performant and scalable enough to handle a generative AI application built on that system of record and all that data, even if the information is only one week old, you know, generated by the app last week?
And I think that's what we're seeing as more commonly the driver. It's just an old stack. People recognize that you're not going to use old technology to solve a new problem, which is generative AI applications. And so they need to get all that information in these applications modernized to a new stack, and then go from there on applying and adding AI capabilities.
Richie Cotton: That makes sense. I was trying to think of some examples of 30-year-old software, just for comparison. Windows 95 is going to be turning 30 next year, so it's that era.
Sahir Azam: Yeah, that's desktop software, but think about server-side systems: a bank that had to build a custom fraud detection system 15 years ago, or insurance, where you have a quoting system. We're working on modernizing, I think, a 20-year-old quoting system for insurance quotes at a large insurer right now. And, you know, it was built by developers who are no longer at the organization.
Nobody understands the code base. There isn't necessarily testing. But now suddenly this system houses critical data that they know will be powerful with gen AI, whether they use a fine-tuned or, you know, augmented foundational model. They're not going to do that with their 25-year-old stack. So now they're modernizing that whole application, moving its data forward, so they can get it into a posture where they can move forward for the next decade off the back of that.
Richie Cotton: Okay. Yeah, I can certainly see how once you get into software that's embedded in machinery, or software on some mainframe at a large organization, it just has to last a lot longer. I guess this is probably a trillion dollar question, but how do you go about making sure that you don't end up with an obsolete technology stack that is 30 years old, and then you're stuck with stuff that doesn't work?
Sahir Azam: Yeah, I think every organization, if they could snap their fingers and it was no cost and no risk, would of course modernize things. But the reality is that for many applications, it's just too expensive and not worth it versus other investments they could be making. And that's how these applications age.
And there's a little bit of an if-it-ain't-broke-don't-fix-it attitude sometimes as well. But now, because of generative AI, because of the increasing push to move to the cloud, and in some cases cost pressure or regulatory pressure to get off of these old technologies, there are enough reasons to force the issue. And ironically, Gen AI is a tool to make those modernizations happen at lower cost and lower risk in its own right. And so it's kind of a moment where I think organizations are much more willing, and/or forced, to modernize systems that they probably would have left alone otherwise.
And so I think AI is a key driver of that that we're seeing in our customer base. And then for us as a technology provider ourselves, you know, I think it's really about making sure we continue to keep an innovative culture. We were really quick to market with adding AI capabilities to our platform, to our developer tools, our vector capabilities, et cetera, but also a whole lot of integrations.
There's now an emerging ecosystem of technologies that people are using: the model providers themselves, the inference platforms, a new set of developer libraries, eval tools, that we've had to integrate with to make sure MongoDB works as well for the most modern gen AI development as it did for cloud-native or mobile and web development, which is where we got our roots.
And I think that's about execution. If we were not able to move fast, stay close to those emerging ecosystems, integrate well, really learn how customers are using those, and support them along the journey, then I think any company that can't cross that chasm or make that leap, whatever analogy you want to use, has the potential of being stuck in a bad position.
Richie Cotton: I do like that it works in both directions: generative AI is increasing the demand for more modern data stacks, and it's giving you the tools to transition to that modern data stack as well. All right. I'd also like to talk a bit about roles. How is this modern database technology, the modern developer data platform, changing both data roles and software development roles?
Sahir Azam: Yeah, absolutely. I think in general, over the last 20 years or so, database decisions have gotten more democratized. It used to be that a CIO signs a large contract with one of your favorite large enterprise software companies, I'm sure you can name a few, and they would kind of mandate that this would have to be the standard for, you know, many applications, if not all applications, except by exception. That would be the traditional 90s and early 2000s way of buying software.
But because developers are so valued, in the sense that there's more demand for software than there is supply, so to speak, and we'll see how GenAI changes that, it's made the developer preference for tools that they love working with, that make them productive and easy to experiment with, much more powerful.
In fact, MongoDB, in many organizations, starts because some developers downloaded our open source version or signed up for a cloud service because they like MongoDB versus a more traditional option that's available. They have success with it, and it starts to grow organically in those organizations.
Eventually we get a more strategic platform relationship with those customers. We have executive-level buying and the bottoms-up developer adoption. But that's a big shift generally in terms of the preference for technology stacks, at least for operational databases, moving more to a user preference as opposed to a buyer preference.
And I think that's really powerful. We've been the beneficiary of it, and we've put a whole lot of work into community and DevRel, and even our strategy with open source, to try to drive that and be really strong on that bottoms-up adoption. I think the other thing that's happening, though, more related to AI and machine learning generally, is that for a long time, in many organizations, data science and machine learning was a small, centralized team that was kind of a service center for when you needed a model to solve a problem for some business user, or maybe some software use case; they'd build the machine learning model, and then it gets thrown over the wall and maybe integrated or implemented by a core software development team.
One of the things that I think Gen AI is changing is that there isn't typically a centralized Gen AI development group. There may be some standards teams or, you know, a team that has best practices, but we're seeing this really shift left to be a standard part of almost every software development team.
You know, even the developers we see coming out of college and university today have AI and ML skills that they're learning as part of their core CS curriculum. So this idea that it's moving from a highly specialized, centralized skill set and role in the organization to something more brass tacks, where you're going to have either a level of basic AI skills in every development team, or every development team will have an AI developer or an ML developer embedded as part of that core group across every part of the enterprise, is definitely a trend that we see happening.
Richie Cotton: That's kind of cool that you have these sort of bottom up approaches where people just adopt a tool because they like it and then that grows rather than just being enforced on a whole organization. We've mentioned data teams, we've mentioned developer teams quite a lot. I'm wondering whether these are coming together.
Certainly when you've got applications involving data, it seems like the two roles are overlapping a bit more than they used to.
Sahir Azam: Yeah, and whether that overlap means there are organizational ways for those data teams to work more closely day to day with their development counterparts, which is certainly happening in some organizations, or the idea of leveraging data to power new modern application experiences with AI just becomes a core part of the fundamental developer skill set of the future,
I think that over time we'll see more of the latter. But organizationally those things are being pulled together more every day, because if you need to prep your data to fine-tune an AI model, or to create a RAG workflow to integrate into a public foundational model, you're going to need the data team and the governance team to be part of that, but then you need developers to code that up and integrate it, and product managers and designers to think about the actual experience for the end user of the application.
And so that forces that integration, or collaboration.
Richie Cotton: Okay, yeah, lots of teams mentioned there. It seems like creating applications that involve data or AI really is a team sport, with a lot of the organization involved.
Sahir Azam: Yeah, I'm a skeptic of this idea that all you're going to need is models and suddenly you don't need great product craft, to use the term loosely, meaning understanding the core pain and needs of your end users and designing delightful software. Whether that's an audio interface or a classic visual interface like we think of today, there's a lot of craft that goes into that.
And Gen AI and these models are a tool to solve those problems, but I don't think that goes away. The ratios might be different; this tool is very powerful, so it may change user experiences in fundamental ways. But I think people underestimate how much true product craft is still there, and that requires different skill sets.
And even for the most successful, flashy gen AI startups that we all read about, many of whom we spend time with at MongoDB, the amount of craft that goes into creating a great product, even if it's based on gen AI fundamentals, is very high. That's what still drives their success.
Richie Cotton: Absolutely. Yeah. It does seem like there's quite a big difference between something that's just okay and thrown together and something that's been very carefully designed.
Sahir Azam: If I see another raw chatbot... I mean, come on. I'm not saying there's zero value in that, but it's not enough. You need a real crafted experience, and you need to learn how to tie models together to get the best outcome for the use case you're trying to build. And there's a lot of complexity in that problem.
Richie Cotton: Do you think that the changes in technology, so, basically more modern databases, generative AI, things like that, do you think they're changing the skills that data people need?
Sahir Azam: It's changing the skills that the average developer needs. I'm not certain it's changing the core needs of the data teams, because I still think there's a lot of data engineering that needs to happen to prep all this information and get it organized in a way that it's useful for these workflows with Gen AI.
I think anything Python is getting more important, not less, and that's always been the lingua franca for advanced data teams. In a lot of ways, there's a whole plethora of new tools and technologies emerging, always; I was just talking to a startup this morning that serves data engineers and ML engineers, for example.
So I'm not saying the technology isn't going to change, but I feel like there's more skill set development needed on the core development side versus the other side of it.
Richie Cotton: Ah, that's interesting that you think it's affecting developers more than data teams.
Sahir Azam: I think the average application developer will need to become more sophisticated with leveraging unstructured data, AI models, and machine learning generally. Yes, they will lean on these centralized data teams to help, but for organizations to move really fast, they're going to need to develop that skill more pervasively across the organization.
Richie Cotton: Okay, so developers need machine learning skills, some AI skills, and then, I guess, on the data side, data engineering seems to be becoming increasingly...
Sahir Azam: ...more under pressure to serve all those needs and be that specialization, the experts in managing all of that data and making it useful.
Richie Cotton: So, have you seen any success stories where organizations have just leaned in on these new technologies, they've built something cool, and they've had a success?
Sahir Azam: Yeah, I mean, there are dozens of examples. I gave you a couple of large-company examples, but every week we pretty maniacally look at all the new user signups for our AI products. And there are, frankly, startups in every industry vertical, in almost every geography of the world, whether it's Southeast Asia, Africa, obviously the Bay Area, London, all parts of Europe.
So I think there's a lot of innovation happening now. How many of those will last? You know, we're in one of those hype-cycle kind of moments where there's a whole new wave of platform technology, and I'm sure many of those ideas will fail. But like all things, from those will blossom amazing new companies that, if we're sitting here five or ten years from now, will be the next big tech brands we think of. We saw that with the shift to cloud and mobile, and I think this is going to be a similar, if not bigger, shift in terms of new types of organizations created.
I think we're just lucky at MongoDB to get a view of that across the board at a global scale, because we see 50,000 developers every week trying our products, playing with things. And we do our best to make sure that we're trying to understand the innovation that's happening, not just in the large organizations, but globally with the new ecosystems that are spreading out.
Richie Cotton: I love that it really is global and it's not just these big organizations that are getting in, right? It's everyone. All right. To wrap up then, what are you most excited about in the world of data and AI?
Sahir Azam: You know, certainly there's a lot of concern around how AI will change various job roles in our industry, in the economy, and the world economy. And I certainly think that presents the risk of being disruptive and, really, changing things fundamentally. But I also believe, at the same time, that at an aggregate level we're going to be more productive overall as a species, so to speak, and it will ultimately lift people as this productivity and intelligence gets smarter and more capable.
And I think humans overall always migrate to the higher-order problems that we can uniquely solve. So I'm more in the optimist camp than the pessimist camp, but with eyes wide open that there will be some dislocation and disruption along the way. But I think, you know, it's the early days of an exciting time.
And since this is a podcast aimed at technologists, I think it's an exciting time for all of us to learn something new, to lean into all this change no matter what level of experience, and to leverage that as a way to drive personal growth as well.
Richie Cotton: Absolutely. Lots of very cool things coming in the pipeline over the next few years, I hope. So yeah, exciting times. I agree.
Sahir Azam: If I could predict exactly what the next new fancy thing was, then, you know, I'd probably be out there investing in it somewhere. I think there are more unknowns than knowns, but I do know the unknowns are probably going to be really amazing in the long term.
Richie Cotton: Nice. All right. Thank you so much for your time, Sahir.
Sahir Azam: Thank you, Richie. I really appreciate it.