How Data and AI are Changing Data Management with Jamie Lerner, CEO, President, and Chairman at Quantum

Richie and Jamie explore AI in the movie industry, AI in sports, business and scientific research, AI ethics, infrastructure and data management, challenges of working with AI in video, excitement vs fear in AI and much more.

Oct 7, 2024

Guest

Jamie Lerner

Jamie Lerner is the President and CEO of Quantum, a company specializing in data storage, management, and protection. Since taking the helm in 2018, Lerner has steered Quantum towards innovative solutions for video and unstructured data. His leadership has been marked by strategic acquisitions and product launches that have significantly enhanced the company's market position. Before joining Quantum, Jamie worked at Cisco, Seagate, CITTIO, XUMA, and Platinum Technology.

Host

Richie Cotton

Key Quotes

We [Quantum] and IBM have been running the world's biggest archives for the last 30-40 years for most of the world. We've got a lot of experience in helping people build these archival systems. It's probably becoming the hottest part of our business with AI. Everyone thinks AI is all about buying GPUs. Well, part of it is, but a big part of it is, 'my God, I better keep everything'. A lot of companies had a data throw-out policy, and now they're all moving to a keep-everything policy, and they have to totally change their architecture to do that.

Keep every single bit and byte of your data. It might not be this year or next year; it could be next decade or next century, but at some point someone will figure out how to learn from that data and use it. The only data you can't learn from is the data you deleted. Archive strategies and data preservation strategies are critical. Every company needs one.

Key Takeaways

Implement a robust archiving strategy to store vast amounts of historical data, enabling future analysis that could drive breakthroughs or competitive advantage.

Organize your data using comprehensive metadata to unlock its full potential, making it easier to run advanced AI models and retrieve specific insights quickly and accurately.

Store high-priority data in fast, expensive storage for immediate use, while moving older, less critical data to lower-cost, cold storage for long-term archiving.

Links From The Show

Quantum

Career Track: Data Engineer in Python

Transcript

Richie Cotton: Hi Jamie. Glad to have you on the show.

Jamie Lerner: Hey Richie, thanks for having me.

Richie Cotton: Brilliant. So, to begin with, just at a high level, can you talk me through the different ways that AI is used in video?

Jamie Lerner: The short answer is it's being adopted in many ways, in many use cases. In the movie industry, television, and sports, it's being used for things like letting sponsors see how much logo time they got during a sports game, with their logos in the background, and facial recognition to figure out which players are on the field.

Colorization and upresing, and simple things like subtitling and closed captioning for the hearing impaired, which all used to be manual tasks. A lot of that's being automated. In video surveillance, it's calculating wait times in a line, fight detection, argument detection, mood: are people unhappy in your store?

The use cases are amazing, and these video use cases are really going into areas like microbiology and drug development, because a lot of drug development and discovery is now done by videoing proteins and cells and how they interact, so microscopic videos. The proliferation of AI against video and with video is just stunning. I would also expand that to photographs and sound, because the same analysis is being done on things like x-rays, MRI images, and CAT scans.

They're not moving images, but a CAT scan can have a hundred thousand images in it.

Richie Cotton: Okay, so it seems like there are tons of different use cases, and that's very exciting. Although some of these things have been around for a while. You mentioned adding subtitles, and we've had subtitles for a long time. Can you talk me through, like, how AI…

…typing is, yeah, okay, so this is just automation then, in that case.

Jamie Lerner: Yeah, let's drill into that, because it's kind of fascinating. A lot of times when subtitles are happening on live television, you can tell someone's typing them in. There are all sorts of typos and whatnot. When it's done with an AI engine, unfortunately, it's often way more accurate, with far fewer typos.

It's not a hundred percent accurate, but it happens very quickly and very inexpensively. Just to give you a sense, for an hour-long news segment, you could probably generate all the closed captioning in 7 to 9 seconds and be 95 percent accurate. Then you can go and translate it into multiple languages.

But now there are new AI algorithms that will actually have you speak Mandarin in your own voice and your own accent, and repixelate your mouth. So you're not speaking English, but your mouth is moving the way it would if you had been speaking Mandarin. It's going far beyond just pumping out a subtitle to actually repixelating the mouth and creating your voice speaking Mandarin, which is pretty far out.

Richie Cotton: Yeah, that's pretty amazing, because just translating the text in the subtitle is one thing, but actually changing the video itself is something else altogether. I suppose this leads naturally to talking about movies. Special effects have been around for a long time, but how is AI changing the way movies are made?

Jamie Lerner: Well, I think you're putting your finger on a super hot topic where there are a lot of strong opinions. This is really specific to the movie industry; I think it's a little different in television and sports. Movies are considered works of art made by artists, by hand.

And I think a lot of people feel that using machine automation in an artistic process cheapens it and belittles the artist. There's a big debate about how much automation you can use on a piece of art, and I think that's probably true for the highest-end art movies. As you move down the scale, all the way to reality TV, the news, and game shows, I do think you see automation, but today I don't see it replacing artists, or even replacing translators and people who work on subtitles.

I just see it making their jobs easier by automating pieces of the work. The subtitles still get reviewed: is this really the right translation? Is this really what we meant? It'll maybe make an attempt to colorize something, but a colorist will still look at it and ask, is that exactly what we're trying to achieve?

So in a lot of the more routine making of television and sports production, I think automation isn't really replacing people; it's just making them more efficient. But as you move up into things that are considered art, I do think there's a very active and pretty tense debate about what role AI has in the creation of movies that are considered works of art.

Richie Cotton: Yeah, we certainly had both writers' and actors' strikes in Hollywood last year, and that raised a lot of issues, whereas in sports, no one's complaining that you've got some sort of AI umpire helping out.

Jamie Lerner: Yeah, there's so much interesting AI in sports. Say, hey, I want to see these highlights: you can just create your own. If the video is tagged really well, so you know every penalty, every shot, everything every player did, you can say, I want to see this player's highlights.

Or I want to see this part of the game. You can almost generate your own highlights if it's tagged well. Humans can do that; I've seen it done by hand at the NFL, and a lot of the sports leagues used to have loggers. But the fidelity of logging, even if you have 10 or 20 loggers in a game, is still very light compared to what an AI can do. It can look at every touch of the ball, how long you touched it, what you did with it, and it can get very granular.
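To make the tagging idea concrete, here is a minimal sketch of building a custom highlight reel from event metadata. It assumes a hypothetical event log where each tag records a player, an event type, and start/end times in seconds; the `TaggedEvent` and `highlight_clips` names are illustrative only, not any league's or vendor's API.

```python
from dataclasses import dataclass


@dataclass
class TaggedEvent:
    """One hypothetical metadata tag on a game video; times are seconds from kickoff."""
    player: str
    kind: str     # e.g. "shot", "penalty", "pass"
    start: float
    end: float


def highlight_clips(events, player, kinds=None, pad=3.0):
    """Return merged (start, end) clip ranges covering one player's tagged events.

    Each matching event is padded by `pad` seconds on both sides, and
    overlapping ranges are merged so the reel never repeats footage.
    """
    ranges = sorted(
        (max(0.0, e.start - pad), e.end + pad)
        for e in events
        if e.player == player and (kinds is None or e.kind in kinds)
    )
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous clip: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


events = [
    TaggedEvent("Player 9", "shot", 120.0, 124.0),
    TaggedEvent("Player 9", "pass", 126.0, 127.0),
    TaggedEvent("Player 4", "penalty", 300.0, 305.0),
    TaggedEvent("Player 9", "shot", 900.0, 903.0),
]
print(highlight_clips(events, "Player 9"))            # every tagged moment
print(highlight_clips(events, "Player 9", {"shot"}))  # shots only
```

The sketch only works as well as the tags: if half the shots were never logged, the reel silently misses them, which is the same point Jamie makes later about needing to find every kick.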

But I'm also seeing sponsors ask, what is every visible logo from every sponsor in the stadium, and how much airtime did it get? People are really interested in that. Which celebrities were in the stadium, and can we get cameras on them? Is that acceptable? Then there's looking at what someone did versus what an AI says was optimal. Did the race car driver drive the optimal line? Should they have braked later? What is physically possible versus what the person actually did? There are just so many cool applications that people are coming up with every day, and it's a lot of fun.

I think the other thing people need to get their arms around is that none of these things are entirely accurate. They're good starts. When we say a machine can generate subtitles, you have to understand it'll be right maybe 98 percent of the time, but there'll be mistakes.

There'll be facial recognition that isn't right. There'll be a calculation that isn't right. A lot of people tend to focus on the 2 to 5 percent that isn't right and not on the vast majority that is right. But for that 2 to 5 percent, there's a big role for human beings to go and double-check it and understand it.

It isn't this idea that human beings are being rolled off. I'm not seeing that. I'm just seeing human beings do a lot more. I have a friend who's a TV writer and producer, and he says it's like when we created the word processor: it's not like we got rid of writers.

We just made writers more efficient. We didn't wipe out the whole art of writing, but by the same token, it's not like we created more Shakespeares and Mark Twains and master authors with the creation of the word processor.

And I think the AI is kind of the same way.

Richie Cotton: Okay, so you're saying it's mostly about increasing productivity rather than replacing people or necessarily increasing the quality of content.

Jamie Lerner: That's the way it looks now. In our own business, we use AI to help our engineers write code, and we use AI to answer frequently asked support questions. But I'm not rolling off engineers or support people because of AI. We make decisions based on business matters, not on "AI is so efficient, I don't need my team anymore." I don't see that at all. It's just that everybody needs their people to be more productive.

Richie Cotton: It certainly sounds like a lot of these use cases are tasks you couldn't quite afford to employ humans for. For example, you mentioned calculating how much time sponsors' logos appear at a sports ground. That sounds like an incredibly tedious job for a human, but if AI can do it almost for free, that's a bonus.

Jamie Lerner: Well, there's a whole other world of use cases that no human could do, tedious or not. We also work in the life sciences world, and different customers have different use cases, but they'll literally have billions of gene interactions or protein interactions, cellular reproduction, and cellular interactions.

They'll use AI on this body of data. They might have a hundred thousand interactions, and they'll say, I want the AI to do 25 billion trial-and-error runs. What would happen in this case? What about this case? There's going to be some scenario where I can set up a set of proteins and the cancer cells die.

I don't know what it is, but I'm going to try billions of interactions to find the ones that do that. There just isn't enough time, enough humans, or enough Petri dishes to do that many billions. There are breakthroughs happening on our equipment right now where people are doing iterations and experiments at a scale that was never possible in Petri dishes or physical laboratories.

So they can just run simulation after simulation, which is physically impossible otherwise. An example that makes it easy to get your arms around: it's impossible to drive millions of laps in a Formula 1 car. Between the drivers and the cars, and the cars are so damn expensive, you're not allowed to drive that many laps.

But you can drive millions of laps in the simulators (I think they limit that to some degree too). You can use simulators to get science done that you couldn't get done at a track, with a physical microscope, or in a Petri dish, no matter how many humans you lined up. A lot of breakthroughs are happening because of the speed at which you can do computer simulation and run experiments in the billions very quickly.

Richie Cotton: That does seem incredibly important. If you want to experiment cheaply, quickly, and at a scale you can't reach in the physical world, then you do need some kind of simulation, or just a…

Jamie Lerner: Yeah, you need a big enough sample of the real world that you can teach the AI, or your algorithms, how the real world works. Then they can run with it: okay, now I can run many more simulations. So you do need enough real-world data to train on, but once trained, you can start trying lots of different things. The training model has to be built by humans, but once trained, it can really run and run, whether it's business scenarios, military scenarios, race car scenarios, gene sequencing and gene splicing, and how genes behave over time and how they can be modified.

It's quite incredible what a model can go off and do once it's trained. And I think a lot of that is not a replacement for human work. It's human work that there just aren't enough humans to ever practically do.

Richie Cotton: A lot of these use cases seem incredibly positive, certainly all those life sciences breakthroughs and things like being able to do engineering much more cheaply. It's amazing. You did touch on one slightly controversial use case earlier: the idea of seeing which celebrities are visiting some event, and more broadly, video surveillance with AI picking up people's faces.

That is a lot more controversial. Do you want to talk me through what the issues are there?

Jamie Lerner: I'm no one's ethicist. I'm not an ethics guy, but as a technologist, I can walk you through some use cases. Like I said before, it's not entirely accurate. A classic example: if you're watching a European soccer game, a football game, and a player has his back to you, facial recognition won't know who it is, yet the player's name is written in giant letters on his back, so you know exactly who it is.

You're like, how can this facial recognition not know who this is? Well, it can't see the face. So it's not entirely accurate. But what we're finding in society is cases where people voluntarily accept that anonymity is not possible. If you're a celebrity and you sit in the front row at a big sports game, you know you're going to be seen.

So it's hard to say "I'm upset that video surveillance saw me" when you're a celebrity sitting in the front row of an NBA Finals game; you kind of know. Similarly, for all of us, we love our anonymity, but when we pass through a border checkpoint, we produce a passport that has a lot of personal information in it.

You give that to someone, knowing that you're exchanging your anonymity for the right to pass. So I think some people, not all, like the convenience of: hey, scan my face, correlate it to the passport you already have on file, and let me through the border without waiting in an hour-long line.

I like that convenience, and I don't feel like I'm giving up my anonymity, because it's all in my passport anyway, including my photo and everywhere I've been during the last 10 years. So I think there's this contract being created: there are situations, whether it's us at the border or a professional player on the field in a sports league, where you know you're going to be recognized.

So you shouldn't be upset that there's facial recognition. I think there's a contract that says there are situations where you're willing to exchange your face and your personal information because it's an acceptable scenario. And there are other situations where it's deeply unacceptable: when I'm driving to work in the morning, taking a walk down the road with my family, or going on a camping trip.

I have no expectation that I'm going to be facially recognized and tracked during those personal activities. As a society, we're figuring out the contracts where you're willing to exchange data and the areas where you're completely unwilling to exchange data.

I'm not on the front lines of making those decisions; I implement the technology. But the pattern I see is that there are contracts in society where we're willing to exchange data and contracts where we're not, and different countries have different points of view on that. Certain European nations are very strict and willing to take on more security risk to have fewer cameras.

Whereas in certain Asian nations, and more and more here in America, you've got cameras everywhere all the time. Different societies are looking at it differently.

Richie Cotton: Absolutely. And I like that there's that range of situations where anonymity is more or less important. If you're a celebrity seeking fame, then obviously anonymity works against you. But day to day, when you're driving to work or whatever, you probably do want a bit more anonymity.

And there are some cultural factors involved in how much anonymity is appropriate in your society. Okay. All right, moving on from the use cases, I'd like to talk a bit about the technology. It seems like with AI for video, you need vast amounts of computing and vast amounts of storage.

The infrastructure is kind of a challenge. Can you talk me through what's involved here?

Jamie Lerner: In most storage architectures, you tend to end up with two big buckets. The first bucket holds all your data, and those buckets are getting a lot bigger because people are figuring out that those who kept every byte of their data have a big advantage over those who threw theirs out.

There are a lot of companies that said, hey, if data is over five, seven, or ten years old, over a certain age, we throw it out, and there are certain companies that said, let's just keep it all, just in case. Those who kept it all have an advantage, because anybody can analyze data on the internet, and there you're on an equal footing with everyone else.

You can't get a competitive advantage from internet data, because you're not really analyzing it any differently. But say you have a huge stockpile of every single shopping experience that happened in one of your stores over the last 40 years, and you can see where people frowned in your store, where people walked out angry, where people were delighted.

If you have that data, it's incredibly valuable. So I tie that to the first part of the architecture, which is a huge data lake, or just an archive, where you keep your entire data archive: your library of everything that ever happened. And like any good library, if you can't find something in it, it's no good.

So it's a library with a really wonderful catalog, to be able to search that data, dice that data, and prepare that data. A lot of people call that the core of data science: organizing, preparing, and cataloging your data, so you can move it to the second bucket, which is a crazy fast bucket, the most expensive storage for the highest speed.

And that's where you find insight. You might say, I want to see every single kick of a soccer ball that's ever been recorded, and I want to determine the characteristics of an accurate and powerful kick. I'm going to look at millions of kicks. But first of all, you need to have all those kicks.

And then you study them. You can think of hundreds of scenarios: what makes this person run faster? What makes this person throw farther? What makes this business technique outperform that one? But you've got to be able to get the data out. If you can't find the kicks, or you can only find 50 percent of them, you're not going to come to the right conclusion.

So you have this giant data lake where you prepare data, and then you move it to this raging fast area with raging fast storage. The main reason you need raging fast storage is so you can use that very fast compute you bought. Buying fast servers isn't hard.

Filling them with GPUs isn't hard. Saturating them is hard: giving them enough data that what you paid for is running at 100 percent. Usually that really expensive compute infrastructure is waiting on data, so you've paid for something you're not really saturating or using. It's very hard to build a data lake that can move data quickly and efficiently to the fast area, and a fast area that can provide that data quickly enough to saturate every bit of compute you have, so the money you spent is being used optimally.

That's kind of what everyone's wrestling with. Usually what they're finding is that they bought these giant compute infrastructures that sit idle, waiting on data to be prepared, organized, and fed to them so they can do their thing.
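The idle-GPU problem is ultimately arithmetic: the storage tier has to deliver data at least as fast as the GPUs consume it, or the GPUs wait. A minimal sketch, assuming a hypothetical cluster of 64 GPUs that each consume 200 video frames per second at roughly 2 MB per frame; these numbers are made up for illustration and are not figures from Quantum, and the model ignores every bottleneck except storage read throughput.

```python
def required_throughput_gb_s(num_gpus, samples_per_s_per_gpu, mb_per_sample):
    """Read throughput (GB/s, decimal units) needed to keep every GPU fed."""
    return num_gpus * samples_per_s_per_gpu * mb_per_sample / 1000


def gpu_utilization(storage_gb_s, num_gpus, samples_per_s_per_gpu, mb_per_sample):
    """Upper bound on the fraction of time the GPUs are busy, if storage is the only bottleneck."""
    demand = required_throughput_gb_s(num_gpus, samples_per_s_per_gpu, mb_per_sample)
    return min(1.0, storage_gb_s / demand)


# Hypothetical cluster: 64 GPUs, each consuming 200 frames/s at ~2 MB per frame.
for storage in (10, 25.6, 40):
    u = gpu_utilization(storage, 64, 200, 2)
    print(f"{storage:>5} GB/s storage -> GPUs busy at most {u:.0%}, idle {1 - u:.0%}")
```

With these assumed numbers the GPUs need about 25.6 GB/s; a 10 GB/s storage tier caps them at roughly 39 percent busy, which is the "paid for something you're not really saturating" situation described above.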

Richie Cotton: So as I understand the structure, you've got loads of raw data going into your data lake, maybe on slower storage, and once you've processed it and cleaned it up so it's ready, it gets put next to the fast compute. And it seems like storage is the kicker: getting storage fast enough that you can throw data at the compute quickly enough is going to be the key.

Jamie Lerner: Well, there's a lot of economics involved, because the reason people throw data out is that it's super expensive to keep it for 100 years. If you look at what an Amazon bill would be to say, hey, can I park Star Wars up there for 600 years? The monthly bill just doesn't make any sense.

So companies are now saying, if I'm going to keep data for 50 years, a hundred years, I need a real strategy to build an archive that's economical and safe, and I need cataloging techniques to understand that data. I think people were thinking, oh, I'll just throw it up in the cloud. Then everyone kind of sobered up and said, paying a monthly bill for something you're going to keep for centuries is never going to make sense.

And then there's paying additional fees every time you look at it and analyze it. If you're analyzing data, taking it out, studying it, putting it back, taking it out, putting it back, the bill becomes kind of outrageous, and it ends up being the wrong architecture for 100-year or 200-year archives. Groups like the Library of Congress, or big movie studios, build archives because they know that's their business.

And now I think a lot of people are turning around saying, archiving all of our corporate data is our strategic advantage. It's the very core of strategic advantage. And they're thinking about that data lake very differently.

Richie Cotton: Okay, this is really interesting. A couple of years ago I got chatting to someone from an insurance company who said they had insurance records and medical records from customers going back to the 19th century, on paper in a warehouse, and they were trying to figure out how to digitize them and what to do with them.

It sounds like you or your customers are dealing with a similar problem: okay, we want to build really long-term storage, and video in particular is huge. How do you deal with this long-term archiving of corporate data?

Jamie Lerner: It takes a couple of characteristics in the system. One is cost: it's very expensive, especially when someone says, look, I'm going to keep data for a hundred years, and I don't know if I'm ever even going to look at it. It might just be data landfill, but I still want to keep it. You need a tier for data like that, where you'd say, yeah, I want to put that in a very cold, dark, inexpensive place.

Then you have other data, like my more recent financial records, that I want to be pretty quickly available. In the movie world, you might have a movie that came out 50 years ago that wasn't very successful, and you may never really go back to it. Whereas the newsreels from a week ago, you want those warmer.

So you end up with archives that have tiers, and they're really economic tiers. You might have one tier that's 30 cents a terabyte a month and another that's two or three dollars a terabyte a month. Mainly you have flash tiers, hard drive tiers, and then even colder and darker tiers that might be some type of magnetic tape that lasts a very, very long time.
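To see why the tier choice matters over a century, here is a back-of-the-envelope sketch using the per-terabyte monthly prices mentioned here (30 cents for a cold tier, about $3 for a warmer one). The 1 PB archive size and the 10 percent warm split are hypothetical, and the model deliberately ignores retrieval and egress fees, media migration, and price changes over time.

```python
# Per-terabyte monthly prices quoted in the conversation; real prices vary by vendor and medium.
PRICE_PER_TB_MONTH = {"cold": 0.30, "warm": 3.00}


def archive_cost(tb_by_tier, years):
    """Total storage spend in dollars for keeping `tb_by_tier` terabytes in each tier for `years` years.

    Ignores retrieval/egress fees, media migration, and price changes over time.
    """
    months = 12 * years
    return sum(PRICE_PER_TB_MONTH[tier] * tb * months for tier, tb in tb_by_tier.items())


# Hypothetical 1 PB (1,000 TB) archive kept for a century.
all_warm = archive_cost({"warm": 1000}, years=100)
all_cold = archive_cost({"cold": 1000}, years=100)
mostly_cold = archive_cost({"warm": 100, "cold": 900}, years=100)
print(f"all warm: ${all_warm:,.0f}  all cold: ${all_cold:,.0f}  10% warm: ${mostly_cold:,.0f}")
```

Under these assumptions, keeping the whole petabyte warm costs about $3.6 million over 100 years versus about $360,000 cold, and keeping only 10 percent warm lands at about $684,000, which is why the automated cool-down policies matter.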

Then you have to think about whether, as data ages, you move it down to the cheaper tiers. A lot of the technology we've been working on is the automation that says, hey, no one's looked at this data for a year, why don't we cool it down a little? And if another year goes by, let's cool it down even more.

We'll still archive it, and you'll know where it is, but let's save you some money and put it somewhere really cheap. Now, it might take three to five minutes to find it again, but we can save you a lot of money compared to keeping it 50 milliseconds away. So it's about building policies to store billions of files economically, because usually if you go to someone and say, wouldn't you want to keep all your data?
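The "no one's looked at this for a year, cool it down" automation can be expressed as a small policy function. The sketch below is a hypothetical illustration, not Quantum's implementation: the tier names, the one-year step, the file paths, and the catalog layout (path mapped to current tier and last-access time) are all assumptions.

```python
from datetime import datetime, timedelta

# Hottest to coldest; hypothetical tier names matching the media mentioned above.
TIERS = ["flash", "disk", "tape"]
COOL_DOWN_EVERY = timedelta(days=365)  # "no one's looked at this data for a year"


def target_tier(last_access, now):
    """Tier a file should live on: one step colder per full year without access."""
    idle_years = max(0, (now - last_access) // COOL_DOWN_EVERY)  # timedelta // timedelta -> int
    return TIERS[min(idle_years, len(TIERS) - 1)]


def plan_moves(catalog, now):
    """Return (path, from_tier, to_tier) for files that should be moved to a colder tier.

    `catalog` maps path -> (current_tier, last_access). Files are only ever moved
    colder here; recalling data to a hotter tier on access is a separate policy.
    The catalog entry is what keeps a tape-resident file findable.
    """
    moves = []
    for path, (tier, last_access) in catalog.items():
        target = target_tier(last_access, now)
        if TIERS.index(target) > TIERS.index(tier):
            moves.append((path, tier, target))
    return moves


now = datetime(2024, 10, 7)
catalog = {
    "news/2024-09-30.mxf": ("flash", datetime(2024, 9, 30)),    # a week old: stays hot
    "finance/2023-q2.parquet": ("disk", datetime(2023, 6, 1)),  # ~16 months: already on disk
    "films/1974_feature.mxf": ("flash", datetime(2021, 1, 1)),  # ~3.8 years: goes to tape
}
for path, src, dst in plan_moves(catalog, now):
    print(f"move {path}: {src} -> {dst}")
```

Moving only in the colder direction keeps the sketch simple; a real system would also need a recall path for when someone does ask for that 50-year-old film.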

The answer is always, yes, sure. And then it becomes an affordability issue: I just can't afford to keep all this stuff. Quantum has really been building technology for this, working with large cloud providers, the biggest movie companies, and groups like the Library of Congress and foreign governments.

How do we create a two-, three-, four-hundred-year archive? Probably we and IBM have been running the world's biggest archives for most of the world for the last 30 or 40 years. We've got a lot of experience in helping people build these, and it's probably becoming the hottest part of our business with AI.

Everyone thinks AI is all about buying GPUs. Well, part of it is, but a big part of it is, oh my god, I'd better keep everything. A lot of companies had a data throw-out policy, and now they're all moving to a keep-everything policy, and they have to totally change their architecture for that.

Richie Cotton: It sounds like there's a kind of hoarding behavior, where you don't want to get rid of anything, but you also don't want to pay for having all the…

Jamie Lerner: want to pay, like, you know, the appropriate amount. I mean, sticking stuff you'll never look at again on flash, while it sounds cool, it's just, for most companies, it's economically out of reach. And so I guess the alternative, or maybe the other direction to having this sort of low priority data that's sort of stored on some cheap tier somewhere, Is the examples of real time analysis, like there's a lot of companies trying to persuade you that you need to be able to get answers to anything within milliseconds.

Richie Cotton: What's the approach, like, if you need to so I suppose self driving cars is one example, you want answers to, like, where you are on the road, like, within milliseconds in order to make decisions. How do you deal with that?

Jamie Lerner: That's a perfect use case for the raging fast tier I talked about. That's got to sit on the super expensive tier. But you probably also want to keep a history of what your autonomous cars did, and that can cool down in your data lake. You don't need that right away. But yes, I need to know where I am on a map right away.

I need to find restaurants right around me. My car had an accident, so I need to notify someone. That stuff all has to be in the highest-speed tier. But an archive of all the driving, or of the driving events that are appropriate to save, can move over to the lake.

So what a lot of people are doing is figuring out which data needs to be on the extraordinarily expensive high-speed tiers and which data can be in the data lake, and your data lake can still have flash, disk, and tape tiers. Within the lake, you decide where to put it.

It may need to be in a very fast place in the lake, or it may need to be, like I said, in a dark, cold place.

Richie Cotton: Okay, yeah, I can certainly see how the last few seconds of driving are going to be very important for a self-driving car, while where you were an hour ago is less important to what you're doing now. So again, it's about creating policies for how much compute or storage you need for any given thing.

All right, so I feel like traditionally a lot of businesses have mostly focused on numbers as data, and in the last couple of years they've gone, okay, actually text is data too. Video is something they've not really thought about yet, or maybe they're just starting to.

Aside from the movie studios and the companies that have been dealing with this for a long time, what do you need to do in order to get started working with video and image data?

Jamie Lerner: As we talked about, the first thing you need to do is have it. A good example is how much you can learn from video surveillance data with total anonymity. Where do lines form? Say people are in your place of business, like a retail environment: where are they smiling?

Facial recognition is really hard. Determining the 14 human emotions is not very hard. Determining a smile versus a frown is close to 100 percent accurate. Determining my face versus yours is maybe 85 or 90 percent, or maybe high 90s, depending on the quality of the images and the training. But smile, frown, anger: that's pretty easy.

You can just sit there and say, let me look at my factory floor: where do people get injured? Where are there inefficiencies in how people walk the factory floor? And there are other things like QA: taking photographs of everything you manufacture and finding anomalies and quality problems, being able to say, hey, this one is different from the last 500 we just made, or the whole line just changed.

Right, we were making everything the same, and then all of a sudden, from this point forward, everything changed a little. Those can be microscopic changes that the human eye can't see, but the AI is going to pick up. There are just so many applications, whether it's surveillance, photography, or video-like data, sound, still images. There's just so much in it.
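The QA example, spotting the one item that differs from the last 500 or noticing that the whole line has shifted, can be sketched as a rolling statistical check. This is a minimal illustration, not Quantum's or any vendor's method: it assumes each item's photo has already been reduced to a single numeric measurement (a hypothetical stand-in for whatever a vision model would extract), and the 4-standard-deviation threshold is an arbitrary choice.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(features, window=500, threshold=4.0):
    """Yield (index, value) for items that deviate sharply from the previous `window` items.

    `features` holds one numeric measurement per manufactured item, a
    hypothetical stand-in for whatever a vision model extracts from each photo.
    """
    history = deque(maxlen=window)  # only the most recent `window` items are kept
    for i, x in enumerate(features):
        if len(history) >= 2:  # stdev needs at least two points
            mu = mean(history)
            sigma = stdev(history)
            if sigma == 0:
                is_outlier = x != mu  # all recent items identical; any change stands out
            else:
                is_outlier = abs(x - mu) / sigma > threshold
            if is_outlier:
                yield i, x
        history.append(x)  # flagged items also enter the window


# Usage: 200 items with tiny normal variation and one odd item at index 150.
data = [10.0 + 0.01 * ((i % 7) - 3) for i in range(200)]
data[150] = 10.5
print([i for i, _ in flag_anomalies(data)])  # → [150]
```

Because flagged items still enter the window, a sustained shift in the line gets flagged at first and then gradually becomes the new normal; a real system would likely track such shifts separately.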

And then in life sciences, a lot of this is what I'm calling video. A gene isn't that different from a video, and microscopy is kind of short, little, high-fidelity videos. I just see video being used everywhere, from surveillance to QA, for quality assurance, to photography. I'm seeing people use electron microscopes on their products after they're made, taking hundreds of electron microscope photographs of a product to study whether it was made right, especially if it's a board or a chip or some other type of silicon product.

It's just incredible, the use cases. And what is making a lot of it work is the cameras that sit on manufacturing lines. A lot of them are like 3, like, the price of high-quality cameras and high-quality video cameras has dropped. We're integrating some video cameras because they're cheaper than barcode scanners.

We used to barcode-scan some of our products. Now we just put in a video camera doing full-motion, high-fidelity video, and it doesn't cost any more than barcode scanning. So I think people are integrating video into products, services, and locations mainly so that, even though they may not be able to analyze it now, they'll be able to in the future.

Richie Cotton: Okay, that's pretty cool. A lot of these applications seem pretty wide-ranging. If you've got a retail store, put some cameras in there and you can do customer analytics. In a warehouse, you can improve the logistics. In manufacturing, you can use it for quality control.

Jamie Lerner: A lot of people are layering this stuff up. We have a lot of customers in stadiums, and they have video surveillance for insurance reasons and whatnot. They have to have it. But they said, well, if I've got the video, why don't I study where the lines form at my bathrooms? Which bathrooms have big lines, which don't, and how do we fix that to get a better experience?

Video surveillance has always been sort of a tax, something I have to do because my insurance company says I have to, or the public area I'm in says I have to have it. But then you're like, hey, I can take this data, make it totally anonymous, and, without knowing who my shoppers are, start learning how things are going inside my place of business and actually get some value out of my surveillance.

I see people doing that all the time.

Richie Cotton: Okay, that's kind of cool that it's gone from a cost to being something with a positive return on investment. Alright, so, it sounds like quite a few teams need to be involved in this. Who tends to be in charge of a video analytics program?

Jamie Lerner: It differs. I think it's too simple to say it's IT. In video surveillance, it's sometimes more around the facilities team and the security team. In the movie world, it's more in the world of producers and studio heads than necessarily IT.

So I think in corporations, it tends to be squarely under IT, and in life sciences companies, very often it's in research. It's really at the core: the AI engine is considered part of a microscope. The microscope, the images it produces, and the analysis of those images are all part of a microscope system. Or on a manufacturing shop floor, the photographs, even if they're taken by electron microscopes, and the AI at the end that says whether we produced a quality product, can be part of the line.

So I feel a lot of AI, especially as it relates to images and video, is pressing beyond the boundaries of IT. It's pressing into other parts of the business.

Richie Cotton: Okay, so it's quite often in commercial teams, or perhaps end-user teams, rather than just the core business-function teams.

Jamie Lerner: And it's funny, this has been going on for like 20 years. I go to companies all the time, and you will not believe this, where the IT department says, wait a second, you have 10 petabytes of video surveillance data in the security shack? And they're like, yeah. And IT is like, we don't have 10 petabytes of data in this entire bank.

And they're like, yeah, we've got petabytes, and it's only the last 30 days of our video. We delete the rest because we can't keep it. And IT is like, you have 10? It's just staggering. It's the same thing in a movie studio. A movie studio will have an exabyte of footage, and the corporate IT department, with their email system, their document management, the basic parts of running the business, might have a petabyte. And they're just like, we shoot a petabyte every couple of hours.

On a movie set. Like, what are you talking about? So for quite some time, the big unstructured data has usually been outside of IT. And the funny part is, these new companies are coming along saying, I need someone who can handle 500 petabytes of data in a data lake and knows how to analyze it.

Where do I find this person? They're not in IT. If you look at the people running these big AI workloads, few of them come out of IT. They usually come out of a movie studio, because they're like, hey, we always have 500-plus petabytes at Disney, Warner Brothers, Paramount, the big shops. We have hundreds of petabytes and we deal with it every day.

And sure, we're doing high-speed analytics. What do you think visual effects are? What do you think special effects are? They've just been doing it so long. Same with the video surveillance people: a lot of people from video surveillance and video backgrounds are moving into life sciences analytics, because in life sciences they're managing lots of videos of cells, and their architecture starts to look a lot like a studio's.

Richie Cotton: That's absolutely fascinating, that there's a link between movie studios and life sciences.

Jamie Lerner: They're very similar, architecturally.

Richie Cotton: Okay. Yeah, so it sounds like if you want to cut your teeth on some big data sets, getting into working with video is a good place to start.

Jamie Lerner: Yeah.

Richie Cotton: Are there any particular skills you think people need in order to get started working with video?

Jamie Lerner: It depends what side you're on, right? Whether you're a video editor or on the infrastructure side. We're really less on the creative side of video and more on how you store it, manage it, keep it. And I think the core skill set, the most important thing, and data science is a broad word, is what I would call data catalogs and the art of metadata.

So you have a file. It could be a CAT scan of someone's brain. You call it CAT scan and give it a number. That's not much, but metadata allows you to fill it out: oh, yes, it's a CAT scan, it's of a male, it's a male NFL player who's had these conditions from serious head injuries.

This guy's family also has a history of these cases. As you start filling it out, all of a sudden that data's rich. It's not just, oh, we've got a CAT scan, CAT scan 5279. It's all this information that comes with it, which you can then query: give me all CAT scans of NFL players who had over 10 years in the NFL, or over 10 years of some type of exposure to head injury, and who have this history in their family. Then you can begin to organize your data. That's the key. If you can build a catalog like that, your data is incredibly valuable. I can tell you, I've been on movie sets where they're looking for a three-second clip of a lightsaber or something from the seventies or eighties.

And I'm not exaggerating: it's named saber-207, and it can take someone four to five months. In a haystack of a billion files from a huge movie franchise, trying to find that one thing we shot 30 years ago, it's staggering to see how hard that is. And I think the cataloging, organizing, and cleaning of data is what differentiates the companies that are going to gain incredible insight from their data from the companies that are just going to be lost.
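Jamie's CAT scan example is essentially a query over a metadata catalog rather than over file names. A minimal sketch using Python's built-in sqlite3 module; the table, column names, and rows are made up for illustration and do not describe any real product or dataset:

```python
import sqlite3

# In-memory catalog; the schema and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE catalog (
        file_name      TEXT PRIMARY KEY,  -- e.g. 'catscan_5279.dcm'
        modality       TEXT,              -- 'CAT scan', 'MRI', ...
        sex            TEXT,
        occupation     TEXT,
        years_exposed  INTEGER,           -- years of exposure to head injury
        family_history INTEGER            -- 1 if the family has a history of similar cases
    )
""")
# Made-up example rows.
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("catscan_5279.dcm", "CAT scan", "male", "NFL player", 12, 1),
        ("catscan_5280.dcm", "CAT scan", "male", "NFL player", 4, 1),
        ("catscan_5281.dcm", "CAT scan", "female", "teacher", 0, 0),
        ("mri_0042.dcm", "MRI", "male", "NFL player", 11, 1),
    ],
)

# "Give me all CAT scans of NFL players with over 10 years of exposure
#  to head injury who have this history in their family."
rows = conn.execute(
    """
    SELECT file_name FROM catalog
    WHERE modality = 'CAT scan'
      AND occupation = 'NFL player'
      AND years_exposed > 10
      AND family_history = 1
    ORDER BY file_name
    """
).fetchall()
print([r[0] for r in rows])  # → ['catscan_5279.dcm']
```

The file name alone ("catscan_5279") could never answer that question; the answer comes entirely from the metadata columns, which is the point of building the catalog.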

Richie Cotton: Absolutely. You talked at the start of this about being able to automate tasks that are kind of tedious for humans. I have to say, spending five months just staring at Star Wars videos to try and find a three-second shot of a lightsaber...

Jamie Lerner: Yeah, you see it in sports all the time. We need that slam dunk that happened this year from that angle. Remember that one? Then people start scrolling around looking for it, whereas with metadata and facial recognition, it's: I need this player doing this action, roughly in this time frame, from this angle.

And if you have the right catalog, it just goes right in and gets it. If you have no catalog, can you imagine just walking around a library just filled with millions of books and there's no card catalog, no Dewey Decimal System, nothing. Just walk around and look at books until you find your book.

I mean, it sounds insane, but that's what a file system is. It's a bunch of books with names, millions of names just thrown into a big well, and without a metadata catalog, it's just lost. And the companies that get archiving and metadata organization right are just gonna blow away their competition in the AI game.

Richie Cotton: All right, that sounds like incredible advice. And actually, we've had a few DataFramed episodes on data cataloging and data discovery, so there's some homework for all our listeners: go back and listen to those episodes. All right, is there anything you are most excited about in the world of AI and video analytics at the moment?

Jamie Lerner: You can probably tell I'm excited by it all. Maybe we should just talk about excitement, because I think that's the right word. I think there's a camp that has excitement and there's a camp that has fear, uncertainty, and doubt, and we've invented a lot of things as human beings.

And for the most part, we've figured out how to integrate them in a healthy way. There are a number of exceptions, but for the most part, we figure out how to coexist with new technologies. And I think that's what's going to happen here. As for the benefits and the excitement, I've talked about so many different use cases that are so interesting, and we're really just at the beginning.

I just think it is going to be awesome for everyone in society. There's a lot of fear and a lot of worry, and I think that fear and worry will get resolved as we work out how we'll use it, what's acceptable, what's not acceptable, what we'll allow and not allow, and how we manage that. We'll figure it out.

We figured out how to drive extremely heavy cars next to each other. We figured out a system. It still has danger, but we figured out a way to integrate it into society so we can use automobiles. We figured out how to use the internet, and I think we'll figure out how to use AI in a socially acceptable way.

I'm definitely in the excitement camp and not in the "we've got to regulate this, shut it down, and manage it" camp. We're not anywhere near that. And if you look at the accuracy of these algorithms, I can tell everyone, you don't have to be that worried that these machines are replacing anyone.

This stuff is not that accurate yet. Have ChatGPT write something for you. Maybe it'll write you a nice contract, but with other things, you can just tell it wasn't written by a human. So we're not being replaced anytime soon.

Richie Cotton: Okay, and just to wrap up, do you have any final advice for organizations wanting to make better use of the video that they have?

Jamie Lerner: I just feel so strongly: keep every single bit and byte of your data. If it's not this year, if it's not next year, it could be next decade, it could be next century. But at some point, someone will figure out how to learn from that data and use that data. And the only data you can't learn from is the data you deleted.

So I do think archive strategies and data preservation strategies are critical. I think every company needs one. And you've really got to think about it. If your data is small, I think you have a lot of options. For people who have bigger data sets, keeping that data for a century, or several centuries, takes some thought, and I think it's worth talking to the people who've been doing that: our governments, branches of the government, the big movie studios. And the big medical records companies have been figuring out how to keep medical records for your whole lifetime.

There are a bunch of companies that have experience keeping data for a hundred years, and there's a lot to learn from them. But I would say probably over 90 percent of the companies outside of media and entertainment have no data preservation architecture or strategy.

Richie Cotton: Absolutely. And I have to say, working at a small company that has sort of just grown out of being a startup, thinking more than a couple of quarters ahead feels like a long time. Planning what you'll do with your data in a hundred years' time is absolutely incredible. But it's definitely worth thinking about.

So,

Jamie Lerner: I mean, it sounds crazy, but you look at Disney, a lot of their first stuff's kind of coming up on a hundred years.

Richie Cotton: Yeah, absolutely. I mean, those sort of early Mickey Mouse things.

Jamie Lerner: And our government, the Library of Congress, a hundred years? We're way past that. They keep documents much older than that.

And if you go to Europe, they're like, 200 years? That's nothing. We've been keeping documents for 800. So it sounds like a long time, but not really. Again, look at Disney. Mickey Mouse and those first things are coming up on a hundred years.

Can you imagine if they said, yeah, we just chucked Mickey Mouse, we just threw it out? The first black-and-white movies? Yeah, we threw all that stuff out. We throw everything out after seven or 10 years, which is a policy in a lot of financial groups, right? After 10 years, we can dispose of it.

It's kind of unthinkable.

Richie Cotton: Yeah, there was a big controversy with the BBC throwing out Doctor Who episodes, and some of those early episodes are just lost. So, yeah, keeping this stuff is incredibly important.

Jamie Lerner: Yeah.

Richie Cotton: All right, super. This has been brilliant. And I think there's homework for everyone: think about your long-term plan for storing all your data.

So, yeah, thank you so much for this, Jamie. It was great chatting with you.

Jamie Lerner: Thanks for having me. It's good to talk to you.

