How the Aviation Industry Leverages Data Science

Derek Cedillo is a Senior Manager with over 25 years of experience working in data at GE Aerospace. In this episode, he shares the key components of successfully managing a data science program within a large and highly regulated organization.
Mar 2023

Guest
Derek Cedillo

Derek Cedillo has 27 years of experience working in Data and Analytics at GE Aerospace. Derek currently works as a Senior Manager for GE Aerospace’s Remote Monitoring and Diagnostics division, having previously worked as the Senior Director for Data Science and Analytics.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Key Takeaways

1

Data team structure is one way to set yourself up for success—be as flexible as possible. Ideally, pick a team consisting of experts with their own specialization, all of whom can do each other's roles to a basic degree. This helps create a continuous learning environment and allows you to pivot quickly if needed. 

2

To successfully manage a Data Science program, ensure that you’re always trying to solve the business problem the right way, use high-quality data where possible, create single sources of truth that are stewarded by data teams, and, if applicable, create a data lake to centralize your various data sources. 

3

Even if you don’t work in a highly regulated industry, you can approach the planning and delivery of projects as if you did. It can be as simple as creating a checklist for the requirements and delivery variables of your data projects.

Key Quotes

There's a good book out there called The Checklist Manifesto. In the book there's one example that's tied to the aviation industry, where someone says, “I don't have time for a checklist.” And it says, well, if an airline pilot in an emergency has time for a checklist, then why don't you? And I think that's really important. I think sometimes people bypass even the concept of a simple “Look, I trust you to do your job, but we all forget things every day.” And so the ability to just bring a very simplified checklist, not a thousand line item kind of issue, but 10 or 15 tasks that get to the spirit of what you're trying to do is really valuable. Have you had the right conversations? Have you validated that the problem you're solving is the right problem to solve? Have you made sure that you’re production ready? Regardless of the project, you can still follow the same process to ensure you've got a quality product at the end.

All too often, and especially in an area of data science where it's such a new emerging technology, someone often is looking to solve a problem that they don't even know they're truly trying to solve. I want lower cost. I want more efficiency. Things of that nature is usually what an executive will ask. How can the data science team help? There are many, many different ways to help them solve that problem. So spend the time upfront to really establish the universal understanding. I understand what you're asking of me, and you understand what I'm capable of doing. I think that's really the key to getting the problem started off right.

Transcript

Adel Nehme:
Derek, hey there, it's great to have you on the show.

Derek Cedillo:
Thank you for having me, I very much appreciate it.

Adel Nehme:
I'm excited to speak with you about how you lead data science at GE Aviation, how you approach leading data science programs, how to operate a data team in a highly regulated environment, and much more. But before that, maybe tell us a bit about your background and how you got into the data space.

Derek Cedillo:
Yeah, so I graduated back in the 90s from MIT as a mechanical engineer. And so I didn't have a big mindset for data, but I took a role at GE. And within the first couple of years, I found myself in a particular role that happened to be programming in Fortran 77 and working with data directly from aircraft. And I found just that combination of programming, statistics, and my love of aviation all came together perfectly. I've really made a career of that ever since in a wide variety of ways.

Adel Nehme:
That's really great. Maybe to set the stage for today's conversation, I'd love to know the type of data science projects that you lead at GE Aviation, and how they provide value for the organization.

Derek Cedillo:
Thanks. I think it could be kind of summed up in our mission statement for our team, which is to provide business outcomes supported by data science intelligence. And I think the intelligence piece of that is really key. What we're looking for is that data science isn't the answer. Machine learning models or AI or anything else in that world does not bring the solution. But you really have to marry it with the expertise of the domain experts. So being able to take that with them and move that forward to find the real outcomes is where it happens the best. And we have analytics all over the place. We do everything from trying to detect the delta P of fuel pressure on an aircraft to say, hey, you need an oil filter change, to going and analyzing cost data throughout the marketplace in our MRO Services Network.

Adel Nehme:
Okay, and there's definitely a lot of value there across the value chain. As you mention, what I really want to focus on in today's conversation is how you approach managing complex data science programs, especially in the type of environment that you operate in. I think there's a lot of value for listeners, especially for data leaders listening in today and trying to pick out insights from your experience leading data science in aviation and in a large matrix organization like GE. You know, creating value with data science is inherently complex regardless of where you are, and I think when it's done in a large organization, and also in a highly regulated environment, this complexity is only exacerbated. So maybe to start off, I'd love to know what you think are the key components for successfully managing a data science program.

Derek Cedillo:
I think it really starts with solving the right problem. All too often, and especially in an area of data science where it's such a new emerging technology, someone often is looking to solve a problem that they don't even know they're truly trying to solve. I want lower cost. I want more efficiency. Things of that nature are usually what an executive will come with to say, how can the data science team help? And there are many, many different ways to help them solve that problem. So spending the time upfront to really establish like the term universal understanding. I understand what you're asking of me, and you understand what I'm capable of doing. And I think that's really the key to getting the problem started off right. In addition to that, really having high-quality data at its source, being able to make sure you can rely on it. Garbage in, garbage out with all models. So that always becomes an issue. And then especially, you kind of mentioned the large matrixed organizations, single sources of the truth is a real challenge for a company of our size. Everyone has their day jobs and they've got to be tactical in what they're trying to solve throughout the business, whether it's running stuff in the supply chain or it's something in a services shop, et cetera. So there's a need for them to have it right tactically every day. And that's very counter to being able to build a complete data thread and have consistent data throughout the entire enterprise so that you can really tie it together, whatever the case may be. So I think that's a big part of it. And then obviously if you can get that right, then the data governance and stewardship to maintain those single sources of truth. So we've gone through a lot of digital transformation over the years. In order to do that, you know, the creation of a data lake is one big step that companies can take toward heading in that direction. But keeping it from being polluted is a whole other situation after you create it.

Adel Nehme:
So let's look at some of what you outlined above. I love how you kind of categorize the different, you know, components of successful program management. Starting off with solving the right problem: I think this is actually a big problem in the industry, that a lot of times organizations, especially those starting out on their data journey, tend to focus on problems that don't necessarily provide business value. How do you ensure as a data leader that you're always solving problems that provide value for the business?

Derek Cedillo:
It's really about engagement and transparency. So, you know, oftentimes in this space, again, you'll be challenged by an executive as, say, a Junior Data Scientist: I need you to go do this. And you need to be able to have that transparency to say, I can't do that, the data isn't of the right quality for me to build a model from. Or you want 99.95% accuracy, you know, whatever your accuracy metric is, and I can't provide that. I might be 60 or 70% accurate. There's a lot of conversation that has to happen around whether that is the right problem. Because again, there is typically a very clear financial metric, for example, that would be stated as this is the problem. But how you go solve that is very complex. It's no different than designing a jet engine and saying, well, I want to push this airplane so it rises up through the air. There's a clear need to go do that, but how you go do that, the devil's in the details. So making sure you have that upfront conversation is important, and it's not one conversation, it's a series of conversations. We use a process called the TRL-MRL process, and its very first step, TRL-1, is all about what is your business value, what are your metrics for success, what's a rough outline of how you're going to go attack this problem, and who are your team members. And those team members should represent both sides: the stakeholders as well as the data science team.

Adel Nehme:
That's great, and maybe expanding on the process: what are the types of conversations that you have in the process you mentioned here? Maybe, you know, outlining ownership, what the business value is. How do you kind of standardize that across a variety of projects that could have different types of data sources and value drivers? I'd love to learn more about that.

Derek Cedillo:
So that's a huge challenge. The key word in the middle of all that is standardized. That is a huge challenge. The accuracy and the validation you need when you're trying to solve an analytics or data science problem on a jet engine are different than the accuracy and the validation you need when you're trying to plan labor hours for a team. So that standardization and how much rigor you put in the validation is always a factor. And the TRL-MRL process is one that we actually adopted from NASA and the Department of Defense a couple of years back, which is totally unrelated to data science, but it's talking about bringing new technologies to the market, right? Rocket ships and defense articles and things of that nature. And it's all about the risk reduction of actually getting to the end of making those products work. And we hijack it because you're really not manufacturing a data science model per se, but eventually you're going to put it in some kind of production environment. So what we try to do, I mentioned the TRL-1 step up front, is being able to establish: what is the need, are we solving the right problem? But there are also tidbits in there of, do we already have something that solves this problem in our catalog? And should we be reevaluating that because maybe that's ineffective or there's a lack... And then there are multiple other handoffs so that we don't end up building an analytic that we can't put into production because, again, perhaps, as we talked about earlier with individual sources of the truth, something's not repeatable, etc., so that whatever we validated can only be validated on a desktop and can't run in a runtime production environment.

Adel Nehme:
So you mentioned here at the end the single source of truth, I'd love to kind of dive deep into that as well. You mentioned the importance of data as well as high-quality data, a good data governance program and easily accessible data. How have you approached that journey of creating that single source of truth that is high quality and easily accessible?

Derek Cedillo:
Yeah, it's a challenge for sure. And I wish I could say I've attacked it strongly and been able to go lead that. I rely on much of the rest of our organization to go do that. So we have our own network chief information officer, for example, that is driving something called the event data thread, which is meant to address each of those kind of touch points of the data from its initial creation at the time cycle, including the servicing, the repair of that engine, et cetera, until its retirement. And so that's really not on my team, but we're an advisor to that team, right? So trying to help them to understand where the difficulties will lie when you try to model it, depending on even how you do that. So you can have a good data governance policy or some keys in the data that link everything up the wrong way. So it's really about just being partners with almost everybody. You know, we've got to be partners with our stakeholders, we've got to be partners with our data teams that actually own the data infrastructure and architectures, and then we have to be partners with our peers that keep that stuff running as well. So it's really about being a team player in order to make that all work.

Adel Nehme:
That's great. And, you know, you mentioned earlier, beyond the single source of truth, how you approach creating value with data science through the TRL/MRL process. Maybe walk us through as well the project management approach to actually delivering data science projects. You know, unlike software engineering, you tend to see in the data science space quite a few different variations of project management approaches. Why do you think that is the case, and what do you think is a great model for approaching data science projects end to end?

Derek Cedillo:
Yeah, I think it's because this area is so new, and we don't know how to apply it. I mean, if you look at the number of universities that offer data science degrees, it's measured in the dozens, not necessarily in the hundreds or the thousands, like mechanical engineering degrees might be. So I think that's part of it. It hasn't been studied quite as much. Same thing: we've been writing software for a really, really long time. But even the term data science wasn't... So I think that's part of it, and we have to be flexible in the way that we do it. So I mentioned the TRL/MRL process, and that's really kind of checklists and things of that nature to make sure you haven't forgotten to do things and you're going to get across the finish line solving all the right problems and actually being able to put it into production. But you still have to manage a team. We're still teams of people that are doing this work. And so, you know, is Agile methodology one of the right ways to project manage a team? Again, there are just so many ways to do it. We're currently in the middle of an Agile transformation, really applying Agile software principles to our data science projects. Literally just getting started in it. We've been doing it for about a month, and found that we could actually apply the software principles pretty well. The difference that we face, I think, and the biggest challenge in making Agile work, is that a lot of the data science projects tend to be run by one or two people, very, very small teams, compared to software development. Even though software development may work in small pods of, you know, say five or six, a team of one is very hard to go get a scrum master for. So how do you get an individual data scientist to think with an Agile mindset and plan their day and run in sprints and do retrospectives kind of all by themselves without other team members? That is really the challenge in kind of bringing it together.
And I think across the industry, a lot of data science projects really start off as kind of one-offs until a project gets really big. They're more run by single data scientists than they are by large groups, until it really gets to be a production process.

Adel Nehme:
That's great. And you mentioned that, you know, skill set component: how do you get that data scientist to be, you know, both technical but also have that business acumen and the project management ability to approach projects? It kind of teases into my next set of questions on how to structure teams around value creation, right? You know, a big part of providing value with data science, of structuring teams around effective data science practice, is making sure that they're aligned with the business value. I'd love to focus on how you've approached structuring the different data teams you lead. So I would love to start by understanding what you think is an ideal data team structure.

Derek Cedillo:
I think it's not a great answer to start with. It totally depends, but it does totally depend. As I mentioned, I work in a relatively large company, and so we have an entire digital technology team that I can turn to and say, hey, we need this kind of infrastructure. In many cases, data science teams are Swiss Army knives because they are almost the whole digital technology department for their team, based on the support structure, but I think the keys of being flexible don't change. So I'm a huge fan of the book by Stanley McChrystal called Team of Teams. And it's all about, you know, the real short version, and I highly recommend you go buy his book and read it, but the super short version is that you want deep experts on your team, but you want everyone to be able to do all of the jobs. So when I look at composing a data science team, that's exactly how I look at it. Spread out throughout the team. So I've got someone really strong in forecasting, someone strong in operations research, someone strong in visualization with BI, and then someone who's a really strong data wrangler. I've got another person, for example, who's really great at image analytics. They're all specialists in their field, but they can also do a little bit of everyone else's jobs. And therefore, we can pivot a lot, and we can... And it also basically creates a continuous learning environment where everyone gets to pick things up, and we use things like hackathons within our own teams to create artificial problems. For example, we just had one on image analytics so that everyone on the team could learn a little bit more about image analytics, so at least they're conversant in case whatever stakeholder they're talking to might happen to have an image analytics problem. So you mentioned, you know, all the different skills they can have, and that's great. Those are the unicorns: to go find someone who can project manage themselves and be a great data scientist and a great data wrangler.
Those are really, really hard to find, but the team of teams approach helps you balance that out in the interim while you continue to find those all stars that come in.

Adel Nehme:
That connects to my follow-up question, because indeed, that data scientist with the multiple skill sets that we described is a unicorn who is very difficult to find. Maybe walk us through your team of teams approach. What are some of the best practices that you've found in developing a pipeline for recruiting data talent? How do you test for these different types of profiles in the interview process?

Derek Cedillo:
Yeah, again, another huge challenge. I just sound like my job is super challenging. It is, but that's what makes it fun. But especially in some specialized industries, like the aviation industry, it can be a real challenge to recruit the right kind of talent. I mentioned all the challenges we have with our data structure. Part of it's because of being in a regulated industry, and part of it's not. So when you're competing with banks and retail that have gobs and gobs of data thanks to barcoding processing, it becomes difficult to find someone who's already more excited about ones and zeros than they are about aircraft engines. And so you have to be really active in the recruiting process. So we partner with universities, universities here in our headquarters is one that we partner with and we help sponsor their analytics program there. And we'll give graduate students opportunities to work on problems, we let our team go ahead and get master's degrees, PhDs, other continuous development opportunities, where they network within the university system. And then it's just being extremely proactive in seeking talent. Because especially with such an emerging space, the best data scientist you may find may not have a data science degree. And so finding someone who maybe like myself early on and the programming that went with it, but has been classically trained as an engineer, might be someone you keep an eye out for and develop that love that they have rather than seeking out someone directly from school. Because as I mentioned, there aren't that many programs out there today where you're going to get someone with a bona fide data science degree coming out right now.

Adel Nehme:
Okay, and you mentioned here, you know, partnering with different universities and creating upskilling pathways for your own data scientists as well. Walk me through, maybe, the type of technical skills that make a data team successful and that you're looking for on your data team.

Derek Cedillo:
Yeah, I think the bare minimum is just understanding databases. I mean, there are a couple of key things that you've got to have to get in. One is understanding databases and database structures. You've got to know how to query data and be able to pivot it, see it that goes kind of things. Python skills are a must. Five years ago, there was a huge debate over, are we Python or R in the data science world, and what should we go do? And we had a raging debate within our own office over who should do what. I would say, for the large part, Python has won out, to the point that you've seen RStudio change their name to no longer be RStudio, but to be renamed as Posit, to get away from feeling on the losing team. And so that really helps, especially as we talked about getting analytics into production. Python is just a more efficient programming language from that perspective. And that's why it kind of won out, not to mention the fact that it caught up with all of the statistical tools like Scikit and stuff like that, that would really help folks out, SimPy being another one. But I think that's really huge, so any kind of programming skill. Although, of course, if someone is strong in R, those skills transfer to Python. You've just got to spend some time doing it. But a detailed programming language. And then I think, you know, BI, business intelligence visualization, is super important. You've got to be able to tell your story. Many of our projects start out, like I said, with, hey, I have a problem and I don't even know how to solve it. And the first thing we do through our EDA, our exploratory data analysis, is we build a Spotfire dashboard for BI and we start combing through their data and thinking about the statistical relationships, thinking about what we could work with, and sometimes they're just floored by that and they say, like, stop, I've never even been able to see this data before.
So being good at exploratory data analysis and being able to do that storytelling piece is very, very important to, one, producing a good product, but two, even going back to make sure you've got the right problem and you're solving the right problem with the right data.

Adel Nehme:
Okay, that's really great. And, you know, we talked about the technical skills here. Maybe walk us through the soft skills as well, because you mentioned storytelling, the ability to tell stories with data. I think that's so foundational to being able to succeed with data science. What do you think are the important soft skills, outside of storytelling, that are must-have skill sets on a data team?

Derek Cedillo:
Right. Yeah, so the storytelling, as we mentioned, is really huge, but CAP, or Change Acceleration Process, is another one. And like I mentioned, one of the things that you run into is everybody wants a 99.9 percent model. Very few models work that way. So being able to convince them of how to use an imperfect model is key. You know, there's a quote by George Box that 'all models are wrong and some are useful', and that is something that the data scientist has to fully understand. So it's not easy. The change acceleration process is something you need to really work at, but something like that, that helps give you guidelines to think about how do I move detractors to neutral and neutral folks to supporters of me, is a big thing. Transparency is critical. As I mentioned, we don't always come up with a good model. I like to advertise that 30% of my projects fail and I'm not ashamed of it. And what I'll tell people is the reason that they fail is because, if not, we're not trying hard enough problems. But that takes a certain amount of transparency for everyone to kind of be open with, right? Again, that universal understanding. I understand what you want from me, but I can't provide that. And so that recognition, being able to build that reputation within your team, becomes really huge. You know, data scientists can change the world. You want to be seen as that go-to team, and transparency is a big part of that from a soft skill perspective. And then we did talk about the project management a little bit already. Are you able to just basically deliver on time? So we're working on our Agile transformation to try to even speed up what we do. But again, since many times your stakeholder is trying to find a 99.9% model, they'll ask you to keep going until you get there. You've got to be able to manage both their expectations and your own project to say, look, it gets exponential in the amount of effort I need to put in to keep moving my accuracy higher.
Will I communicate that and program-manage to deliver an outcome and hit, you know, major on the majors and minor on the minors. So maybe the 80% accuracy answer is right to get a big outcome here and you can go chase the 99 but there's probably another project you could go kick off and even get partially right that has a bigger impact to the business. So that's where the program management kind of meets the transparency in the middle.

Adel Nehme:
I love that, especially on the transparency and the change acceleration process. You know, a lot of organizations, especially large organizations, tend to have like a data translator, project manager, or program manager role within data teams. Maybe walk me through how useful you find that role, especially within large teams, to be able to translate and kind of be that middle man or middle person between a business function and a data team?

Derek Cedillo:
Yeah, and I think that's huge. Sometimes there is, so we have a role called an ML/AI Program Manager, which would be similar. And that's someone who would lead teams, if we're working a relatively large project that maybe needs five or six people to go work on. And they'll help coordinate that. But again, as far as I look at it, I try to train everybody up to have some element of that data translator themselves. So I mentioned a lot of data science projects tend to be singularly worked, you know, only one person, maybe partnering with somebody else where they run into trouble. And so I think the industry is trying to find what that right balance is. You know, do you have a translator or program manager that can do that? It really depends on the skills of your team and the breadth of the people working on the projects.

Adel Nehme:
That's really great. And, you know, you mentioned here as well having a team of teams and developing that skill set within the data scientists. I find as well, you know, that the ability to turn the detractors into champions is also easier when you have a strong data culture within the organization, you have executive buy-in, and everyone is really aligned with the data science agenda and the data culture. Maybe walk me through the onus that the data leader has here in developing that data culture?

Derek Cedillo:
Yeah, that's a huge question in and of itself. But, you know, I can say I've seen it over time. I've been at this company for 27 years. So I've seen us over time going from “digital is important, but we have no idea what it means,” to “digital is important, why can't the teams get it right?” to, you know, “digital is important and forecasting is key. Wow, our data is a mess, and maybe we need to go clean it up a little bit,” and then taking that to the next step. There's definitely that high-level leadership from the business that you have to get on board to recognize it. And it's a journey. It's a journey for every company, the larger the company, maybe the longer the journey. But I think the key that you hit on there is whose responsibility is it. And I take that on as partially my responsibility. As a data science team, you're often in the middle of it all, because you're getting multiple requests from multiple leadership, and you are also dealing with the day-to-day of the data underneath. And so you get this unique perspective to influence leadership upwards by saying, I understand what you're asking for, but I can't get it, and this is why teams need to improve data quality so I can give you better models. And I think that's a key that kind of comes into play in how we play with and influence the teams into an overall digital strategy, so that they understand that data science is not just magic pixie dust that you drop onto things, but rather needs an infrastructure that supports its capabilities.

Adel Nehme:
I really appreciate that. Let's shift gears a bit and talk about managing data science in a highly regulated environment. You know, I think an added complexity of managing a data science program in an organization like GE Aviation is the regulatory environment of the airline industry. Walk us through maybe the additional complexity data leaders encounter in such an environment, and what that looks like?

Derek Cedillo:
Yes, it really gets to be a challenge, and some of it goes back to what we talked about earlier: how do you get people to accept an imperfect model, understanding that some information is better than no information? So that in and of itself is a conversation that has to happen with regulatory agencies and even just within the business, where, even if it's not a regulated corner of the business, there are regulatory-type practices that say it must be this way every time. So one of the things we face in the data structures is a resistance to change sometimes, and being able to help them address that and recognize that, yes, we always have to do it in a regulatory fashion because this is how we sign things off and we're used to signing them off with paper, but recognizing that, hey, digital signatures can be accepted and traceable, et cetera. Sometimes you'll see a little more resistance, I think, in regulated agencies to change, just in general, because they're regulated. And so you have to have a little stronger skill at the change acceleration process, I think, in the future.

Adel Nehme:
Okay, that's really great insight. And when developing data products, what is the validation framework that you follow? You mentioned the TRL/MRL process. What do the specific checks and balances look like for deployment that are enforced before a data product is deployed?

Derek Cedillo:
Yeah, so as we mentioned that process and some of its roots, you know, taking a product all the way to finish, this again comes into some of the regulatory stuff. If we're doing dashboarding and very simple analytics, we omit certain parts of that process from validation, because if you're slightly wrong, the negative impact is maybe a phone call, right? So we don't want to add all the rigor from the other projects we're working on to something where the cost is only a few minutes of wasted time. Still, in order to keep people working with standard work and in a lean mindset of "let's do it the same way so things don't slip through the cracks," we have exit ramps in our process to say, well, this doesn't need to follow the full path of full validation, and to offer those projects just part of the process so they can reach a simpler answer. But you do run the risk of overcomplicating it for the simple projects when you're in a regulatory kind of environment.

Adel Nehme:
Okay, that's really great insight. One thing that I think is interesting about working in an environment such as yours is that a lot of data teams tend to operate in relatively less regulated environments. Think of your food delivery services or your social media applications. And I think there's a lot to learn from the structure of quality assurance and the deployment best practices that are present within your industry. So maybe to frame this question in a different way: if you were to move on from your current role into a less regulated environment, what quality-related best practices would you take with you and apply there?

Derek Cedillo:
Yeah, I think there's another good book out there called The Checklist Manifesto, which is a really good example that kind of goes through this. And there's one example that's tied to the aviation industry, where someone says, "I don't have time for a checklist." And it says, well, if an airline pilot in an emergency has time for a checklist, then why don't you? And I think that's really important. I think sometimes people bypass even the concept of a simple checklist. Look, I trust you to do your job, but we all forget things every day. And so it's the ability to bring in a very simplified checklist, not a thousand-line-item kind of thing, but 10 or 15 items that get to the spirit of what you're trying to do. Have you closed things off with, you know, the data structures team? Have you made sure that you're production ready? Have you had the right conversations? Have you validated that the problem you're solving is the right problem to solve? Not in any particular order, but they're the kind of things that you'd want in a checklist format that your team makes sure they hit every time. So that level of standard work applies even in a non-regulated industry; it might just be less standard work. You might have less effort in validation, because the outcome of a cold, late pizza is different from accidentally ordering $5 million of extra inventory. But you can still follow the same process to make sure you've got a quality product at the end.

Adel Nehme:
Okay, that's really awesome. Now as we wrap up, Derek, I'd love to learn from you: what are the trends you're looking out for in data science that you think will be impactful in 2023? It's definitely shaping up to be a relatively crazy year in the space of research and the operationalization of research. I'd love to learn how you're looking at things this year.

Derek Cedillo:
Yeah, I think, you know, it's kind of a joke, but if I see one more ChatGPT recall, I'm going to be a little tired of it. But it goes to show you how exciting it is for everybody, right? I mean, people are looking at this and they don't know what to do about it. They're like, wow, this is fantastic. And again, it goes back to some of that transparency. If you start looking underneath, you start to see how much hand-holding there might be in some of these projects. Even with pure AI or pure machine learning, there's a lot of preparation that goes into that. And so I'm really interested in seeing us get better at overall transparency around those things and recognizing the two points of it. First, not letting it come across as this magic pixie dust that you can just drop on anything and it automatically solves it, so that the first thing an executive does is say, "Well, let's go put AI on it. It's going to solve all our problems." And second, being really clear that you need good data to do data science, and you need to make that investment almost first, but again, in consultation with data scientists, because you want to make sure they understand where they can go with it, and you want that feedback into it. So I'd really love to see us recognize and think about proper data collection, storage, and accessibility in order to really unlock what we can do.

Adel Nehme:
Completely agree with your point, and in a lot of ways, AI this year is having sort of a Covid moment, if I have to make that comparison. You know, when Covid started happening, at the beginning there was a lot of reluctance to admit that there was a global pandemic, right? And then when people realized, oh, there is a global pandemic, we're going to lock down the entire planet, there was this mass realization that things had changed. And I think with AI today there is a realization that, oh, things have changed, and there are a lot of exciting things that are going to happen. At the same time, I think the messaging problem is the idea that this is AGI, that this is some form of generalized intelligence, and it's not necessarily so. It's going to be interesting to see how a lot of the public is going to react to these types of tools in the future, especially as the robustness conversation becomes much more interesting and much more full-fledged.

Derek Cedillo:
Yeah, I think you're right. Maybe I shouldn't mention it, but I listen to other podcasts on data science as well. The Harvard Data Science Review does one, and I was listening to that not too long ago, and I forget the guest's name because I was just walking my dog or doing whatever. But they talked about an AI winter, right? And it's kind of exactly what you're talking about. Sometimes the expectations get so high that you end up disappointed in the end because you don't realize what's happened. And I know we've seen that within our own industry and company where, again, everyone thinks it'll be magically fixed, and then they realize, oh, you have to go invest in data infrastructure and things of that nature. And the reality kind of sets in. So those high expectations, and it goes back to that transparency point, can lead to great disappointment if you're not transparent up front about what they can actually achieve and the level of effort required that goes into it.

Adel Nehme:
Definitely agreed. We'll see how the expectations temper and change as the new releases of large language models become more integrated in products that we see every day. Derek, finally, before we wrap up today's episode, do you have any final call to action for the audience?

Derek Cedillo:
Yeah, and I think I've hit the word "transparency" three or four times in this podcast. And part of that really leads to an ethics discussion. I think it's really an uncharted subject in how we teach it and how we apply it on a day-to-day basis. I think it's all over popular media, right? From The Terminator to The Matrix and all those kinds of shows, there are all these horror stories about AI and machine learning. But we should have more open conversations about it on a day-to-day basis. And not that I support the horror stories of it, but it's like I just said about not letting your leadership have higher expectations than what you can deliver. So I think I'd like to see more of that in some of the teachings. You have business law and ethics classes in the universities and things of that nature. I'd like to see us move more towards, hey, what are the right, responsible things to do with AI?

Adel Nehme:
And you know, as you mentioned here, sometimes it's much more useful, not just easier, to do a logistic regression that has, like, 70 to 80 percent accuracy than to try to do some deep learning model that is barely interpretable, that no one can understand, and that no one has a lot of knowledge in.

Derek Cedillo:
That's right, so getting someone to understand a useful model, even if it's not perfect, is certainly one thing that people should focus on as well.

Adel Nehme:
Right, thank you so much, Derek, for coming on the show.

Derek Cedillo:
All right, thank you very much. I appreciate the opportunity. 
