
Data Science at McKinsey (Transcript)

Hugo speaks with Taras Gorishnyy, a Senior Analytics Manager at McKinsey, about what it takes to change organizations through data science.

Here is a link to the podcast.

Introducing Taras Gorishnyy

Hugo: Hi there Taras, and welcome to DataFramed.

Taras: Hi Hugo. I'm really glad to be here.

What do your colleagues think you do?

Hugo: It's great to have you on the show. I'm really excited to be talking about management consulting, analytics for businesses, and data science. So you're a Senior Analytics Manager at McKinsey and I want to know what that looks like on the ground. But before we get there, I'd like to know what your colleagues think that you do.

Taras: My colleagues, especially those who don't do analytics themselves, don't always have a very clear view. Most of them think I spend most of my time building models. Some of my analytics colleagues think that I spend most of my time talking to clients, but the truth is a lot broader than that.

Hugo: And I'm excited to get to that. I'm wondering, for the non-technical people, your colleagues, what does building models mean to them? What do they think building models looks like?

Taras: It's one of those sometimes scary, sometimes black box type of process where they see it as a whole bunch of random data from different client systems coming in and the outcome comes out that hopefully tells them some insights into what drives performance of the clients or what they should advise clients to do but for most non-technical people, most but not all, this process is not very, very transparent.

Hugo: Yeah, and I suppose they think there's some sort of computation involved; some sort of mathematics involved but they don't have a strong sense of what that may mean.

Taras: Exactly. But there are some people that do a lot of analytic work, even if they are not data scientists themselves, and they actually know exactly what's going on, they just don't write the code. It's actually impressive how well they understand the underlying data science.

Hugo: And I think McKinsey does an incredible job in a lot of work in thinking about how to explain technical model building, data science, analytic techniques, to a broader audience. I saw recently there's a great interactive, web-based explanation you have called an Executive's Guide to AI, which really explains the nuts and bolts to what type of models are considered artificial intelligence these days.

Taras: That's exactly right. We actually worked on that: one expert from our team really drove development of most of the content for the guide. We get this question asked all the time and we cannot use technical language with executives because they need to understand it, and at the same time we really want to be factually accurate. Things like that, even explaining what machine learning is, what deep learning is, in simple terms, are very, very important for us. I think it's actually very important more generally, so people can make decisions, actually understand what they're trying to accomplish, and feel comfortable acting on it.

Hugo: Yeah, I like that, and I like the fact that a subtle overtone of what you just said then is that the machines don't necessarily make decisions yet. We do have certain types of learning algorithms, reinforcement learning in particular, but the fact that when we're looking at predictive analytics, we've got AI making predictions but then it's up to human responsibility to then make decisions and take action.

Taras: It's a great point. I think one of the key things to understand is how different a model prediction is from the actual decision. A model prediction might say that a customer is likely to buy a certain product if it's offered at a certain discount, or that a customer is likely to leave the company because of interruptions in service. But that has nothing to do with the business decision that the senior executive needs to make, which usually is what type of discounts to offer to any particular customer, because there's a lot more going on. There's a lot more strategic, as well as operational practices.

Taras: Sometimes people think predicting the outcome, through the model, is the same as acting on it. It's just the beginning. There's a really, really long chain of thinking, usually qualitative thinking, that needs to happen before you act on a decision. Not always. In some cases decisions need to be made in real time, for example in banking fraud or credit card fraud, and then algorithms actually act. But for management consulting, when you talk about strategy and really high impact decisions, models need to be interpreted. We like to use the word translated, and the implications need to be assessed and pressure-tested before anyone can make any decision because the stakes are just too high.

What do you actually do?

Hugo: I love it and I look forward to returning to these ideas when we talk about specific verticals and industries where you see the most demand for data science analytics and management consulting. Let's get back to you before we get there. You told us that your colleagues generally think that you build models or talk to clients, but you also told us that what you do is a lot broader than that. So what do you actually do?

Taras: I spend time in 4 to 5 different areas. As somebody who built and is leading a fairly large data science team, probably the most important part of my work is to make sure that we have the right capabilities so that we can serve our clients on what their needs are. That involves quite a lot of strategic thinking in terms of what problems data science will be solving tomorrow or next year, and working backwards from there: what people you need to have on a team, what technologies you need to test out, what use cases and what clients you want to work with. There's a little bit of that.

Taras: A second piece is extremely important for anybody, which is the essential piece around relationship management. For us, that means working with clients, managing strategic relationships so we can actually deploy data science tools and drive real change. Others are internal. As a data science function, you need to work with the technology function, with the data function, with the business development function, with the marketing function, with individual businesses, because data science is by nature so embedded into the rest of the organization that you cannot be effective unless you spend significant time, 30-50% of your time, just managing the relationships.

Taras: It's still a people world, so you need to help people be successful, and that starts at the very first interview with somebody. Hopefully, if they get hired, then it's onboarding, then it's finding the right projects, then it's doing mentorship on the project, handling conflicts, helping people grow in whatever role they are in and however experienced they are, because consulting is a people business and the only way to create impact is by mentoring, developing people, creating opportunities for them.

Taras: And then of course there is the actual data science part, which I really, really enjoy doing. For me that usually involves working with data scientists from the team, helping them structure their approach, and then helping them, if things don't work, to understand why, so they are able to build models and deliver the outcomes. I still sometimes try to be a hands-on data scientist, write code and build models myself, but it's really, really hard to find time.

Hugo: I'm sure.

Taras: The last piece, of course, whenever I serve clients, is making sure they can digest the models we are building for them. They know what the models mean. They can act on them, they can implement them, and most importantly they actually measure the outcomes post implementation. They can see this really works and that there are actual, tangible benefits, either financial or other benefits, to their organizations, because that's the heart of what we do as consultants in data science.

Bi-directional Process

Hugo: So really that last point speaks to something very essential to the job that you do, which is taking the results of the data analytics and data science process and, as we said, turning them into decisions and actionables. But even before that, I suppose, you need to translate the business problem into a data science question, solve it to a certain extent and then translate it back to an actionable. You have this bi-directional process, in and out of data science, right?

Taras: That's absolutely right. There is always a process of ... even understanding the business problem, and then translating that into an analytics approach and in the consulting settings, that's actually not that straightforward because at the beginning when we just start the project, we don't always know everything we would like to know about the client business and especially about their data environment and how they make decisions. The process becomes very iterative. Based on what we know, we structure an analytics process, figure out which models need to be built, at what level of granularity, with what data, and then we try that and while we're trying that, we see what works and what doesn't. We show it to clients. We get their input and based on that we refine the analytics approach until we actually get to something that's as meaningful and impactful as possible. So that translation happens at the beginning but it continues throughout the project.

Hugo: Does the question arise “is data science or data analytics even the appropriate approach to the business question posed?”?

Taras: Absolutely and I think it's the right question to ask. If you can solve the client question with a simpler approach, there is no need for data science or advanced analytics. I think advanced analytics is a great tool but we shouldn't try to force it to be used everywhere and if a simpler approach works or there's not enough data, then some high level strategic qualitative thinking or some interview-based approach or some case study based approach works much better. On the other hand there are real cases where you have to use fairly advanced modeling to get to an understanding of the environment or market or business that's deep enough and that's accurate enough. That's the case for data science.

Taras: And by the way, what is really interesting is that it happens both ways. Sometimes you start an advanced analytics project where you think you'll build a lot of models, and as you learn more about the client business or data availability, you realize that's just not going to work, so you go back to basic qualitative analysis. Sometimes you start a strategy project that has none of these components, and in the middle you realize that there's so much to be learned from applying advanced machine learning that you can re-scope the project and bring that in. And that's why it's so important to have people involved in the project who actually understand data science as well as the business question; then they can make these decisions on what's the best approach to go forward.

What should data science teams look like?

Hugo: This actually speaks to something else, that different people on your team have different skills, and you mentioned that you need a fairly large team of data scientists, and my question there is, have we as a community figured out what data science teams should look like? And what I mean by that is with backend engineers, we know what those teams generally look like and how they work together and best practices. Have we figured this out in data science yet?

Taras: It's an excellent question. I don't think we have and I believe it's not just McKinsey or consulting. I think it's in general. It's evolving very, very quickly. Our own journey was that in the beginning we had more people who were predictive modelers and who had very broad backgrounds. They could build a wide range of predictive models. Increasingly what I see is that on one hand you need to start specializing much deeper so you need to start having people that only do NLP or only do deep learning or only do anomaly detection.

Taras: On the other hand, analytics goes beyond predictive modeling. For many questions that we need to answer now, somebody who understands complexity theory and can do really advanced simulations is actually invaluable. I only see this process of changing needs for data science skills accelerating with every year. For us, not only have we not figured out the steady state, but I don't think there is a steady state.


Taras: Maybe for some organizations that have a much more fixed business model, where you only need certain types of skill sets, maybe that's more steady. But even then I don't believe, given how quickly data science changes, you can be static about it. You need to constantly be adding skill sets and moving along with the field.

Hugo: And something you've spoken to there, is not only is the skill set changing so rapidly and the techniques, but even what happens on a daily basis. As we see more feature selection automated, more data manipulation, data munging, data cleaning, automated machine learning, for example, we're going to see what data scientists do on the ground evolving incredibly quickly.

Taras: That's absolutely right. I think it's amazing to have tools now that we didn't have two years ago. Even for computing infrastructure, you can go to Google Cloud or AWS and spin up a system of clusters with all the software that you need at the click of a button, which was completely impossible before. That changes the speed with which we work, the nature of the models we can build and how easy it is, so much that suddenly it opens up new possibilities. There's a strong influence of technology on what skill sets data scientists need to have as well.

Data Science Demand

Hugo: Fantastic. I want to jump in now and find out through the lens of your work at McKinsey, which verticals and/or industries do you see the most demand for data science, data analytics, management consulting, at the moment?

Taras: If I look at the moment, I think about it in 3 broad buckets of industries. The first one is the one that really interacts with consumers directly. Industries like retail, industries like telecom, some media, some banking, and for them the need to set up - it's actually really driven by personalization - the need to heavily customize their products, their marketing messages to individual consumers ideally and do it in real time with as much data and analytics as possible is really key.

Taras: The other bucket is essentially organizations for which risk management is a big deal. Insurance, for example. Insurance is all about pricing risk properly, managing claim processing very efficiently and setting up the rates very efficiently, and for them you can't do risk management without quantitative methods. That is another really, really interesting application of analytics.

Taras: And then finally many other organizations generate a lot of data from their operations. For example, think about a semi-conductor fab that has literally hundreds of different pieces of equipment, and each piece of equipment, for each process step in making chips, generates real time data with millisecond precision from all the sensors. It's extremely complex. The data is very large and all real time, and the stakes are really high to optimize manufacturing and do it well. I think for those organizations there is a huge need for advanced analytics to be driving better decisions. Another similar piece is genomics in bioinformatics: a huge amount of data coming in.

Taras: Another piece is health care, where you now have structured data on health care claims but also unstructured data from medical testing, images, etc. This is a little bit more a case where the data is there but the analytic methods are not always there. The data has not been fully used yet, and that's another area where there is increasing pull for data science.

Hugo: There is so much interesting stuff in there that I'm at a loss to figure out quite which direction to go. You stated industry verticals that address a lot of consumer interactions, those that need excellent risk management. You also mentioned industries where data is being generated. Semi-conductors are really interesting because I think a lot of people, when they think about data science, they think about data science in tech where we kind of know what we're doing a lot of the time. But when we've got real time data flowing straight in and we need to make decisions straight away or automate those decisions, that's a very different game isn't it?

Taras: Absolutely. We always talk about digital born companies, and they do amazing stuff, don't get me wrong, but if you're a completely digital company, it's actually easy because all your information is digital. You can access it and you can test your ideas really quickly. It's just natural. Try to actually do analytics if you're a semi-conductor player or if you're an agriculture provider. Some of my most fascinating experiences were actually in old-school fields like agriculture. Imagine that you have a tractor driving on a farm. The tractor has a whole bunch of sensors. Now the tractor has a video camera that can film the fields, and you have a drone that flies on top and takes real time data of the crop. All of this needs to be transferred, at a very low cost, from the farm to a central processing unit, or all the computations have to be on premise, on the tractor, as an edge system.


Taras: And you need to optimize an extremely complex set of real, physical equipment actions. It's actually really, really hard, and you need to do it in such a way that the farmer would not mind using the advanced analytics to improve his yield of crops. I think very often we don't give enough credit to traditional industries that operate in the physical world, because it's so much harder. To me that is where a lot of the value of analytics will be coming from in the future.

Hugo: That agriculture example is so fantastic because it speaks to another very interesting concern ... let's say that you have a drone taking photographs and you want to do some image analysis, pattern recognition, object detection, bounding boxes, whatever it may be. You might say I can just throw a huge convolutional neural net at it. Sure, if you’ve got clusters in the cloud, you can do this, but if you want to do this in real time in the drone, it will actually change what type of model you build to do it, right?

Taras: Exactly, because literally at that point you start thinking about the time it takes to classify a single image, and you actually literally start counting how many milliseconds your convolutional network can run. The difference between 100 milliseconds and 300 milliseconds means a certain speed of flying for the drone, and sometimes you just cannot go with the 300 because otherwise taking images of a 100 acre field will take you forever. You go with simpler models that can work in the edge computing environment and that may not be as accurate as possible, or you spend a lot of time thinking through your deployment architecture: how to keep accuracy up but bring computing time down dramatically.
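To make the latency point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the drone can only move on once the current image has been classified, and it uses a made-up figure of 100 images per acre; neither assumption comes from the conversation, and the function name is purely illustrative.

```python
def survey_minutes(field_acres: int, images_per_acre: int, latency_ms: int) -> float:
    """Minutes needed to classify every image of the field, one image at a time."""
    total_images = field_acres * images_per_acre
    total_ms = total_images * latency_ms
    return total_ms / 60_000  # 60,000 milliseconds per minute


FIELD_ACRES = 100       # field size mentioned in the conversation
IMAGES_PER_ACRE = 100   # hypothetical camera footprint, not from the source

fast = survey_minutes(FIELD_ACRES, IMAGES_PER_ACRE, latency_ms=100)
slow = survey_minutes(FIELD_ACRES, IMAGES_PER_ACRE, latency_ms=300)

print(f"100 ms per image: {fast:.1f} minutes")
print(f"300 ms per image: {slow:.1f} minutes")
```

Under these assumptions the survey takes about 17 minutes at 100 ms per image and 50 minutes at 300 ms, so per-image inference time translates directly into how fast the drone can cover the field.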

Taras: That's another fascinating problem that's very different from what conventional data scientists will have to deal with.

What organizations have you worked with?

Hugo: Right. Now I want to jump in and find out about organizations that you've worked with or industries. I do understand that there are certain ... there's a lot of privacy you need to respect with respect to your client base but I was wondering if you'll tell me about a few key examples of organizations and/or industries that you've worked with.

Taras: Yeah, absolutely. Let me switch gears and talk again about another really low-tech area. I worked with a correctional facility, and the problem that they had was a problem of violence. The inmates were sometimes really violent, they were seeing increases in violence, and the administration could not understand why. It's a very challenging environment. It's always high pressure and the stakes are really high. Literally, people might get injured or killed. On the other hand, you have the ability to act and a lot of data you can use to make better decisions.

Taras: What we did is we looked at what actually drives violence in each group. It's not necessarily about individuals. If you put a certain number of individuals in the same cell, it's the relative composition of the people that go in there that increases or decreases the violence. If you start thinking about it from the quantitative perspective and understand what drives that, you can actually reduce violence quite dramatically.

Taras: It's a great example where looking at an industry that traditionally does not use analytics leads to a huge impact, and it really makes a difference in a major way.

What does it take to change an organization's decision-making process?

Hugo: So that sounds like a success story. I do think when an organization has challenges or problems where data analytics can help to solve, there is some sort of barrier. The value needs to be created and it needs to be demonstrated. I'm wondering what does it take to change an organization in terms of their decision making process, to change them through data science?

Taras: That's a great question. It's a very challenging task because many things need to happen and all of them need to happen. If one link is missing, chances are it's not going to work. First of all you need to have a vision of what analytics should do for an organization. Why do you actually want to have data science or analytics in the organization and people need to believe it will add value, and it is connected to the business strategy. Not just analytic teams but throughout the organization.


Taras: Secondly, you need to have support from very senior executives, ideally C-level, to actually create that excitement in the organization, to make sure you have the right visibility, funding and resources to do it, and to actually help the analytics or data science team work with other parts of the business. Because without all of that, very often data scientists just build models in isolation and the models never get used. Then you need to have the right data and data environment, and that is a relatively slow process that can be expensive. You always need to start building that environment, but you shouldn't wait until that is done before you start deploying analytics.

Taras: You want to identify areas where you can drive real measurable results very quickly so that there is excitement in the organization about the analytics and more and more people want to try. That's what we mean by use cases. You find out where there is the biggest value for analytics, and you go, and you build the models. You start making decisions differently. You capture that value for 3, 4, 5 use cases quickly and then everybody wants to do it for their own part of the business.


Taras: And then you need to make sure it actually sticks; it's not something that you've done once. People use it for three months and then they went back to the old ways. To make something stick, you need to redesign the processes of making decisions that usually involve some kind of technology solution - software or interfaces - to make analytics digestible and it involves retraining people. It involves new ways to measure the outcomes of the decisions.


Taras: And finally, and organizations that are really great at analytics always do this, you need to change the culture of the organization so that every time you're in a business meeting and you want to propose something, people will ask what data do we have to act on this proposal. Is it really backed up by hard numbers, well-designed models or not? And once you get to that level, then analytics truly becomes part of the company DNA and you accomplish what you're trying to do, but that is a multi-year journey for most organizations, and it's not easy.

Hugo: Right. I want to zoom in on this idea of early stage value extraction, in particular through several use cases. I suppose this essentially is having a few proofs of concept, demonstrating their value, and gaining the trust of people at different levels throughout the organization.

Taras: Yes. That's fair. Think about it from the perspective of a business executive who doesn't necessarily understand technology. It's a buzz, everybody talks about it but you want to know if this thing is real or not. What real means is if I start doing what the model tells me to do, is my business doing better or worse? You need to convince somebody to give you a shot, which usually happens through the proof of concept, and then you need to very rigorously measure what happened so that it's so clear, that there was impact and the impact is directly attributable to analytics. That use case does two things for you: A, the business partner that you work with becomes a champion for you and uses more analytics within their own part of business; but B, creates a much broader visibility that other people see it and other people say, "You know what, this thing actually really, really works and I'm just going to go and try to do it."


Taras: Then finally, what's important is if you can show that you spend $2 million on a use case but it generated $50 million in revenue or in cost savings, suddenly you can claim that analytics becomes self-funded so organizations can allocate a bigger budget to do another 3 or 4 use cases and to fund new technology or new software or new data management systems and that's how you get going in real organizations because you always need to justify the budget that you spent on certain activities.
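The self-funding argument is simple arithmetic, sketched here in Python with the $2 million and $50 million figures from the example above. The function name and the assumption that each further use case costs the same as the first are illustrative only.

```python
def use_case_economics(cost: int, benefit: int) -> dict:
    """Summarize what one analytics use case returned relative to its cost."""
    net_benefit = benefit - cost
    return {
        "net_benefit": net_benefit,
        "return_multiple": benefit / cost,
        # Hypothetical: assumes each further use case costs the same as this one.
        "further_use_cases_funded": net_benefit // cost,
    }


summary = use_case_economics(cost=2_000_000, benefit=50_000_000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On these numbers the use case returns 25 times its cost, which is why a handful of rigorously measured early wins can justify the budget for the next round.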

Hugo: Right. So you mentioned if one of these moving parts is missing, the change in the organization will very likely fail. There's a lot of these moving parts. You mentioned creating the vision for analytics, having strong support particularly at C-level, early value extraction, process redesign, culture change, data foundation. It seems like a lot to get working together. How often are you able to do that? How often are you able to see it all work together?

Taras: It's a great question. Another way to ask this is, how many organizations have actually been able to achieve it? There are not many. In each industry there are maybe 2, 3, 4 organizations that are ahead of the pack, and sometimes none of them has actually achieved this wide acceptance of analytics in every decision that we talk about. It's actually tough, but what we're seeing is that where clients are now is very different from where they were 2 years ago. Everybody is moving in the right direction and most companies are making significant progress. All of them are not there yet and I think this journey will take a little bit longer for them.


Taras: There's too many things that need to happen and each of these steps is quite a long step.

How has data science evolved?

Hugo: So Taras, you've been leveraging data science to help Fortune 500 companies, for example, improve their performance for over a decade now. How have the different moving parts of data science evolved over this time?

Taras: It's fascinating. I would say that there's so many changes that happen in so many ways. The simplest one to see is software. Ten years ago we were using mostly SAS, and then open source became really prominent and people moved to R, and then to Python and there is just continuous emergence of new software tools and that's very easy to see and track.

Taras: The second piece, I think, is the algorithms themselves. Ten years ago we were still doing a lot of BI, and we were doing a little bit of statistical modeling, but it was mostly linear models. It was classical statistics, but then machine learning became much more prominent, and then very quickly we started being able to work with unstructured data through deep learning, through NLP, and to me that's the next wave that's happening now, moving to unstructured information.


Taras: Another change that I noticed is in what domains analytics is used. Ten years ago market analytics was big. Risk analytics was big in the financial sector because the financial sector really needed that. And then some of the heavily operational companies would use it in supply chain and inventory management, but those were the three big areas. Now if you look at where analytics is, it's literally everywhere. It's in the HR organization function. It's understanding what people to hire and when they are likely to leave.

Taras: It's every interaction we have with customers, it’s customization. It's just so broad and that's relatively new. This spread of analytics into every decision making is something that happened in the last 10 years and to me that's probably the most fascinating part.

Hugo: I think the HR example is really interesting, and we're seeing increases in machines helping out with the hiring process, for example. Of course machines can encode human biases, and machines can create their own types of biases as well. I'm wondering what your view is on these types of biases that may occur, what the major challenges are, and how important it is to have a human in the loop as much as possible.

Taras: That's another great question. I will tell you my own perspective. I do think there are many different points of view on this. I think that human plus machine works a lot better than just machine alone. Some of my colleagues that are data scientists in tech firms actually disagree, and they see their role as getting the human out of the loop and fully automating everything. I think there's a difference in the nature of the problems we are trying to solve with analytics. But for me, on the one hand, it's really good to design algorithms with as little bias as possible, and bias usually just comes from your input data.

Taras: You need to think about whether your data is actually representative or whether it is biased in any particular way. But on the other hand, once you get your model predictions, what do you do with them? We still come back to the issue of translation, and acting properly on the output of algorithms is where humans really add value. If I go to the HR example, I spend a lot of time looking at McKinsey's recruiting and understanding what performance characteristics people display during interviews that predict their long-term success, but it's not like I take the output of a predictive model and allow the algorithm to decide who is going to get hired or not.

Taras: It's just for us to inform our recruiting process to focus on the relevant things, but humans still need to process, digest, assess how precise the predictions are, assess false positives and true positives and, based on that, design the right recruiting process. You can't just take the human out and let the algorithm design it. It will not work very well for many of the qualitative, strategic decisions you need to make.

Hugo: You actually said something very interesting, well a lot of very interesting things in there, but something I want to zoom in on is you said a model may optimize long term success, and we might not even know how to define long-term success correctly. That may be something that evolves over time. Trying to figure out what we're actually optimizing for or what we're learning is just as important as implementing the algorithm.

Taras: Absolutely. The performance metrics matter tremendously, and sometimes we see that if you change from one metric to another, suddenly the models change a lot. That's also why it's so important to keep an open mind and have a human involved, because sometimes we build models for different outcomes. If I take a slightly more traditional commercial example, the outcome might be revenue growth. It might be profitability. It might be market share. And you realize that if you build 3 different models with 3 different outcomes, you get 3 different sets of drivers, and if you start analyzing the differences between them, you really get a much deeper understanding of your market dynamics because you look at multiple dimensions of the problem. That's absolutely critical.

How do you see data science evolving?

Hugo: So you've got some really nice insight into how you've seen the different moving parts of data science evolve over the past decade and more. How do you see the different parts of data science evolving in the future?

Taras: It's fascinating. For one thing, I think open source is here to stay. I just can't imagine doing analytics without open source, given how complex it gets and how increasingly powerful it becomes. We all have to share the code, share algorithms, and it will remain relevant for a very long time.

Taras: Secondly, I think unstructured data will become more and more important. Already, most information that's been generated has been unstructured, but we are just beginning to use it, and I think the advances that can come with improving, for example, NLP techniques will be truly transformative, and it's actually becoming very, very real. Even the work we do now in NLP is so different from what you could do 2 years ago that, if I just fast forward the pace of development, I know within the next 5 years we'll have a very different face of analytics just because of that.


Taras: To me it's also a lot about interpretability. Take deep learning, for example. Amazing techniques. It allows you to do things that were completely impossible before: analyzing videos, analyzing images, things that were not possible to do well before deep learning. At the same time, the applicability of deep learning in truly strategic decision making is not that high, primarily because of that lack of interpretability. But I have spent a lot of time looking at symbolic equations: something that can automatically discover features just as deep learning does, but that presents those features as symbolic equations.

Taras: That becomes extremely powerful because then you understand what's going on and what drives your outcomes and you can actually explain it to the executives. To me I think there'll be a lot of advances made in our representation of models and in making models more transparent, while using complex models. Simple models we have done. It is really now about complex models but making them understandable. To me that will be another big change that will hopefully happen in another 5 years.

How do you see data science use evolving within organizations?

Hugo: Great, having seen how the different parts of data science will evolve in the future, in your mind, how do you see the use of data science within organizations evolving in the coming years?

Taras: It's another fascinating question. I think that data science will be democratized. Part of it is that the isolated data teams that we have now in many organizations will become much more embedded within the rest of the organization. Literally, data science will have connections to every business, every function, and as a result will be impacting a lot more decisions. Another piece is that I think data science will be used by some non-data scientists, just like when Excel and spreadsheets came around: suddenly business analysts were able to use them and you didn't need to be a programmer to do it.


Taras: With good graphical interfaces, with algorithms that are much more intelligent, you can actually have people who are business managers or business analysts use data science tools. I think we'll see a lot more of that in core data science as well. There'll be hard problems, and real data scientists will be solving them, and there'll be more standard problems for which we will have really good tools, and lots of other people will be able to use them.

GUI

Hugo: This is somewhat of a controversial question in a lot of circles, but my question for you is: can data science be done in a GUI now, or is there a future in which it can?

Taras: We’re all data scientists, and the reason we don't like GUIs is because they're an inefficient way to work. But at the same time, if all you do is multivariate regression, and all you need is to look at your p-values and your regression coefficients, why would you as a data scientist want to do it yourself, writing code in Python? Just give the GUI to business analysts, let them upload the data, build in the right checks and balances to make sure the right significance tests are done, help business analysts take intro statistics courses, and you're done. I think DataCamp, for example, has done a lot to democratize that knowledge, because I know online learning for data scientists is so powerful, and I think it's not just data scientists that take the courses. At some level we as data scientists don't want to let it go, and sometimes we just don't trust other people to do it right, but for us to really continue to have impact, we need to work on really hard problems with continuously new methods and approaches, and that means we stop working on simpler problems that are very easily solvable and that somebody else can do. For me, I just see it as a natural progression.

What is your favorite technique or methodology in data science?

Hugo: I love it. I'd like to know what one of your favorite data science techniques or methodologies is.

Taras: I am a huge fan of autonomous feature or hypothesis generation. Again, to me the ability to find features that drive outcomes and yet, unlike deep learning, the ability to express those features through symbolic equations, is hugely powerful, at least in strategic decision making, where we need to have more understanding than precision. I have seen this space evolving very, very rapidly in the last 2-3 years, from pretty much zero before that to actually now having companies and products that can do it quite well, and I think we're just scratching the surface of that. For me, the interface of powerful feature engineering and interpretability is definitely my favorite area of data science now.


Taras: Another one, which is my second favorite, is the science of complexity. We talk about building models, but if you start thinking about simulating the behavior of complex and non-linear dynamic systems that sometimes behave in chaotic ways, they're still guided by rules that can be learned and can be developed. It's fascinating and its impact is huge, because our life is essentially a complex system and every organization is a complex system. Human bodies are complex systems. Most engineering processes are complex systems, so the ability to do it well, with precise mathematics, is absolutely fascinating. And again, I'm beginning to see some data scientists and some software tools being developed that can handle complexity without an excessive amount of effort, I would say. To me that's another emerging trend that I think we'll be talking more about 4 or 5 years from now.
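As a small illustration of a non-linear system that behaves chaotically while still following a simple rule, here is a minimal sketch in Python using the logistic map. The logistic map is a standard textbook example chosen here for illustration; it is not a model mentioned in the interview, and the function name `logistic_map` and the parameter values are assumptions.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate the rule x -> r * x * (1 - x), starting from x0.

    With r = 4.0 the map is in its chaotic regime: the rule is fully
    deterministic, yet tiny differences in x0 grow rapidly.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


# Two starting points that differ by one part in a billion.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)

# Gap between the two trajectories at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]

for step in (0, 10, 20, 30, 40):
    print(step, gaps[step])
```

The rule itself is one line, which is the sense in which such systems are "guided by rules", even though long-range behavior is very sensitive to the starting point.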

Hugo: I really like that the favorite techniques and methodologies you mention are very forward looking, because automatic feature or hypothesis generation is something which, now and in the future, will change what all data scientists do on a daily basis. It will automate a lot of the drudgery, the 80%, so we can focus on the far more exciting and creative work.

Taras: Absolutely. Absolutely. I have no doubt on that because it's already happening and we will just continue to do this in more classes of problems and with more impact.

Call to Action

Hugo: So Taras, to wrap up, do you have a final call to action for our listeners out there?

Taras: My final call is really that things are moving so fast now, and all of us as data scientists, if we're doing now what we were doing a year ago, we're really falling behind. My call to action, and that's something that I try to do every day, is to continue to learn new things every day. Keep in touch with new software development, keep in touch with new algorithms, understand the mathematics and just deploy it, because to me the most fun part about being a data scientist is this learning and the application to new domains, new areas all the time.


Taras: It's very, very nice that we actually have to do that ourselves to stay relevant. Don't do the same thing over and over. Keep looking out for new opportunities.

Hugo: I couldn't agree more about learning new things, especially, as you said, in a field that's moving so quickly. You also mentioned learning some of the math as well, which is incredibly important, and also not being so scared of math, because math can be overwhelming in a lot of ways. When you're writing code, bit by bit, the basics of what you're doing, maybe try to re-implement a few of your favorite algorithms, whatever they may be. In terms of learning new things, I may be biased, but I think DataCamp is an incredible platform to do that on. I also think McKinsey has a lot of fantastic online resources. As I said, there's this interactive Executive's Guide to AI, and you've actually got a whole bunch of stuff, commentary stuff, released on the economics of AI, which I think would provide a wonderful counterpoint to people actually writing code as well. I think all of that is really fantastic and we'll include a link to a bunch of those resources in the show notes.

Taras: Perfect. Could not agree more.

Hugo: Absolutely. So Taras, thank you so much for joining me on the show. It's been an absolute pleasure.

Taras: Yeah. The pleasure is mine. Thanks so much for inviting me.
