The Credibility Crisis in Data Science
Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Skipper Seabold, a Director of Data Science at Civis Analytics.
Hugo: Hi there, Skipper, and welcome to Data Framed.
Skipper: Thanks. Happy to be here.
Hugo: Great to have you on the show. I'm really excited to talk about all the things you've been thinking about with respect to the current, looming, future credibility crisis in data science, how we can think about what the science in data science actually is and how to put it in there in a more robust fashion. But before we get into this, I want to find out a bit about you. What are you known for in the data community?
Skipper: Sure. I started the statsmodels project, trying to make it so that people could do econometrics and statistics in Python. This was back in 2008, 2009 or so, kind of before pandas, before scikit-learn. I got bit by the Python bug and really wanted to be using Python to do the work that I was doing in graduate school. I was in grad school for economics, so I picked up this bit of code that was part of SciPy, took it out of SciPy, joined the Google Summer of Code, this program that Google runs for doing open source software, and then kind of ran with it from there.
Hugo: Cool and so this is 2009, you say?
Skipper: Yeah. About that.
Hugo: So this is when the SciPy community was really getting some steam as well right? The early days but there was a lot of work being done there.
Skipper: It was starting to pick up. At my first SciPy conference, a scientific Python conference, I'm fairly certain I was the only social scientist there, if not one of only a handful. I met Wes McKinney at the one after that, and that's when pandas was starting to get off the ground and we started talking to each other about what we were up to and how we could make our projects complement each other.
Hugo: Yeah and Matplotlib was being developed around the same time, right?
Skipper: Yeah, Matplotlib was the workhorse at that point for plotting. SciPy was relatively new. NumPy was still kind of relatively new. Pandas definitely did not exist. Scikit-learn did not yet exist.
Hugo: What about IPython, was it -- had Fernando started...
Skipper: IPython, yeah, was around. I was definitely using IPython almost from the start.
Skipper: Jupyter notebooks were definitely not a thing. There were no notebooks.
Hugo: Yeah. And I was gonna say, for our listeners, IPython is an integral part of Project Jupyter, which you may know for Jupyter notebooks and JupyterLab and all of these things.
Hugo: And you're also involved in SciPy, which is related to the PyData community as well, right?
Skipper: Yeah. I remember the first couple of PyData conferences. I spoke at a few of those. We were just trying to get the community off the ground and focused on how we can use Python to do scientific computing, not only in traditional backgrounds like engineering, but also in stats and kind of data science.
Hugo: For sure. That's pretty cool. I mean, the PyData conferences are all over the world now and I actually went to one in Berlin a couple of years ago, which was incredible. I've been to several in the US and it's actually a talk you gave at PyData LA that kind of inspired the conversation we're going to have today as well.
Skipper: Yeah. I think that was a good opportunity for me to get the things that were on my mind out into the world, force my thinking down into it ... into a little bit of a box and then get out there and start talking about it.
Hugo: For sure. So maybe you can also tell us a bit about, you work at Civis Analytics so maybe you can tell us a bit about what Civis does and what you do there.
Skipper: I work at Civis. Our background is that we got our start in politics. Our CEO was the Chief of Analytics for the Obama campaign. After the campaign he was talking to Eric Schmidt, and they were talking about how great it was what Dan had put together in terms of people, processes and technologies for campaigns, and the hypothesis was that businesses are craving the same thing. That was kind of the genesis of the company. Since then we've branched out from politics, and so we are a data science technology and services company. But we work kind of across all industries, from CPG to health care, media, brands, still doing some work in politics. Working with people who want to do consumer research, these kinds of things.
Hugo: Cool. And what do you do there?
Skipper: I run our data science R&D team. Also, I'm a product lead. I spend a lot more of my time these days thinking about product and a thing I'm working on called identity resolution. The interesting data science part to this is it's basically entity resolution at scale. So how do we know whether two people in different data systems are the same person, probabilistically.
Hugo: Okay, interesting. It sounds like you're wearing multiple hats at Civis.
Skipper: I think that's fair to say. Often data scientists are wearing multiple hats especially as we kind of progress in our careers and become more embedded in the business, which is something I kind of hope to talk a little bit about today.
Hugo: I look forward to it. I'm imagining some sort of stacked hat chart actually. So you mentioned that you were bitten by the Python bug early on, maybe you can just tell us a bit more about that. Because people are... I got bitten by the bug when I could import CSVs using pandas into Jupyter notebooks. This is before notebooks and before pd.read_csv.
Skipper: Yeah, that's right, yeah. This is something I like to tell people a lot. My background, my major up until my last year of undergrad was poetry writing. So I could not have been further from math and data science and programming. I switched over to studying economics. Ended up going to grad school for econ. Really liked econometrics, anyway, we got into our math econ kind of boot camp and first math econ class and it was taught in Python. It was kind of the luck of fate because the year before and all the years before that had been taught in this language called Gauss. But the professor decided to move over to Python using NumPy.
Hugo: Was that a pretty forward thinking move on the professor's part at that point in time in an econometrics class?
Skipper: Absolutely, absolutely. I had a few kind of early mentors/advisors who told me that Python was a complete dark horse and if I wanted to be doing this I should be doing R programming. I tried R as well but working with Python was just great. It really helped solidify the stuff I was working on. I felt like I didn't really understand it until I could code it. And once I could code it I was like, "Oh sure, I totally understand this and now can program and speak confidently about it, I guess."
Hugo: That's cool but then there's the next step, I suppose, realizing you had questions in economics and elsewhere that you wanted to solve and the packages didn't exist to solve them.
Hugo: So then you figured out somehow how to develop packages, and statsmodels was one of the results.
Skipper: Yeah, for sure.
Hugo: What does that look like going from being a proficient programmer in Python to start developing packages yourself?
Skipper: I think it went the other way.
Skipper: I jumped in, and the thing that was really helpful was there was a problem out there that I needed to solve. I needed to be able to run an OLS regression, get the results out of the regression in a way that was easily understandable, build a user interface. I didn't know the first thing about programming, but I had some good and patient mentors and kind of co-developers along the way. I learned a ton from the open source community, just by kind of, honestly, failing in public, flailing in public, making pull requests and having people take the time out of their day to spend hours reading your code and suggest how to be a better programmer. I think being a better programmer definitely came second.
Hugo: Interesting, and that was really my next question around community, which is, the SciPy community was really burgeoning at that point and what was the role of that community in helping you to develop what you developed?
Skipper: They couldn't have been more helpful. This is before Stack Overflow, so I just signed up to all of the mailing lists and I just read everything that everyone wrote. I came to figure out who the people were who were just very forthcoming, forthright with their time. Here's what you should try, maybe you should think about this, and I just saw a group of researchers and scientists discussing ideas and doing research engineering, we might call it now. It's one of the things that drew me to Python versus the R community at the time. This has definitely changed in the R community, but jumping on the R help mailing lists kind of back in the day was just a trial by fire if you didn't know what to say and what you were doing yet. I found the Python community to be a lot more welcoming and helpful for me.
What is the credibility crisis in data science?
Hugo: Cool. So we're here today to discuss the credibility crisis in data science. What is this crisis to you? I'm wondering if you can speak to concrete examples.
Skipper: Sure. It's a little bit of an analogy really. The analogy is with this famous paper in econometrics and in economics. It's a paper by Angrist & Pischke, which is a response to a paper from Ed Leamer in the '80s. This paper in the '80s was basically saying, "Hey, economics isn't having an impact. Policy makers aren't listening to us. We're over here arguing about minutiae. Let's go ahead and take the con out of econometrics." That was the provocative title for this paper.
Hugo: That's a great title.
Skipper: Twenty years later, there's this paper, "The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics". Their premise was that there had been this maturation in economics and now policy makers took economics seriously. I see a lot of analogs in what is happening in data science today, kind of going through a little bit of a credibility crisis, and I see some of the solutions being a little similar to what we saw in econ. In econ, one of the things that people were complaining about was that there was just a lack of data in the studies, so some of these studies that people were doing had twenty or thirty data points, and they're like, yeah, it's kind of hard to draw conclusions from this. A lot of the economists spent time arguing about just minutiae: what estimator are you using, what functional form are you using, should we log this variable or not? And there also wasn't the theory to just tell people what works and what doesn't. So you can understand why policy makers are like, "Y'all are having fun over there but you're not actually affecting what we're doing, so we're not gonna pay attention to you."
Skipper: I see something similar in data science. There's a tendency to focus on minutiae, like what neural network architecture are we using? Are we using R or are we using Python? What method are you using? These kinds of things, which are not as important to decision makers. One thing that I've heard just in working here at Civis came from the CEO of a very large company that everyone would know if I mentioned who they were. They were basically saying, "I have all these data scientists, I have hundreds of data scientists, and I have no idea what the fuck they do all day."
Skipper: As a profession, as a class of jobs, that's not where you wanna be. You don't want the decision makers, the people who are supposed to be benefiting from your insights, unable to discern what it is you do and not understanding what your output is.
Hugo: And that seems to be the result of many things but including a total miss, or crisis of communication right?
Skipper: Yeah. I think that's exactly right. I think decision makers don't know what data scientists are capable of. They don't know how to communicate with them what the business needs are. And I don't think data scientists necessarily know how to describe the value of what they are doing in terms that the business can understand. Or also even focus their efforts on things that will have an impact for the business, like going out and understanding what it is that will actually make a difference and then making sure you're doing that. I don't know. It's kind of a cliché now but you hear people go like, "well have you tried logistic regression?" Start simple first, provide value, and then go deep. Then go further and a lot of times people want to start the other way and I think that's part of why we're having trouble communicating.
Hugo: Sure. And of course we have a lot of companies across a lot of verticals and industries hiring for data scientists, getting data science consultants to tell them to use data, all these techniques to tell them things that can impact, give them insights from data that will impact decision making. How impactful do you think this actually is? I mean do you think a lot of companies take these insights and go, "Okay. We're gonna base our decisions on these insights now"?
Skipper: It's highly varied ... One, I think a lot of data scientists are way over promising. And two, I think the number of companies that are super forward thinking and then have data-centric thinking embedded in their decision making process are very, very few. There's maybe a handful that are really, really good at this. Maybe a dozen. And then there is a long tail of thousands of companies who are not, but are seeing this happen and want to be in on it. So there's definitely good in getting this right. But the vast majority are not.
Hugo: And do you think part of the challenge is that data scientists are sort of hailed as prophets or they have keys to the gates that no one else has?
Skipper: A little bit. I think it's also a little bit of a problem of marketing as the hype train is underway. Part of people's incentives are how can we ... it's a little cynical, but how can we monetize the hype train? So you see the use of terms like AI and cognitive and even data science itself to start. Like, okay what are the outcomes? What are the outcomes that we are gonna enable? Instead of, what are the fancy methods that we're gonna use? That kind of thing.
Hugo: Yeah. So the analogs there, I suppose, we've spoken to are the crisis in communication and expectations not necessarily being matched. Was there anything else that's analogous, do you think?
Skipper: I don't know. It's tough to say. A little bit just... What is it that we do as economists, or what is it that we do as data scientists? How do we provide value? What is our process for working? How do we start from a proposition and then go all the way to providing something that will be an input for someone to make decisions? I think that's the big analog.
Hugo: That's great, 'cause that actually provides a really nice segue into what I want to talk about next. In your PyData LA talk, which was called "What's the science in data science?" (we'll link to that in the show notes), you defined data science as "using multi-disciplinary methods to understand and have a measurable impact on a business or a product".
Hugo: So I kind of wanna tease apart what this means and what the steps involved are. So I suppose my question is, how does this process relate to the scientific method? How do we put the science in data science and what are the key steps involved in actually measuring the impact?
Skipper: Yeah. Okay. One of the things that's pretty common among data scientists, it's not ubiquitous, but one of the things that's pretty common is a background in the sciences, right. So everyone has been trained in the scientific method. What is the scientific method? You know, we wanna start with a relevant question, we wanna come up with a hypothesis about that question, something that we can falsify. Then we wanna design our research program, like how are we gonna design an experiment that's gonna allow us to know whether or not our hypothesis is false? We want to analyze the results of that and then finally we want to either communicate, or this is not exactly, or it might be part of the scientific method, maybe some applied science. How do we get this into our product? How do we get into decision makers hands or how do we make a product better with what we've discovered?
Skipper: It just doesn't look that different working as a data scientist day to day than it would working as a scientist in a lab. The question may be a little different. You may not be asking, what is the nature of the universe? You may be asking, what's the most relevant business question that's driving us nuts right now? Do we know how well our marketing campaigns are performing? Do we know why we are attriting employees? Why employees are quitting? Something like that. But from there on out, I don't think it looks terribly different from the scientific method, if you're doing it right.
Hugo: Yeah. And the one other difference I think is timescale. Of how long the project is.
Skipper: Yeah. That's fair. That's certainly fair.
Hugo: Three years to publication.
Skipper: That's certainly fair. Yeah. We're probably talking about months or years if you're lucky to work somewhere that allows you to think in the year scale.
Hugo: Great. I suppose then the first point, as you were saying, is to figure out why you're there as a data scientist, right?
Skipper: Yeah. Go out and talk to as many people as you can and figure out what it is that you need to do to provide value to the company. I mean, if you're somewhere that already kind of understands this, and it's just lacking the people who can execute on the data science, then that's great. Quite often, the business doesn't even know how to ask this question right. They don't even know how to provide the inputs to the data scientist so the data scientist can be maximally productive. So I think it's a little bit on the data scientist to go and seek these things out. Asking good questions and understanding, "Hey, what's your job? Hey, what's your job?" That kind of thing.
Hugo: And to make sure that the question posed by a product manager, for example, is the question they're working on in their code and in their process?
Skipper: How do you mean?
Hugo: Well I mean that a product manager can pose a question and you go away and you're actually working on something quite different, not actually answering the question that they've asked, right?
Skipper: Yeah. That's definitely a possibility. I'm glad that you brought up product managers because I think, if you find a good product manager, a data scientist can be a perfect partner, or even a good product manager eventually. But even product managers ... they don't always know what's possible with data science, so they don't always know what's the right question to ask. And then if they do know what's the right question to ask, it's entirely possible that the data scientist then goes off and is like, "I'm gonna come up with the greatest neural network architecture and I'm going to maximize the accuracy of this model to the 100th decimal place and it's gonna be great." But what they really needed was an interpretable model or something like that. So sometimes the data scientist can lose the forest for the trees, and I wanna urge people not to do that.
Hugo: Yeah. Exactly. So this is all in, I suppose from your definition or from your statement an understanding of a business or a product. How do you measure the impact of data science?
Skipper: Yeah. Once you understand what the question is, I think measuring the impact, and this again cuts to the heart of the scientific method, is teasing out everything that is not related to the thing you want to measure. You want to be able to isolate the effects of the thing you are trying to measure that answers the question. So to be concrete, how do our marketing campaigns perform? You want to run a single campaign and then you want to test whether or not that campaign had, let's say, a better than break-even performance. So we went out and spent $100,000. Did we as a business realize gains above $100,000 by doing this campaign? So you just wanna wash out everything else and make sure you focus on that and are actually measuring the return on the thing that you think you are measuring the return on.
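The break-even question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-customer revenue figures and the function names `incremental_revenue` and `beats_break_even` are hypothetical, and only the $100,000 spend comes from the conversation. It also assumes a comparable group that did not see the campaign is available as a baseline.

```python
# Hypothetical illustration: only the $100,000 spend comes from the interview.
CAMPAIGN_COST = 100_000.0


def incremental_revenue(treated_revenue, control_revenue):
    """Estimate the revenue caused by the campaign.

    Both arguments are lists of per-customer revenue. The control group
    stands in for what treated customers would have spent with no campaign.
    """
    mean_treated = sum(treated_revenue) / len(treated_revenue)
    mean_control = sum(control_revenue) / len(control_revenue)
    lift_per_customer = mean_treated - mean_control
    # Scale the per-customer lift up to everyone who received the campaign.
    return lift_per_customer * len(treated_revenue)


def beats_break_even(treated_revenue, control_revenue, cost=CAMPAIGN_COST):
    """True when the estimated incremental revenue exceeds the spend."""
    return incremental_revenue(treated_revenue, control_revenue) > cost


# Made-up data: 10,000 customers per group.
treated = [62.0] * 10_000   # saw the campaign
control = [50.0] * 10_000   # did not see the campaign

gain = incremental_revenue(treated, control)
print(gain, beats_break_even(treated, control))
```

The point of the design is that total revenue from the treated group is never compared to the cost directly; only the difference against the baseline counts as the campaign's return.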
The Credibility Crisis Boils Down to Experimental Design
Hugo: Great. And I think this leads nicely into, something you've argued is that the credibility crisis essentially boils down to experimental design. I just wonder if you can expound on this a bit and tell us what it means and why it's the case.
Skipper: Yeah. So the gold standard of experimental design in science is the randomized control trial. At its core the randomized control trial is designed to understand the effect of an intervention, traditionally something like a drug treatment, a jobs training program, or the marketing campaign I mentioned already. What you wanna do is design a way to understand whether that intervention had an effect or not. And the best way to do that is to take a group of people, mostly the same or representatively random, and then split them up, give one group the treatment and not the other. Another good example of this is twins. Take two twins that are alike in every component of their lives. Similar upbringings, similar genetic makeup. Give one a drug treatment and don't give the drug treatment to the other, then measure whether or not the treatment had an effect. That's why experimental design is really important.
Hugo: And this speaks to the point of isolating the effect right?
Hugo: Of what you're trying to measure.
Skipper: Exactly. You want the only thing that's different between two groups of people to be the intervention. The thing that you wanna measure the effect on. And the randomized control trial is the best way to do this.
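A minimal simulation of that idea, assuming a made-up population and a made-up true effect of 2.0 (the names `randomize`, `difference_in_means` and `TRUE_EFFECT` are illustrative, not from the talk). Because assignment is random, the two groups differ on average only in the intervention, so a simple difference in mean outcomes recovers the effect.

```python
import random
import statistics


def randomize(units, seed=0):
    """Shuffle the units and split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Estimate the average treatment effect as a difference in means."""
    return statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)


# Simulated world (an assumption for illustration): every unit has its own
# baseline outcome, and the intervention adds TRUE_EFFECT on top of it.
TRUE_EFFECT = 2.0
rng = random.Random(42)
baseline = {unit: rng.gauss(10.0, 3.0) for unit in range(10_000)}

treatment, control = randomize(baseline.keys(), seed=1)
treated_outcomes = [baseline[u] + TRUE_EFFECT for u in treatment]
control_outcomes = [baseline[u] for u in control]

estimate = difference_in_means(treated_outcomes, control_outcomes)
print(round(estimate, 2))  # close to TRUE_EFFECT, up to sampling noise
```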
Hugo: You make it sound easy.
Skipper: Oh yeah. It's pretty easy. No, no. It's definitely ... there's a lot of things that you have to overcome to be able to do a randomized control trial. They're the gold standard in medicine and in science. But when you are dealing with more ambiguous things or even business processes, it's a lot harder to get one off the ground or convince people that you wanna do it.
Hugo: Absolutely. And I suppose a lot of our audience would have heard of randomized control trials, or RCTs, in the guise of AB tests, right?
Skipper: Yeah. An AB test is usually a good introduction to randomized control trials. Think about an outbound campaign for sales. You work in a business that does direct sales, which means reaching out to people who are potentially already your customers and seeing if they want to buy something else from you, or upgrade what they are already buying. So you have this kind of hypothesis that a certain kind of campaign ... a certain offer to these people that are already engaged with you is going to have a large effect. You're gonna get more sales. So what you do is take all those engaged people, break them up, give one group the A treatment, which is the offer that you wanna measure the effect of, and another the B treatment, which may be the offer that you're already giving them, and then look at the difference. Afterwards you wanna measure and see whether the effect of A was different than the effect of B.
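One common way to check whether the effect of A differed from the effect of B, when the outcome is a yes/no sale, is a two-proportion z-test. The sketch below is an assumption about how such a comparison might be run, not the method used at Civis; the counts are invented and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    rate_a = conversions_a / n_a
    rate_b = conversions_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, z, p_value


# Made-up counts: the new offer (A) versus the existing offer (B).
diff, z, p_value = two_proportion_z_test(260, 2_000, 200, 2_000)
print(f"lift={diff:.3f} z={z:.2f} p={p_value:.4f}")
```

A small p-value here only says the difference is unlikely to be noise; whether a lift of that size is worth acting on is still a business question.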
Hugo: Great. That's a great example, and thank you for using an example that isn't the color of a button on a webpage. And that's really important. What you've stated is a hypothesis that's based upon something rational as well, right? I discussed this on this podcast last year with Lukas Vermeer, who runs online experimentation at Booking.com, and the idea that, let's say you wanna change the color of a button because it increases contrast, or something like that. That can be a good example, but examples that are just out of the blue, so to speak, aren't necessarily that helpful.
Skipper: I don't spend a lot of time doing data science that drives that kind of level of product differentiation. That's definitely a part of something data scientists do. But we're much more centered on processes and kind of human behavioral decision making like within a business.
Hugo: Right. And what role does your economics training play in this type of work?
Skipper: That's a good question. I think it's intuition. Seeing a problem and seeing the solution to that problem. The bread and butter of economics is dealing with dirty data. They say it's often unethical in economics to go out and give some people a welfare form and not give that welfare form to other people to see whether or not it has an effect. Often we have to try to make sense of something after the fact. So we already have data that represents the real world. We observe that and then we have to go back and use methods or techniques to reconstruct something that looks a lot like a randomized control trial but isn't a randomized control trial.
Hugo: Oh I love it. And that actually leads very nicely into the next question I had which essentially was around, "What type of push backs you can get in organizations?" I know for example that people, if you are trying to do an AB test, managers might be like, if we think one's better than the other, let's just send the one that's better out to everyone. Why do we wanna send out stuff that doesn't do as good a job right?
Skipper: Oh yeah. Getting someone on board with an experimental design from the ground up can be a huge cultural shift for people. People have been doing their jobs, probably pretty well, for decades sometimes, and they think, no, I know how to do things, this is what works, this is what doesn't work, I don't know why we would wanna try to kind of waste money and waste time to discover something that we already know. We already know this works. So in the sales example I gave earlier, if you go talk to a sales team and explain to them what you're gonna do, and you say, "Hey, I want this group to offer this incentive and I want this group to offer this other incentive, the one that we've always been offering," the group goes, "Wait, the other one's better. I just wanna use that." Their incentives aren't aligned because they get paid on commission, so they say, "Why would we even do that? That's not gonna fly."
Skipper: And then sometimes you need to have a control group, so if you wanna test the efficacy of a digital ad, for example, you might wanna put out an ad that advertises your brand or whatever, but you need to send a similar group of people an ad that's not for your brand so you can have kind of a baseline to control against, and you gotta pay to do that. So someone who runs a marketing campaign is not gonna spend half their budget on sending a PSA to people or something like that just to measure a baseline. It's just too much money.
Hugo: So how do we overcome these types of pushbacks? I mean it seems cultural to a certain extent, but also practical in some ways.
Skipper: It's cultural, it's practical. I think one way to do it is to use the data that you already have and design a kind of counterfactual experiment using observed data and then demonstrate slow wins. Demonstrate a small win, working with the data that you have, using techniques from the social sciences quite often, because that's what they're designed to do. Show what's possible and then slowly build credibility.
Hugo: Great. And get people onboard.
Skipper: Yeah. You gotta play the game of politics or whatever. I don't even think it's politics, though. I think it's just convincing people that data-driven decision making is going to make them better at their jobs.
Hugo: Yeah. Absolutely. And as you say, people have been doing their jobs, this type of stuff, for a long time. And particularly, you know, we hear about data science in tech a lot of the time, but a lot of industries that pre-date tech are incorporating data science into them. Agriculture, for example, so trying to convince people that work in agriculture that data science can help them get better returns, right? That's non-trivial.
Skipper: Absolutely. A lot of early statistics was developed to try to understand what interventions to take to increase crop yields.
What can we learn from the social sciences?
Hugo: Absolutely. In that case what can we learn from the social sciences with respect to this burgeoning credibility crisis in data?
Skipper: Yes. I've kind of been talking around some of the methods that we've developed over the years to try to answer these questions. But maybe I could just give a few examples. So first start with what was the problem we were trying to solve, and then how might we attack that problem. One of my favorite examples is this experiment at a fish market in New York. I think it no longer exists, but it was called the Fulton Fish Market. And the reason this was interesting to an economist is that with a lot of suppliers and a lot of buyers all coming into the same place and buying fish, you would expect the price to converge very quickly. You expect this to be a very competitive market.
Skipper: And in fact it wasn't always, so economists saw this as like a little test bed of what they could study. But just to kind of describe one thing that they were interested in. Say they just wanted to measure the demand curve for fish: the price goes up, people buy less. That's the idea. But what we observe is a price. We just observe a price at which fish was sold. And, as everyone who has taken Econ 101 knows, there are two things that go into determining a price. What's the supply of that thing, how many fish were available that day? And what's the demand, how many buyers showed up that day?
Skipper: One of the techniques in econometrics is called instrumental variable regression. It sounds super fancy, but the idea is we just want to find something that would vary the price of fish, but only affects the supply of fish and does not affect the demand for fish. So people are still gonna show up and buy fish that day, but let's say the supply got curtailed or there was more supply than usual for some reason. The thing that they came up with for this paper and this example was weather. So if there were stormy seas that day, the supply of fish would be greatly limited. You would see no changes in demand, so you could trace out the demand curve a little bit better by using weather. That's called an instrument. That's one example.
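A toy simulation can show why the instrument helps. Everything below is an assumption made for illustration: the linear supply and demand curves, the slopes, and the storm probability are invented, not taken from the fish market paper. With a single instrument, the instrumental variable estimate of the demand slope is the ratio cov(storm, quantity) / cov(storm, price), while a naive regression of quantity on price is biased because demand shocks move both.

```python
import random


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def ols_slope(x, y):
    """Naive regression slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate of the effect of x on y."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical market parameters, chosen only for the simulation.
TRUE_DEMAND_SLOPE = -1.5   # quantity demanded falls as price rises
SUPPLY_SLOPE = 1.0         # quantity supplied rises with price
STORM_SUPPLY_HIT = 4.0     # storms cut supply but leave demand alone

rng = random.Random(0)
stormy, price, quantity = [], [], []
for _ in range(20_000):
    storm = 1.0 if rng.random() < 0.3 else 0.0
    demand_shock = rng.gauss(0.0, 1.0)
    supply_shock = rng.gauss(0.0, 1.0)
    # Demand: q = 20 + TRUE_DEMAND_SLOPE * p + demand_shock
    # Supply: q = 5 + SUPPLY_SLOPE * p - STORM_SUPPLY_HIT * storm + supply_shock
    # Setting the two equal gives the market-clearing price.
    p = (15.0 + STORM_SUPPLY_HIT * storm + demand_shock - supply_shock) / (
        SUPPLY_SLOPE - TRUE_DEMAND_SLOPE
    )
    q = 20.0 + TRUE_DEMAND_SLOPE * p + demand_shock
    stormy.append(storm)
    price.append(p)
    quantity.append(q)

naive = ols_slope(price, quantity)
iv = iv_slope(stormy, price, quantity)
print(f"naive OLS slope: {naive:.2f}, IV slope: {iv:.2f}, truth: {TRUE_DEMAND_SLOPE}")
```

In this setup the naive slope lands noticeably closer to zero than the true demand slope, while the weather-based estimate lands near it, because storms shift the supply curve along a fixed demand curve.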
Hugo: Cool. And we'll link, as I said, to your PyData LA talk, "What's the science in data science?", where you go through examples like this and techniques like instrumental variables in a bit more technical detail for those interested.
Skipper: Yeah. A little bit more technical detail, not too much math. Hope it's still accessible to people.
Hugo: Sure. We'll find out.
Hugo: Very cool. So as it really seems to boil down to design based thinking with respect to experiments and research, could you provide a couple of examples of such thinking?
Skipper: Yeah. Some examples that are a little bit more on the business side than that kind of economics history side. One of my favorite papers is called "Courtyard by Marriott: Designing a Hotel Facility with Consumer-Based Marketing Models". So it sounds kind of boring, and I can't believe my favorite paper is in quantitative marketing now, but it's so clever.
Hugo: And Courtyards are great as well.
Skipper: Yes they are.
Hugo: The Marriott Courtyards are really good.
Skipper: It is, to me, the shining example of good data-driven decision making built on the back of a very, very good but simple research design. It's also very old. I think this paper came out in '83, '84. Courtyard by Marriott started in the early '80s. So it also kind of shows how far back people started doing this kind of decision making, and doing it well. The kind of punch line is that Courtyard by Marriott was designed by, we didn't call them this at the time, a bunch of data scientists. It was designed by a bunch of statisticians. How did they do this? They used another kind of technique that came from quantitative psychology and marketing: basically, they ran a survey experiment and then they used some econometric methods to do some measurements.
Skipper: What was the survey experiment that they used? It's called a conjoint analysis. Some people may be familiar with this. It's now heavily used in product marketing or product management. And the idea was just that you have a bunch of different things about a product that you want to test, a bunch of different hypotheses that you wanna test, and only so many people you can ask those of, and you can't ask everybody everything. But you can ask some people some of the things. To be more concrete, there were seven facets that they wanted to test in this Courtyard by Marriott paper. Marriott saw a decline in sales. They knew they had two kinds of travelers, business travelers and people traveling with their families, but they didn't know what they wanted. Right?
Skipper: So they said, what are the things you care about within these seven facets? What should the rooms look like? What should the food be like? What should our lounge look like? What kind of services do you care about? What kind of security do you care about? So they asked a bunch of questions within each of these facets, something like 50 attributes for each facet, and each one of those had eight different answers. So, do you want HBO in your room? Is basic cable okay? Do you want an L-shaped pool? Do you want the pool to be in the middle of the hotel? All of these kinds of things. They ran that experiment, then made some decisions based on what they learned during that experiment, and it was wildly successful.
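As a rough illustration of the conjoint idea (not the method or data from the Marriott paper), the sketch below simulates respondents who each rate only a few randomly chosen hotel profiles, then recovers how much each feature level is worth. The attributes, the "true" part-worths, the rating scale, and the noise level are all hypothetical. The estimate is the average rating at each level, centered within its attribute; because profiles are assigned at random, this approximates a main-effects least-squares fit.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical "true" preferences; each attribute's part-worths sum to zero.
TRUE_PART_WORTHS = {
    "tv": {"basic cable": -0.5, "HBO": 0.5},
    "pool": {"none": -1.0, "L-shaped": 0.2, "courtyard": 0.8},
    "lounge": {"no": -0.3, "yes": 0.3},
}


def all_profiles(part_worths):
    """Every combination of attribute levels (the full factorial design)."""
    names = list(part_worths)
    level_lists = [list(part_worths[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*level_lists)]


def simulate_ratings(n_respondents, profiles_per_person, seed=0):
    """Each respondent rates a random subset of profiles, with noise."""
    rng = random.Random(seed)
    profiles = all_profiles(TRUE_PART_WORTHS)
    ratings = []
    for _ in range(n_respondents):
        for profile in rng.sample(profiles, profiles_per_person):
            utility = 5.0 + sum(
                TRUE_PART_WORTHS[attr][level] for attr, level in profile.items()
            )
            ratings.append((profile, utility + rng.gauss(0, 1)))
    return ratings


def estimate_part_worths(ratings):
    """Mean rating per level, centered on the attribute's average level mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for profile, rating in ratings:
        for attr, level in profile.items():
            sums[(attr, level)] += rating
            counts[(attr, level)] += 1
    level_means = {key: sums[key] / counts[key] for key in sums}
    estimates = {}
    for attr in {a for a, _ in level_means}:
        levels = [lvl for a, lvl in level_means if a == attr]
        center = sum(level_means[(attr, lvl)] for lvl in levels) / len(levels)
        estimates[attr] = {lvl: level_means[(attr, lvl)] - center for lvl in levels}
    return estimates


ratings = simulate_ratings(n_respondents=500, profiles_per_person=4)
estimated = estimate_part_worths(ratings)
for attr, levels in sorted(estimated.items()):
    print(attr, {lvl: round(value, 2) for lvl, value in levels.items()})
```

Nobody in this simulation sees all twelve profiles, yet the pooled answers are enough to rank the levels, which is the "ask some people some of the things" point.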
Skipper: When they went back to measure in kind of the middle of this undertaking, they had used three test cases early and they saw, as they drew out these test cases, a $200 million increase in sales. And they expected that to grow to a billion dollars in sales by the early '90s. So over a decade they expected a billion dollars in new sales, incremental sales, to come from making these decisions to come up with this new brand, Courtyard by Marriott, which was the new concept. It created a bunch of new jobs, 10,000 new jobs for Marriott. And then the whole industry changed. All the competitors had to change what they were doing, and the whole idea of what it meant to have affordable but nice lodging changed.
Hugo: That's incredible. And they also predicted the return or the increase in sales. Was that increase met?
Skipper: Yeah. They ran a bunch of simulations. They knew what people said they were gonna do, and they ran a bunch of simulations knowing that some people may not do that. And they predicted the share of the market that Marriott would capture by undertaking this. The sales numbers, I'm not sure they forecasted them. They did forecast the percentage market share they were gonna take, and they were right within 4%. So that's probably the thing that caught the decision makers' eye, right? "Oh man, so you're saying I can get this." Start small, a couple of test cases, they're starting to see the results. Then they went all in, and it turned out that by carefully designing how they were gonna measure what they thought, they were able to make this prediction and ultimately be right and win the market. That was pretty amazing.
Hugo: That is an amazing example of a success story in research design based thinking. Are there any unsuccessful examples you can think of?
Skipper: Yeah. I mean, it's not all unicorns and roses. There are a lot of places where it's gonna be very difficult to either predict success or even measure success. So I think it is important, and I mentioned this a little bit earlier, for data scientists to be upfront about what we can do and what we can't do so that we're not overpromising. There is another example that I like that comes from the Yahoo research group. They were trying to measure the returns on some advertising campaigns. They had pretty much unfettered access to Yahoo data, and they actually worked very closely with something like 25 firms to get information on conversions for advertising. One of the problems with trying to measure the returns to advertising is that it's sometimes hard to capture conversion. If it turns into an in-store sale, how do you make the connection that that actually happened through ads? They were able to work closely with department stores and financial services to make sure they were measuring the outcomes.
Skipper: The name of the paper is "The Unfavorable Economics of Measuring the Returns to Advertising." So it's a little bit of a reality check on these things. The conclusion is that it's not impossible, but it is very, very difficult. And what makes it difficult is a combination of two things. One is that the returns to advertising are very, very small. Campaigns themselves usually have a low spend per person, and the return on investment therefore is very, very small. So you may be looking at like a 10% bump if you're lucky. That's kind of a small effect that you may want to measure. Then compound that with the fact that things like sales in a retail store are super, super noisy. Most people do not buy things, but when they do go buy things they're really, really expensive. So the standard deviation of a sale, how big the variance is, is a lot higher than the average. When you combine those two things, I wanna measure something very precisely but the thing I'm trying to measure is very noisy. It's very hard to design an experiment to measure that.
Hugo: And you need to just get so much data, right?
Skipper: Yeah. Exactly, exactly. You have to keep getting data to measure more precisely, which makes things often very, very expensive to do. And if you don't think about these kind of things upfront you may run a very expensive experiment and then come to the conclusion that, "Oh I don't have enough information to measure what I thought I was going to measure".
Hugo: Sure. And I suppose in this particular example, doing some sort of power analysis at the outset will tell you approximately how much data you need and how much that would cost.
Skipper: Yeah. Exactly. This paper actually did kind of a post hoc power analysis. So they went back and looked at 25 different experiments and wanted to know, after the fact, would we have been able to measure these things accurately? What is power? First of all, this is a concept that always mystified me when I was first starting to study statistics. But the basic idea is just that if there is an effect, if there is a return to advertising and it is out there to be measured, the concept of power is, "Do I have enough data to be able to measure it?" And the conclusion that they came to is that quite often studies that are trying to measure the returns to advertising are what's called underpowered. So if they wanted to measure a very precise return, they just didn't have enough data to do it. And saying that they did would have been disingenuous.
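A back-of-the-envelope power calculation shows why a small lift on a noisy outcome needs so much data. The sketch below uses the standard normal approximation for comparing two group means. The specific inputs (an average sale of $7 per person, a standard deviation of $75, a 10% lift) are assumptions picked to match the "small effect, huge variance" description above, not figures quoted in this conversation.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def power_two_sample(n_per_arm, effect, sd, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    standard_error = sd * sqrt(2.0 / n_per_arm)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / standard_error
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) * sd / effect) ** 2)


# Hypothetical campaign: noisy sales and a small lift.
mean_sale = 7.0          # assumed average sale per person, in dollars
sd_sale = 75.0           # assumed standard deviation of sales per person
lift = 0.10 * mean_sale  # a 10% bump in average sales

n_needed = required_n_per_arm(lift, sd_sale)
power_small = power_two_sample(1000, lift, sd_sale)

print(f"people needed per arm for 80% power: {n_needed}")
print(f"power with 1,000 people per arm: {power_small:.3f}")
```

Under these assumptions, an experiment with a thousand people per arm has almost no chance of detecting the lift, and getting to 80% power takes well over a hundred thousand people in each group.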
Hugo: A PSA, it's always good to do some sort of power analysis during the experimental design process.
Skipper: Yeah. I think so.
Hugo: When you said, post hoc power analysis, I got shivers down my spine.
Skipper: Sure. Sure. People that are kind of more Bayesian in their statistical philosophical thinking may argue with whether power is a useful concept. But intuitively I think it makes sense.
Hugo: Absolutely. And if there are any Bayesians who feel that way out there, ping me on Twitter and I would love to have you on the show to go head to head. So Skipper, this has been a really, I think, thoughtful introduction to the challenges we're facing as a discipline today, particularly in terms of the relation of data, analytics and data science to the decision function. What can we do? How is the credibility crisis preventable?
Skipper: One of the things we can do, and the talk that I gave and kind of the premise of this whole conversation, is that by better research design, by using different methodologies, you can gain credibility, you can get some wins, and you can have an impact. But I don't wanna overemphasize just methods. I don't want people to take away that "Oh, if I go read an econometrics textbook then I will be super credible, I will always deliver results." You have to make sure you're solving the right problem. So make sure you understand the business needs, what it is that people are trying to measure, and how to actually have an impact, and then go do things correctly. Make sure you're solving the right problem, and then slowly build up credibility within your organization and then, more broadly, within the profession, I think.
Hugo: And what about, I always hesitate to use the term soft skills, but I've been too lazy to think of a better term. Just about, in terms of dealing with the bureaucracy and structure and hierarchy in organizations, how does that relate to this conversation?
Skipper: My prior on the typical data scientist, which may or may not be accurate, is that they come from an academic background. They're very used to long timelines with not a ton of stakeholders in what they are doing. They may be going off and working on a single problem with their advisor or maybe lab mates, very deeply, for a long time. And the business world is slightly different, in that you can have multiple stakeholders and people are gonna kind of lose faith very quickly. It's not just methods, it's not just good research design, it's good project management and good communication. Letting people know where you are and being honest with them: Am I stuck? This is an assumption I have made, and it may not be true. Here's when you can expect results. Go out and read a good book on project management, because even if it's not in your description as a data scientist, you're gonna build up credibility and you're gonna have more soft skills if you approach things like that.
Hugo: Great. Do you have any recommendations for books on project management?
Skipper: Yeah. There's one that we read around here. We have several book clubs, and one of the things that we read in book club recently was "Project Management for the Unofficial Project Manager." And the premise is kind of exactly what I just stated: whether you like it or not, you're a project manager. You may not know it, and you may not even know what that means, because no one has ever talked to you expressly about that. Here is how we think you can be effective at this. Know where you are going before you start, that kind of thing, which is also kind of the opposite of some academic pursuits. Sometimes you can find very interesting things by going down rabbit holes. And that's something not necessarily always to be avoided, but often to be avoided in a business context.
Hugo: Okay. Great. And we will link to that book in the show notes as well. To push even a bit further, what could happen if the credibility crisis isn't prevented?
Skipper: This is what kind of keeps me up at night a little bit. And I think the cynical answer is that the job market for data scientists kind of drops out, so it ceases to be a seller's market, in that data scientists will be hired less and paid less. You hear "this is the sexiest job of the 21st century," those kinds of things, again and again. And we are pretty far along in that kind of Gartner hype cycle, into the trough of disillusionment that I hinted at earlier. And if that continues, then I think you'll cease to see data scientist be like a really emphasized and hot job title. Which would not be great for us. I think it's a lot of fun to be a data scientist.
Hugo: Actually where we are in the Gartner hype cycle I suppose is a function of industry as well. I think in a lot of industries we are still at the peak of inflated expectations.
Skipper: Yeah. That's probably true. I mean it's a little bit of a balance and it depends who I've talked to that morning but yeah that's definitely true.
Hugo: So that was your cynical response. What else could happen if the crisis isn't prevented?
Skipper: I don't know, I'm not sure I have a really good answer to that. Nothing. We go back to the status quo, people go back to making gut-based decisions, and the kind of promise of data science is never actually realized. So it's somewhere in between the trough of disillusionment and kind of overhyped expectations. There is value in making evidence-based, data-driven decisions. And it's kind of up to data scientists to prove that.
What is your favorite data science technique?
Hugo: Yeah. Agreed. So to wrap up, we've talked about a lot of your different interests, but something I like to ask is, just what one of your favorite data sciencey techniques or methodologies is?
Skipper: Sure. I think logistic regression is a great one. I joke a little bit. Quite often we ask people, "Have you tried logistic regression first?" But a little less glib, we've been working a lot with factorization machines here. It's a cool technique that kind of allows you to explore the interactions between data in a way that's tractable. So you can kind of compare nonlinear effects really, really easily. And we found we got a lot of use out of factorization machines.
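For readers who have not met factorization machines, the sketch below shows the second-order prediction formula and why the interactions stay tractable. It is a generic illustration with random, hypothetical parameters and is not a description of how Civis uses the technique. Each feature gets a small latent vector, the weight on a pair of features is the dot product of their vectors, and a well-known algebraic identity lets the sum over all pairs be computed in time linear in the number of features.

```python
import random


def fm_predict_naive(x, bias, weights, factors):
    """Second-order factorization machine, summing over every feature pair."""
    n = len(x)
    linear = sum(weights[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction weight for (i, j) is the dot product of their factors.
            dot = sum(a * b for a, b in zip(factors[i], factors[j]))
            pairwise += dot * x[i] * x[j]
    return bias + linear + pairwise


def fm_predict_fast(x, bias, weights, factors):
    """Same prediction using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n = len(x)
    k = len(factors[0])
    linear = sum(weights[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(factors[i][f] * x[i] for i in range(n))
        total_of_squares = sum((factors[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_of_squares)
    return bias + linear + pairwise


# Hypothetical parameters: 6 features, 3 latent dimensions per feature.
rng = random.Random(0)
n_features, n_factors = 6, 3
bias = 0.1
weights = [rng.gauss(0, 1) for _ in range(n_features)]
factors = [[rng.gauss(0, 1) for _ in range(n_factors)] for _ in range(n_features)]
x = [1.0, 0.0, 2.0, 0.0, 0.5, 1.0]

naive = fm_predict_naive(x, bias, weights, factors)
fast = fm_predict_fast(x, bias, weights, factors)
print(f"naive: {naive:.6f}, fast: {fast:.6f}")
```

The two functions should agree up to floating-point rounding. In practice the parameters would be learned from data rather than drawn at random.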
Call to Action
Hugo: Awesome. To wrap up, I'm wondering if you have a final call to action for our listeners out there.
Skipper: I'll leave people with kind of one hard call to action. It's something that I talk about in my PyData LA talk, and that's the benefits of having a journal club at work. Having a bunch of like-minded colleagues and going out in the literature and finding a paper that's cool. Reading it, breaking it down, and discussing it with your colleagues on like a weekly basis. I can't overestimate how much value and how many ideas we've generated just by doing something like that.
Hugo: Fantastic. Skipper, it has been such a pleasure having you on the show.
Skipper: Yeah. It's great talking to you. Thanks for having me.