As data science teams look to scale their impact across the organization, data scientists often spend a long time optimizing and refining models. All too often, they end up neglecting what may be the most important selection criterion for scaling data science: acceptance by the end-user.
When presenting results, data teams often fall into the common pitfalls of communicating the impact of their work and are often met with skepticism towards the models they build and how they fit within the organization. This translates into decreased trust in data science, hyperfocus on edge-cases when scoping machine learning projects, and resistance to change or adopting data solutions. Getting everyone on board and earning a wide range of approval is often a critical step in successfully scoping, implementing, and evaluating models and their results.
In this webinar, we outline how storytelling enables data teams to bring the same facts to the table, but provide a clear framework that helps formulate the most impactful aspects of their work. We cover some tips and tricks using real case examples of how to convince the most skeptical end-users.
Storytelling is a core skill in the data teams' skillset
Storytelling in data science and machine learning can be just as factual as other presentation forms for communicating impact
Crafting audience-specific data stories increases the impact of data science within the organization
Bhavya Dwivedi: Hello, everyone. My name is Bhavya. I am a data scientist at Deloitte. I have a background in computer science. I earned my master's degree from the Georgia Institute of Technology. At Deloitte, I have worked within several domains, such as healthcare, finance, energy. I have used technology such as machine learning, NLP, and deep learning.
Gert De Geyter: Hi everyone, I'm Gert De Geyter, a machine learning lead at Deloitte Consulting. My background is in astronomy: I have a Ph.D. from Ghent University, where I focused on applying AI to solve problems in astrophysics. Next to my job at Deloitte, I'm also a professor at the School of Economics in the School of Management where I teach Python and machine learning. But today I'll talk about something else.
Gert De Geyter: Today I will talk about storytelling, so what better way to start than with an actual story? In this case, it's actually a myth. Some of you might, at some point in your history classes at school, have heard about the myth of Cassandra of Troy. Cassandra of Troy was supposedly a beautiful woman, and she caught the eye of the sun god, Apollo. In an attempt to persuade her, he gave her a gift: the divine power of perfect foresight. Whatever would happen in the world, Cassandra would be able to foresee and predict it. Now, she still wasn't really interested in Apollo, so she turned him down. He obviously didn't like that, but he couldn't really take away the gift anymore, so what he did instead was give her a curse.
Even though she would be able to predict the future, no one would ever believe her. And that kind of made the Greek tragedy complete because she would foresee Troy fall; she would foresee her loved ones die, but there’s nothing she could do to stop that. So there are a few takeaways from that Greek myth. One is don't mess with the Greek gods, and the second is that predictions require actions. Even though Cassandra was really able to perfectly predict what was going to happen, she was never able to make an actual impact, because she really couldn't persuade anyone. That's the lesson that we can take when we think about today's data science, or think about today's AI applications.
To be impactful, data science needs:
Cultural company change
AI knowledge for all
Data scientists with stories
Apart from the data science itself, which obviously should be done correctly to be usable, there are three things that I think we can add to make it even more impactful. One is a cultural company change. The second is AI knowledge for all, even for those people who are not directly involved with the research or development of AI. And the third is that data scientists should have data stories. It's obviously that last one that we'll dive a little bit deeper into, but I quickly want to highlight what the other two are.
In a study that was published in Harvard Business Review, they found out that in nearly 90% of the companies with successful AI implementations, more than half of their budget was spent on adoption. I quickly want to pause there, because I think that is a staggering result. More than half of the budget doesn't go to pure R&D. It literally goes to making sure that whatever comes out of R&D is implemented and used throughout the entire firm. That is apparently the way to go, given that that's what 90% of companies with successful AI do.
The second is AI for everyone. AI, at this point, is somewhat at the peak of what is called the Gartner Hype Cycle. This Gartner Hype Cycle basically explains the phases that new emerging technologies go through. Usually, there's somewhat of an innovation trigger that sparks a new tool or a new solution. That quickly leads to a peak of inflated expectations. People just expect too much of this new technology. That is very quickly followed by an overcorrection — people suddenly don't trust the new technology at all anymore, so they go into a trough of disillusionment. They really don't see the added value, and they overcorrect.
Then, over time, you stabilize into a plateau of productivity, where the technology has finally matured enough to be put in the right place. This is where it's useful, and this is where it's fun. Having fundamental AI knowledge for everyone can really help avoid the situation where AI becomes a buzzword or a sort of holy grail that is expected to fix all problems. You can compare it to the early days of computers. Nobody would have guessed back then that something that was purely for ideas would now be literally ingrained in pretty much every job. AI will be very similar to that.
The last part is the data science skillset. We've all seen job descriptions that mention something like: we want our data scientists to be able to wrangle and manipulate data; we want them to know statistics and modeling; they should obviously be able to program, and if they can do that with big data technologies, all the better. Usually, you will also have something in there that has to do with storytelling, communications, visualizations, or soft skills in general. And this is where we have, I think, the biggest duality of the data scientist's role: even though storytelling and communication are considered to be one of the top three or four major blocks of what data scientists should have, that is not at all reflected in what we see in education. If you look at general education, or at online courses, they are focused way more heavily on the statistics and the programming. There's hardly any mention of storytelling or soft skills at all.
It's not just education. So when we looked at the conferences that were listed on KDNuggets between 2018 and 2020, we tried to label them to the best of our ability with the information that we found in these categories. I want to point out that this is not the result of some clear conclusive research but it should be a good enough indication of what is going on. We found that only in 1% of the cases, storytelling was actually talked about at conferences. So, storytelling at this point is an underrepresented topic for data science education as well as conferences and that's where again the duality lies: we consider it as one of the major skills, but we hardly talk about it.
This leads us to believe that storytelling or soft skills are just something that people have and not something that can be trained. And that is false. The same can be said for mathematics or programming, where we see that some people have an affinity to be better at mathematics or programming. Some people will have an affinity to be better at storytelling or have better soft skills. But that doesn't mean that there are no fundamentals that can be learned for everyone. To that extent, what we're trying to cover here in this webinar is not going to be a substitute for an entire education in storytelling. I want to make it clear that this is a webinar to highlight the problem. It gives you some ideas or pointers on what to focus on, what to look at, and some ideas of what to keep in mind when you go and present, either e-mails or live presentations or others. And we're going to use this simple framework: Why do we have storytelling? Who do we tell it to? What is the message that we try to bring? How do we do it? What is the method? This is a robust framework that we'll try to apply to the three use cases at the end before we kind of wrap it up.
Bhavya Dwivedi: The obvious question: why must data scientists employ storytelling? We usually look at numbers, and we can easily present those numbers to the audience, right? Not really; that falls completely flat. So, what is it about stories that makes them so effective? Simply put, stories engage us on more than one cognitive level. They are able to create a vivid and rich sort of participatory experience. By doing that, they pull us in. When stories pull us in, they help us remember. Stories can serve as a framework that ties facts, numbers, and other relevant information together and helps us remember that information. In fact, research suggests that facts are 20 times as likely to be remembered when they are a part of a story.
Meet Yanjaa, who is pictured here. She's a world memory master, and she holds multiple world records to prove that she has one of the best memories in the world. She, like most other memory champions, uses a technique called the method of loci, or mind palace, to help her remember things. Some of us here might remember this technique from the famous show Sherlock. What this technique focuses on, among other things, is creating stories to remember facts. Facts like: where was an object placed, or what number appeared in a previously unseen series of random numbers? People have also, at some point, used similar techniques to remember mathematical formulas, or certain facts, such as this. It's a mnemonic that all of us at some point have used as a small story to remember a fact about our universe.
Something noteworthy that is going on here is that as much as humans like to think that we resemble computers, we really do not. Computers are good with numbers; humans are good with stories. That's true even though stories require a larger footprint in our brains; they use more storage space. So this is something we should keep in mind when we are communicating computer-generated results to other humans. As data scientists, we should create stories so that people can remember numbers and facts. Those stories should be our representation of those numbers. That way we can ensure that the numbers are not misrepresented, and we can focus the attention of our audience on the important pieces of the puzzle.
Now, imagine I tell you about a day when I was looking around for my lost puppy. It was the peak of summer; the day was hot and sultry. By the time I accomplished my mission, I was all sweaty and tired. When I finally made it home, I jumped right into the shower and turned on the cold water, and all that heat and tension melted away from my mind. So, if you were listening to that short story, you could feel and share my sensation of freshness when the water started in the shower, didn't you? Let that sink in. Because that is another thing stories excel at — creating connections not just between people and ideas, but also among people themselves. And we see this happening all the time. Persuasive commercials often engage us in more than one sensation like visual, auditory, etc, using stories. Imagine a commercial that just lists the pros and cons of a product. How bland would that commercial be, right? It will never be as impactful as a commercial that talks to its audience via a story.
In 2010, scientists at Princeton used fMRI scans to show that when stories are narrated, they activate the same regions in the brain of the speaker as well as the listener. This phenomenon is called neural coupling, and it creates a connection between the speaker and the listener, enabling them to feel similar sorts of emotions. It is that engagement of a larger part of the brain that allows stories to stick, to be understood, and to be felt. So stories move people, stories drive people to action, and that is exactly what we aim to accomplish as data scientists. Tying it all together, as data scientists, our goal is to influence our audience, and information alone rarely does that. Scientific research has also shown that well-designed stories are in fact the most effective vehicle for persuasion and, consequently, for driving people to action. Next, Gert is going to talk about who we should keep in mind while developing our stories.
Gert De Geyter: After understanding why you need to tell stories, it's important to keep in mind who we're addressing it to because that's going to influence a lot of aspects of the story. First of all, who exactly is our audience? How much do they already know? What is their background? What is their interest level in what is going to be presented? What is their technical background? What is their data fluency? Are they technical people? Are they non-technical people? Is it a mixed audience? If it's a mixed audience, who am I trying to address? What is my relationship with them? I could be a teacher or a mentor talking to students, but I could also be talking to my boss, a board of executives, or clients. All of these factors are going to play into how you phrase and how you message your story.
Most importantly, you need to figure out what the desired outcome is. What do I want my audience to take away from this? Even if it's one number, even if it's just my model accuracy, whatever it is, what do I want them to take away? Because the most important thing, and this is a hard-learned lesson for myself, is that you should tell your audience what they need to hear, not what you want to tell. The problem that we often have as data scientists is that we get very excited about something very cool about our model, a cool optimization that we've done, and that is fine. It's certainly understandable that we're passionate about it; that's why we're doing this job. But it might not always be the right way forward. Not everyone in your audience will be interested at the same level that you are, and they might not even be heading toward the point that you're trying to get to. So keep in mind that your audience needs to hear certain things, and those don't necessarily align with what you want to tell.
Also, be aware that your audience is going to have or could have a certain level of skepticism. Often data scientists kind of shrug that off as, "oh, that's just because they don't know AI or they don't see their value," or stuff like that which leads to the point that I said earlier that there should be a fundamental AI knowledge for all. But that is definitely not the only source of skepticism. There could be real reasons or good reasons for skepticism towards the approach that you're presenting. For example, they could be data experts, and they might know that there's potentially a huge bias in your data sets. So it's pretty skewed between what is actually going on and what the data is telling you, or there might be missing data that you're unaware of, or they might have had experiences in the past. So there are good reasons to have skepticism, and you shouldn't always shrug it off or see it as a part that you should deny.
One of the key things that you should be most aware of as a data scientist is anecdotal evidence. Anecdotal evidence is basically an outlier with a very good story. Most people tend to remember outliers specifically because they stand out from the average. We are predisposed to give more attention and more weight to these outliers than we should. To give an example, I think everyone on this call probably knows that smoking is bad for your health. But we probably also know someone who has said something along the lines of, "well, if smoking is bad, then why do I have an uncle who smoked his entire life and lived to be 96?" We know that does not disprove the statement that smoking is bad for your health. It's just a weird outlier. But for some people, it's very binary, because not everyone is used to thinking in terms of probabilities. So, you need to incorporate that into your story. You need to incorporate these weird outliers into your story and tell your audience that this is not something that disproves a statement. It's just very rare, and for every one of those cases, there are a lot of other ones that are very different.
Also, your story order might need to be adjusted to the audience that you have. In general, a story consists of a plot, goal, message, or conclusion. There should be some context, arguments, or supporting evidence, and these are connected by the general theme and flow. In general, data scientists or technical people prefer the order of arguments flowing into a clear conclusion. This is a sort of chronological order. They want to see the entire process that you took to come to that conclusion. Decision-makers, stakeholders, and business people usually prefer a different flow, where they want you to start with a clear conclusion, like "this is what I found," and then flow into the resulting context or the actions around that.
So keep in mind: who is it that I'm talking to, and how should I phrase my message or my story for them? Your audience is going to drive a lot of how you tell stories. You will see some examples later, when we get to the "how," where we'll tie back into the audience.
Bhavya Dwivedi: All right. So, now that we know who we are targeting, I'll discuss some things to consider while crafting our message. Consider a scenario where an institution uses an AI system to approve or reject loan applications based on certain inputs, and when I make an application to that institution, it's rejected. The first questions that I'm going to raise are: why did my application get rejected, and how can I fix it? The expectation is to get a response like "your credit score was low, so please apply again when your credit score is better." This is important because such transparency fosters trust and acceptance in AI systems. In fact, regulations such as the European Union's Right to Explanation allow people to demand transparency of algorithmic decisions, underlining the need for explainability in AI. Data science is no different. Every data science endeavor has an explainability aspect to it. Users want to understand why you are proposing whatever it is that you are. In an ideal situation, interpretable, transparent, and explainable AI models are what all data scientists would use, because explainability makes models more human, more expressive, and therefore infuses some sort of credibility in them. Developing such models is challenging. In certain cases, it's even impossible.
Let's briefly discuss black-box and white-box models. Both types of models generate predictions. Black-box models have complex inner workings, which makes them hard to understand and explain, while white-box models are simpler and score higher on interpretability. Another important distinction between the two is that black-box models are focused mainly on performance, while white-box models might not match up to black-box models in terms of performance.
The choice of using a black-box or white-box model is crucial because it ties back into our story of how we're achieving a certain goal. To decide which type of model to use, there are certain things that we as data scientists could think through. For example, is there a certain accuracy threshold that is desired from the models? Can a white-box model achieve it, or is it only a black-box model that can get us there? Usually, there's also a tradeoff between accuracy and explainability, so balancing those two characteristics becomes another important consideration. Business users usually lean towards more transparency, whereas technical users might in some situations be convinced by a less transparent model.
Finally, the complexity of black-box models should not be used as an excuse to forgo them. Simplicity and explainability are important, but they might not always be preferable. Think about non-linear, high-dimensional data in use cases like skin cancer detection. White-box models in such cases simply cannot capture the essence of such data. As a simple example, consider that I have a white-box model that is producing predictions with great accuracy. The question that arises soon after is: so what? Black-box and white-box models help us in generating predictions, but what should we do after we have reached a desired level of accuracy? This is exactly where predictions fall short.
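As a minimal sketch of the white-box idea, and in the spirit of the loan example above, the following Python snippet screens an application with two transparent rules and returns the reasons for any rejection. The thresholds, field names, and the `decide` function are hypothetical and chosen only for illustration; they do not come from the webinar or from any real lending system.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MIN_CREDIT_SCORE = 650
MAX_DEBT_TO_INCOME = 0.40


@dataclass
class Application:
    credit_score: int
    debt_to_income: float  # monthly debt payments divided by monthly income


def decide(app: Application) -> tuple[bool, list[str]]:
    """Return (approved, reasons). Every rejection carries its reasons."""
    reasons = []
    if app.credit_score < MIN_CREDIT_SCORE:
        reasons.append(
            f"credit score {app.credit_score} is below the minimum of {MIN_CREDIT_SCORE}"
        )
    if app.debt_to_income > MAX_DEBT_TO_INCOME:
        reasons.append(
            f"debt-to-income ratio {app.debt_to_income:.2f} is above "
            f"the maximum of {MAX_DEBT_TO_INCOME:.2f}"
        )
    # Approved only when no rule was violated.
    return (not reasons, reasons)


approved, reasons = decide(Application(credit_score=610, debt_to_income=0.25))
print("approved" if approved else "rejected: " + "; ".join(reasons))
# prints: rejected: credit score 610 is below the minimum of 650
```

Because every rule is visible, the answer to "why was I rejected, and how can I fix it?" falls directly out of the model. A black-box model would need a separate explanation step to produce a comparable answer.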
There are a number of gaps between making a prediction and making a decision. As data scientists, we must be aware of this prediction-decision gap, and we should use appropriate methods to address it. These include adding guidelines, which could be business rules or decision tables, or augmenting our predictive models with decision models as a way to nudge users in the direction of decision making. All this being said, predictive models are useful when they are employed in the right way, but they can also be misleading. Some users can get the false impression that a feature, or a set of features, can explain the behavior of the model.
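One way to picture a decision table sitting on top of a predictive model is the short Python sketch below. The churn use case, the thresholds, and the action texts are all hypothetical assumptions made for this illustration; in practice the rows would come from the business owners of the decision.

```python
# Hypothetical decision table: each row is (minimum predicted probability, action).
# Rows are ordered from the highest threshold to the lowest.
DECISION_TABLE = [
    (0.80, "call the customer this week"),
    (0.50, "send a retention offer"),
    (0.00, "no action needed"),
]


def recommend(probability: float) -> str:
    """Map a model's predicted probability to a recommended action."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for threshold, action in DECISION_TABLE:
        if probability >= threshold:
            return action
    return DECISION_TABLE[-1][1]  # not reached while a 0.00 row exists; kept as a safeguard


for p in (0.92, 0.61, 0.12):
    print(f"predicted churn risk {p:.0%} -> {recommend(p)}")
```

The model still only produces a number. The table is what turns that number into something a user can act on, and it is simple enough to review with non-technical stakeholders.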
What I'm talking about is the classical statistical fallacy: correlation does not imply causation. In reality, predictive models only establish correlations between variables and targets. A lot of times, people are looking for causation, and it is important to realize that prediction is not a valid substitute for an explanation. Being able to predict an outcome is not the same as understanding what causes it. For example, predicting that a farmer's crop yield will be lower in a year is one thing, and understanding what steps will increase the harvest is another thing. Another example: it's not hard to predict that the sun will rise every day, but it's harder to explain why that is the case. So, data scientists should be able to employ models that incorporate this concept of causality. I said that, but it's not an easy thing to do. For many years, statistics has lacked the mathematical framework to talk about causation. Luckily for us, over the last few decades, there have been positive developments in causal inference. Now we have methods such as causal diagrams that can be useful in making better sense of models and in explaining them better, thereby helping us build more powerful AI models.
So, summing it all up: explainability and transparency are important considerations while crafting our stories. Yet, by themselves, they are not very useful. We need to augment predictions with actions to make them impactful. But what is even better than that is moving towards causal AI (models that integrate causality). With that, I'll hand it over to Gert to talk about the last point of our storytelling framework.
Gert De Geyter: Now we have the reason why we want to connect to our audience, we know who they are, we understand our relationship to them, and we know how that drives the transparency of our models or how actionable they should be. The last question is: how are we going to bring this? In general, there are two main ways:
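The gap between correlation and causation can be illustrated with a small simulation. The Python sketch below is an assumption-laden toy example, not something shown in the webinar: a hidden factor `z` drives both `x` and `y`, while `x` has no effect on `y` at all. The variable names and noise levels are made up for the illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Observed world: a hidden factor z drives both x and y; x has no effect on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_observed = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

# Intervention: x is set independently of z, and y is generated as before.
x_intervened = [rng.gauss(0, 1) for _ in range(n)]

r_observed = pearson(x_observed, y)
r_intervened = pearson(x_intervened, y)
print(f"observed correlation:   {r_observed:.2f}")
print(f"after intervening on x: {r_intervened:.2f}")
```

In the observed data, `x` predicts `y` well (the correlation should come out around 0.8), yet setting `x` by hand leaves `y` untouched and the correlation drops to roughly zero. A model trained on the observed data would be a fine predictor and a poor guide to action.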
Written document
Live presentation
I want to make it clear that storytelling should not be limited to big sessions or big live presentations. You need to do storytelling on a day-to-day basis, even in written documents. Written documents usually need more detail because you have less control over how your audience is going to perceive them. Live presentations can also be quick one-on-one calls, where you usually have a little bit more control over how you're going to bring the message. Usually minimal is better. Very minimal. When you do a live presentation, stick to one core message per slide so your audience's attention isn't drawn to too much clutter on the screen. Keeping them focused on the core message is vital.
Or even more minimal. Don't use any slides at all. By switching modes in your presentation, you can actually grab the attention of your audience back to whoever is presenting or speaking. Sometimes we get into this loop where we are stuck at the screen for too long and we go into a zombie mode of just staring at the slides. By making a blank slide to kind of break that loop, you force the attention back to you. Going back and forth between switching modes in your presentation keeps your audience engaged and keeps them very clear on what you're trying to tell them.
When it comes to attention, you should also be aware that not everyone is going to read over your slide bullet point by bullet point in the order that you would think. For written documents, most people tend to follow an F-pattern, where you start in the upper left corner and then scan over in horizontal lines to get a rough idea of what the document is about. This Gutenberg Diagram can also be applied to slides. In this case, though, most people tend to follow a sort of Z-pattern, starting in the upper left corner, going to the upper right corner, then diagonally scanning down, and ending in the bottom right. Now that we know that, we can actually use it to our advantage. If we want our audience to follow a certain flow in our story, we can place the elements exactly in that order, because we know that's how they read through the slide.
When you talk about visuals, you have to be aware, and you have to be very critical of what you put in. For example, suppose I put a visualization, a lot of people are just going to blankly stare at the chart. Some other people are just not going to see any added value of the visualization — they might have already gotten the point before, and they would have gotten it if you've just explained it so they don't really see a need for the visualization to be convinced. Other people are going to be annoyed by the type of visualization that you use. Others might just think that the numbers don't make sense — they don't know enough; they don't believe them because they are not detailed enough; they want more details; they want more technicalities; the aggregation that you've done to create this plot isn't up to their expectation. The last reaction might just be getting triggered because, for example, in this plot, we deliberately placed a pie chart at the center to show that this will grab people's attention. They will get focused on minor details and wander off the actual point that you're trying to make.
So, visualizations can be strong, but they can also be very misleading and have people focus on things that you don't want them to focus on. In general, we want to avoid density- or surface-based plots. Pie charts, for example, are actually not ideal choices because they are radial charts. To see that one slice is twice as big as another, your audience has to compare angles and areas (a slice's area is θR²/2) instead of simple lengths. So you need to keep in mind that our brain doesn't work really well with surfaces. Another example is word clouds. Word clouds might look cool, and they can give you a sense of what words are in your text document, but they're very bad when it comes to comparing one word to another. In this example that you see right here, how much more important is the word landing versus space? Is it the same? Is it not? Is it just larger because "landing" is a bigger word compared to "page"? How does that relate? Those types of things are a lot harder to get from a word cloud. Also, in general, avoid 3D graphs. They can be great to quickly draw the eye because they look cool and flashy, but they're usually very hard to interpret, and usually not that good at actually getting your story across. Also, don't mix a combination of charts. Just keep it very clear and clean with one message.
Next, what is the message of the visualization? Don't just put slides or visualizations in there just because you've made it. There should be a clear goal on why that ends up in a presentation or story. Focus their attention on what you want to highlight. You can play with colors, font sizes. There are books on this, so I'm not going to cover too much detail on this. Also, don't over-engineer. Keeping it simple, keeping it fairly clear, is usually the best way to go. Then the last thing, and this is a hard-learned lesson for me as well, is to allow for whitespace. One of the things that I also tended to do was fill up the entire slide, so my visualization took as much area as possible. But that's wrong. If your visualization is good and clear, you don't necessarily have to enlarge it to the full screen. Allowing for some reading room and some white space is going to help and make your slides lighter and easier to digest.
We go to a bad example to show you how we can actually use visualizations to create a better story. I went back to the visual that we saw earlier in this presentation on the importance of storytelling competencies, where we saw that storytelling is underrepresented as a whole in conferences. If I took one of the standard Excel visualizations for this, you would get something like this. This is probably something that we've all seen before at some point. Now, why is this bad? First of all, this block right here will be the first thing that draws your attention, but it's actually not really relevant for this story at all, but you're just immediately drawn to it because it's bigger because it's by far the most impressive. So we want to know intuitively what that is. Also, we're repeating that this is a percentage of total topics in this legend, which doesn't really add much. There’s a background that makes it very heavy. Also, even the y-axis doesn't really add too much information because we constantly have to go back and forth in checking how much is the actual percentage.
It could be better to just label the actual bars with numbers. Also, a better way is to, first of all, turn the chart around because we are better as humans at understanding horizontal bar charts. The way our brain works is actually better in measuring something horizontally compared to each other than when it is vertical. Also, by graying this out and highlighting this, we focus the attention on where we want. Remember, we go over slides in a Z pattern. So, by immediately having a conclusion right here on top, it is immediately the message I want my audience to get.
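The redesign choices described above (horizontal bars, values labeled directly on the bars, everything grayed out except the point of interest, and the conclusion as the title) can be sketched in a few lines of Python. This is only a text-based illustration: the `render_bars` function is hypothetical, and apart from the 1% share for storytelling mentioned earlier in the talk, the topic names and percentages are made-up placeholders.

```python
# Placeholder shares (percent of conference topics). Only the 1% for
# storytelling comes from the talk; the other rows are made up for illustration.
topic_share = {
    "Topic A": 60,
    "Topic B": 25,
    "Topic C": 14,
    "Storytelling": 1,
}


def render_bars(shares, highlight, title, width=40):
    """Render a horizontal text bar chart, largest first, with direct labels."""
    lines = [title, ""]  # the conclusion goes on top, where the eye starts
    label_width = max(len(name) for name in shares)
    largest = max(shares.values())
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for name, value in ordered:
        glyph = "#" if name == highlight else "."  # '.' plays the role of gray
        bar = glyph * max(1, round(width * value / largest))
        # The value is printed next to the bar, so no separate axis is needed.
        lines.append(f"{name:<{label_width}} {bar} {value}%")
    return "\n".join(lines)


chart_title = "Storytelling makes up only 1% of conference topics"
print(render_bars(topic_share, highlight="Storytelling", title=chart_title))
```

The same choices carry over to any plotting library: sort the bars, label them directly, mute everything except the highlighted category, and state the conclusion in the title.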
Another example of visuals is the one we see here. On the left, you see a history of electronic tickets, with the volume received and the volume processed. This is the same data set. The main message here was actually that two employees quit in May, which led to fewer tickets being closed. That is a very hard message to get from here. In this other visualization, it is very clear (especially because we even highlighted it here) that before, we didn't have an issue, because the lines perfectly overlap, but from this point on we see a divergence between the received and processed tickets. Clearly pointing out the message and the goal of your visualizations can be very powerful in helping your story.
So, overall, when you talk about stories, think about the methods. This could be anything. You don't have to stick to presentations, live calls, or whatever. It could also be a simple email that needs a story, or a note, or whatever. It doesn't have to always be live to an audience. Think about how you're going to incorporate your visuals that really support the message that you're trying to convey. Now that we have everything, we're going to try and use this framework of “Why do we want to do this? Who do you want to connect to? What [message are we conveying]?” in three very minor use cases. Again, I want to remind you that this is not a course on storytelling. It's obviously not possible to cover that in a 45-minute webinar, but I do hope that this helps you frame some of the stories that you have.
Storytelling case study one is of a data scientist who wants to report to their supervisor (a senior data scientist) on the work that they have performed. Why is this necessary? For validation. We want validation from the senior data scientist that the work was done right so we can continue with it. Who is the audience? The audience is a senior data scientist, so we know that a chronological order, with arguments flowing into results or conclusions, is the best way to go. What are we going to present? A detailed message. We can be very detailed on how we're going to convince the senior data scientist, so we can be very quantitative. What is the method? An email, a presentation, or a notebook? That depends on the situation and ongoing relationship between these two data scientists and what they consider to be the best methods. Again, it doesn't have to be a live presentation. It could also be a document.
Case study two is of a senior data scientist giving an intermediate report to an internal or an external client audience that consists mostly of data subject matter experts (SMEs). Why is this necessary? It’s also for validation of the analysis that we have. Who is the audience? It's data SMEs. That means they could be chronologically driven if they're more technical. It could also mean that they’re more action-driven. That depends on the people who are in the audience. Given that it's data SMEs, they probably want to see a lot of the basic data validation, or they want confirmation of what is inside the data and what the model is, and they probably want to see EDA plots.
The last case study is that same senior data scientist presenting to a board of executives to update them and call for action. Given that we have an executive board here, we want to lead from conclusions into resulting actions. We want to keep it as high-level as possible and potentially go to a decision-making model or causal model. Usually, that ends up being a pure presentation.
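One way to keep the why/who/what/method questions in front of you is to write them down as a small template. The sketch below is only an illustration: the `StoryPlan` class, its field names, and the `brief` helper are hypothetical, and the entries simply restate the three case studies above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryPlan:
    """Hypothetical template for the why / who / what / method questions."""
    why: str        # purpose of the story
    who: str        # audience
    structure: str  # how the argument should flow for this audience
    what: str       # level of detail of the message
    method: str     # medium used to deliver it


CASE_STUDIES = {
    "data_scientist_to_senior": StoryPlan(
        why="validation that the work was done right",
        who="senior data scientist",
        structure="chronological, arguments flowing into results or conclusions",
        what="detailed, quantitative message",
        method="email, presentation, or notebook, depending on the relationship",
    ),
    "senior_to_data_smes": StoryPlan(
        why="validation of the analysis",
        who="data subject matter experts",
        structure="chronological or action-driven, depending on the audience",
        what="basic data validation, confirmation of data and model, EDA plots",
        method="not specified in the talk",
    ),
    "senior_to_executives": StoryPlan(
        why="update and call for action",
        who="board of executives",
        structure="conclusions leading into resulting actions",
        what="high-level message, possibly a decision-making or causal model",
        method="presentation",
    ),
}


def brief(plan: StoryPlan) -> str:
    """Summarize a plan in one line, for example as a checklist before presenting."""
    return (f"For {plan.who}: {plan.what} ({plan.structure}), "
            f"delivered via {plan.method}, because we need {plan.why}.")


for name, plan in CASE_STUDIES.items():
    print(f"{name}: {brief(plan)}")
```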
In the beginning, we mentioned that to be impactful, data science would need a cultural company change, AI knowledge for all, and data scientists with stories. Data scientists should therefore invest more time in messaging. Remember that 90% of the companies that were successful in implementing AI spend more than half the budget on adoption initiatives. That does not mean that every data scientist should now spend half of their time on those adoption activities, but it at least points out that we probably need to pay more attention to it.
We should educate everyone on it. We should get everyone involved and get to a baseline where AI is a language that is understood and used by everyone that needs it. We should have storytelling as a fundamental skill. It is not something that is given to everyone, but there are fundamental things that can be learned. Imagine if we had a framework, like what you have in scikit-learn, that serves as a sort of template for a story. For example, if this is the story that we want to bring to this audience, you use this framework. How awesome would that be? But for now, we don't have that, because storytelling doesn't get enough attention and hasn't been taken up in the same way as some of the other skills that we have, though it remains one of the most fundamental skills that you should learn as a data scientist. With that being said, I hope this is a call for action to get more stories out there and incorporate them into your work.
Question: Should we be using the persona approach to tell a story with data?
Gert De Geyter: Yes. If you can do that, that is absolutely a great way to think about who your audience is. And it ties back to this slide on the audience — what is your relationship to them. If you can build personas for them, that is a really good way to create your story and to see how you're telling it and what you're going to tell. Your personas probably also have a sort of data fluency connected to them. So, use personas if you can. It’s absolutely a great way to incorporate storytelling.
Question: Could you define black-box versus white-box model again?
Bhavya Dwivedi: Yes. Both black-box and white-box models are used for generating predictions, but they differ in that black-box models have complex inner workings. Think about deep neural networks. Even people who write them sometimes find them hard to understand and consequently hard to explain to stakeholders. What we're looking to get out of a black-box model is essentially better performance. White-box models, on the other hand, are models like decision trees, which can actually explain why a certain prediction is being made based on given data. These models might not perform as well as black-box models. But, in a lot of situations, stakeholders prefer them because the decisions that are made are more explainable. I hope that answers the question.
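To make the white-box idea concrete, here is a minimal sketch using scikit-learn. The iris dataset and the depth limit are assumptions chosen for illustration and are not from the talk. A shallow decision tree can print the rules behind its predictions, which is the kind of explanation stakeholders can follow.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative dataset only; any small tabular dataset would do.
iris = load_iris()

# A shallow tree keeps the rule set short enough to read aloud.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)

# Follow one sample through the tree to see which class it ends up in.
sample = iris.data[:1]
predicted = iris.target_names[tree.predict(sample)[0]]
print(f"Predicted class for the first sample: {predicted}")
```

A deep neural network trained on the same data would offer no comparable rule listing, which is the trade-off the answer describes.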
Question: How does data storytelling integrate with the data science life cycle?
Gert De Geyter: Yeah, I think that is an interesting question. We tend to think of storytelling as something that needs to be done at the end. That's not necessarily true, since we can have storytelling as part of the interactive, ongoing discussion. In a data science lifecycle, I’m going to assume that we have EDA, then some preliminary models, then some final models based on some feature engineering. During all of those steps, you want to have a continuous conversation with your stakeholders. Because of that skepticism that I was mentioning before, the sooner you're aware that it exists, the earlier you can actually start taking care of it. What are the issues? Why is the potential stakeholder skeptical? Could I convince them by showing them a certain part of the dataset that they might not be aware of? Is there something in the model that I could change to help them be convinced of how they should use it? The earlier you are aware of that, the earlier you can start having that conversation. And potentially you may also find out that they are right to have that sort of skepticism. For example, as I mentioned, there could be a bias in your data set that you're not aware of, and they are. So the earlier you know, the better. So, yes, definitely try to integrate that with your data science lifecycle.
Question: I'm great at storytelling and I love speaking, but my data science skills are still relatively novice. Intermediate, but still learning. Are there career opportunities for people who love data analytics but are not really that skilled with analytics and coding, but might want to be the people that share the insights and stories that the data team produces?
Gert De Geyter: Yes. That is a really good question, and I think there absolutely are. To be very frank, I hope there are even more coming very soon, especially given, again, the point I made that for companies to be successful they need to spend half of their budget on adoption activities. I see that as an ideal role for someone who could bridge the gap, someone who can say, “Look, I'm not a full-on data science expert, but I know enough of it to be able to message that to the people that need to be aware of this.” So bridging that gap is going to be a crucial role, I think, going forward. I've seen that role. Most people in that role come from a technical role and move into it. But I've definitely seen a lot of people who take the other route, where they don't have the technical ability but they're better at storytelling and speaking. They gain that ability by doing some additional courses on the specific technicalities that they feel they're missing. But, yes, I definitely think that there are career opportunities to do that.
Question: How do I know which chart to use for a particular message? Are there any tips or resources that we can check out?
Bhavya Dwivedi: Yes. I know a few examples, based on my experience, of certain kinds of charts that I use very frequently. For example, bar charts: we can easily use them to compare one variable across several groups. Histograms are special cases of bar graphs, and they are usually used for continuous variables. Another type of chart that I use very frequently is the scatter plot. Basically, scatter plots are used to understand the relationship between two variables. One more kind of chart that we very frequently use is the box and whisker plot, which is used to figure out if there are any outliers within your data. I want to say that when you work with data, you get a sense of what charts you want to use. As for resources, I think Adel already mentioned this, but I'm 100% sure that there are so many great resources on DataCamp that can help you develop this intuition of which charts work best in which situation. All this being said, I want to say that, like Gert said in the presentation, try to keep it simple. There could be so many charts. There could be many ways of representing your data. But try to keep it as simple as possible.
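The four chart types mentioned in this answer can be put side by side in a short matplotlib sketch. All data here is synthetic and generated only for illustration, and the group names and values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data, one small example per chart type.
groups = ["A", "B", "C"]
group_values = [12, 18, 9]
continuous = rng.normal(loc=50, scale=10, size=200)
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(scale=2, size=100)
with_outliers = np.append(continuous, [150, -60])  # two obvious outliers

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Bar chart: compare one variable across several groups.
axes[0, 0].bar(groups, group_values, color="lightgray")
axes[0, 0].set_title("Bar: compare groups", loc="left")

# Histogram: distribution of a continuous variable.
axes[0, 1].hist(continuous, bins=20, color="lightgray")
axes[0, 1].set_title("Histogram: continuous variable", loc="left")

# Scatter plot: relationship between two variables.
axes[1, 0].scatter(x, y, s=10, color="gray")
axes[1, 0].set_title("Scatter: relationship", loc="left")

# Box and whisker plot: outliers show up as points beyond the whiskers.
box = axes[1, 1].boxplot(with_outliers)
axes[1, 1].set_title("Box plot: outliers", loc="left")

fig.tight_layout()
fig.savefig("chart_types.png")
```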
Question: What about an audience that you think loves the kind of graphics that you said are bad? How can we try to change them? Do you think that data scientists or data teams should try to change the behavior of the audience? Or do you think that we need to give the audience what they're used to and what they need in that sense?
Gert De Geyter: That is a very good question, because that is definitely something that I often struggle with as well. A lot of people just love word clouds, and a lot of people just love pie charts. What I usually tend to do is just make both, and try to explain why one is better than the other. However, in the end, if someone decides not to take your advice, that is their choice. We can only show and try to educate and explain the reasons why one chart will be better than another. What you could often suggest is, “Why don't we just go around the room and ask what people think is the clearest?” When you do A/B testing of two types of charts, you will see that if people are honest they usually go for the correct one or the one that is more clear. And that is often the less traditional one, because, again, the pie chart is fairly traditional. What I didn't talk about here as much is that there are also obviously legacy [systems] that you need to work with. There's also often a color scheme that you're bound to. For example, your company might have certain brand colors. Those could be a limitation to a certain point, and that's something that you'd need to be mindful of as well. So it's not just about what type of visualization, but also about what options you have for highlighting the parts that you want, and you sometimes have to be creative with what you are given there as well.
Question: How can data teams or data scientists or data practitioners develop storytelling skills? Are there any resources or training that you'd recommend?
Gert De Geyter: Yes, I think so. First of all, if you look online, there's definitely enough material out there already. I also always encourage data science teams to start practicing on a regular basis. Just as you might have presented your work as a student taking a course, do something similar with your data scientists. Have them present what they're working on on a weekly or biweekly basis, and potentially even give them feedback like, “Hey, look what you've presented here. Think about that. Try to frame this differently.” Create a sort of ongoing training plan with everyone in your team, where you move forward and give them the chance to present internally and grow in that way.
Storytelling holds a significant role in enhancing the influence of data science. Data scientists should balance their technical skills with effective communication techniques to convey their findings to varied audiences. Storytelling in data science is about constructing narratives around data, making it memorable and actionable. By crafting engaging stories, data scientists can simplify complex data, making it more understandable and relatable, which enhances decision-making processes. The webinar also explored the importance of understandable AI models, comparing black box and white box models, and emphasized the need for a broader understanding of AI and data fluency across different sectors. Moreover, it was stressed that storytelling skills, despite being essential for sharing insights effectively, often lack adequate representation in data science education. The speakers urged data scientists to be mindful of their audience's background and adjust their communication strategies to effectively link technical findings and practical applications.
Stories engage audiences by creating vivid participatory experiences that aid in data retention. Researc ...
The difference between black box and white box models is their interpretability versus performance. Black box models, while powerful, often lack transparency, making it difficult to explain predictions to stakeholders. White box models, such as decision trees, are more interpretable but may not match the performance of black box models. The choice between these models depends on the context and the need for transparency versus accuracy. An important point raised during the webinar was the trade-off between explainability and performance, with Gert noting, "Simplicity and explainability are important, but they might not always be preferable," especially in complex scenarios like skin cancer detection.
Effective storytelling in data science requires a deep understanding of the audience's background, technical knowledge, and interests. Adjusting the story to the audience ensures that the message is relevant and understandable. Data scientists must be mindful of the audience's potential skepticism and address it by incorporating evidence and clarifying any misconceptions. Gert emphasized the importance of knowing "what the audience needs to hear, not what you want to tell them," highlighting the need to align communication with the audience's expectations and level of expertise.
Data visualization is a powerful tool in storytelling but must be used judiciously to avoid misinterpretation. Simple, clear, and purposeful visualizations are more effective than complex or flashy ones. The webinar warned against using density or surface-based graphs like pie charts, which can confuse due to their radial nature. Instead, focus on clarity and the core message, employing techniques like highlighting key data points and ensuring adequate white space for better readability. Bhavya Dwivedi advised, "Try to keep it simple," emphasizing simplicity in visual communication to enhance understanding and retention.
VP of Media at DataCamp
Machine Learning Lead at Deloitte
Data Scientist at Deloitte