Anjali Samani, Director of Data Science & Data Intelligence at Salesforce, joins the show to discuss what it takes to become a mature data organization and how to build an impactful, diverse data team. As a data leader with over 15 years of experience, Anjali is an expert at assessing and deriving maximum value out of data, implementing long-term and short-term strategies that directly enable positive business outcomes.
Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.
Both rigor and speed are necessary in data science processes in order to derive full value from all initiatives.
Accurately determining the ROI on new data science initiatives before they begin can prevent catastrophic losses in revenue, growth, and time.
The hallmark of a mature data science organization is that it can consistently, sustainably, and efficiently derive value from its data.
All else being equal, you want both rigor and speed. The business always needs speed. They need everything yesterday because time is money. Since the clock is always ticking, if you have the right processes in place, then it becomes very easy to work quickly. So it's a little bit like when you're learning to code. Initially, it's really hard because you have all these rules to follow and you have to think differently, but once you develop a habit of ensuring that your code is well tested and your analyses or models are thoroughly reviewed, then everything becomes simpler.
It’s necessary to calculate the ROI on investing in data science initiatives to determine if they are really worth that investment. When calculating the cost of even something as simple as improvement on an existing initiative, you may discover that the initiative’s outcome doesn't actually offset the investment that's required, or it may offset it over a much longer period than what you expected. For example, let’s say you have a model with 80% accuracy and stakeholders are asking to increase the accuracy to 85% because it will increase sales. Initially, that may sound like a great idea, but depending on the model’s complexity, that 5% improvement might actually take an entire data team six months to accomplish. So, knowing this kind of information is vital in allocating resources wisely and effectively.
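To make that arithmetic concrete, here is a minimal Python sketch of the cost-versus-value check described above. The `Initiative` class is hypothetical, and every figure in it (team size, monthly cost per person, compute cost, monthly value) is an illustrative assumption rather than a number from the episode.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """Hypothetical description of a data science initiative."""
    team_size: int                   # people working on it
    months: float                    # duration of the work
    monthly_cost_per_person: float   # assumed fully loaded cost per person
    compute_cost: float              # assumed total compute spend
    monthly_value: float             # estimated extra revenue or savings per month

    def total_cost(self) -> float:
        """People cost plus compute cost."""
        people = self.team_size * self.months * self.monthly_cost_per_person
        return people + self.compute_cost

    def payback_months(self) -> float:
        """Months of delivered value needed to offset the investment."""
        if self.monthly_value <= 0:
            return float("inf")
        return self.total_cost() / self.monthly_value

    def roi(self, horizon_months: float) -> float:
        """(value - cost) / cost over the chosen horizon."""
        cost = self.total_cost()
        return (self.monthly_value * horizon_months - cost) / cost


# Illustrative numbers only: five people for six months to move a model
# from 80% to 85% accuracy, with an assumed 20,000 per month in added value.
improvement = Initiative(
    team_size=5,
    months=6,
    monthly_cost_per_person=15_000,
    compute_cost=30_000,
    monthly_value=20_000,
)

print(improvement.total_cost())      # cost of the initiative
print(improvement.payback_months())  # how long until it pays for itself
print(improvement.roi(12))           # negative means not yet offset after a year
```

Under these assumed numbers the improvement takes two years to pay back, which is the kind of result that can change whether the request gets prioritized.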
Adel Nehme: Hello, everyone. This is Adel, data science educator and evangelist at DataCamp. One of the things we always think about on the podcast is how to make a data team impactful, where data teams go beyond hype and promises they cannot keep and become a strategic asset that accelerates the organization's ability to provide value. To do this, data teams must balance rigor, business impact, relationships, and more.
And this is why I'm so excited to have Anjali Samani on today's podcast. Anjali is the Director of Data Science and Data Intelligence at Salesforce. She's a senior data science leader with over 15 years of experience in multinational corporations, startups, and public sector organizations in the US, and she's led her fair share of impactful data teams. And she brought it in spades. In today's episode, throughout our chat, we talk about how she defines an impactful data team, how to align data science projects with business value, balancing rigor with speed as a data scientist, the importance of data culture, how to manage stakeholders, and much, much more.
If you enjoyed this episode, make sure to like, comment, and subscribe, but only if you liked it. Now, on to today's episode. Anjali, it's great to have you on the show.
Anjali Samani: Thank you for having me, Adel. It's great to be here.
Adel Nehme: I'm really excited to speak with you about your experience leading data teams, how you define a mature data science organization and how to build an impactful, diverse team.
But before that, can you give us a bit of a b...
Anjali Samani: Sure. So I work at Salesforce on a team called data science applications, which is part of a broader organization called data intelligence. Data intelligence is a very diverse team comprising four sub-teams, one of which is the data science applications team. So there's a strategy and growth team, which helps the general managers of our product lines think through product strategy, figure out what kinds of metrics they should be using, and how to build out a framework to drive strategy for their business units.
Then there's data science and engineering. They take care of our data platforms and pipelines so that data scientists can actually do their work. Then we have a visualization and enablement team who build out a lot of our interfaces for our data science products and help our users and stakeholders interact with them. And then there's the applications team. That team builds out applications centered on AI and machine learning. We develop apps for our internal stakeholders at Salesforce to help them make better decisions. It comprises product managers, data scientists, and data engineers. Within that data science applications team, I lead the US data science team, which comprises data scientists and senior data scientists.
The Hallmarks of a Mature Data Science Team
Adel Nehme: That's really great. An awesome aspect of hosting this podcast is that I get to talk to data leaders such as yourself who've been really leading the way when it comes to making data science impactful within their organizations. You know, you've worked at organizations with really mature data teams, and this is especially true at Salesforce in your current role. I'd love it if you could break down what you think are the hallmarks of a mature data science organization.
Anjali Samani: Yeah. Sure. So I think if I had to summarize it in one sentence, I would say that a mature data organization is one that can consistently, efficiently, and sustainably derive value out of its data. So when you break that down, the reason we think about consistency is that a lot of organizations are very good at spinning up initiatives to derive value from their data, but that doesn't define maturity. Maturity is when you're able to do it very consistently and in a way that is both efficient and sustainable.
So what do we mean by that? Efficiency to me is about driving the costs down over time. A lot of the time when teams first start out in data science, you know, they're not really fully set up, they're still trying to find their feet. And if you work out the cost of generating those insights, it can sometimes be quite high because there's a lot of investment that's going into acquiring the data, setting up the tech stack, and hiring the people.
But over time, as the organization matures, the costs should be going down and the value that they derive out of data should be going up. And then by sustainability, I mean, how is the team extracting that value, right? Is it always a fire drill or do you have good processes and technology in place so that it's running like a well-oiled machine?
So when I break that down, it typically comes down to people, processes, and technology. And by technology, I mean both data and the actual tech stack. So a mature data science organization or a mature data organization is one where data is a first-class citizen. It's not an afterthought or a byproduct of everything else.
The leadership is intentional about data. Data is a strategic asset and it is treated as such. So there's a lot of thinking and intentionality that goes behind what data is collected, how it's collected, how it's processed, and why it's collected, because that also impacts downstream policies. Depending on the purpose that a data set is being collected for, it may naturally introduce certain biases or certain nuances within that data.
And if you don't understand why you are doing it and communicate that clearly to the data scientists, then you may end up with misleading insights or outcomes. So they have all these strategies in place. There is the tech stack in place and the technology investments are also very intentional, right? The data needs to be connected with the business objectives and the strategy, which is owned by the CEO's office, but the investments in the data and technology also need to trickle down from the top. That's sort of the technology side of things, right? So there's the data and how you collect it. A lot of times when organizations first start out, they'll start with, I have all this data.
I have lots of big data. What can I do with it? And that's a great place to start. But it's also a hallmark of a very sort of naive, young organization when you think about it in data maturity terms. Then there's the tech stack. So with that, there's, you know, all the technology that goes into collecting, organizing, and persisting that data, and there is also technology that enables the people who are using the data. That could be your expert users, like your data scientists and engineers, and it could also be some of the downstream consumers of those data products. So what kinds of dashboarding to invest in, how are your users able to interact with your data products? How are you enabling your data scientists to access the data and deploy models into production? So that's the technology and data side of things.
Then there's the people side of things, which is always the most complicated and the most difficult one. This is about the whole culture of the organization as well. And that sort of starts at the top with the leadership, with the exec leaders. But at a more localized level, it's about good hiring processes. It's about knowing what the organization needs and what good looks like within each of those roles. And those roles will also vary depending on where the business is in its growth stage and its maturity level. As an example, when the organization is still very early on in its journey, it may require a lot of generalists. So, you know, they can hire a few people who can do a whole bunch of different things and they can really get the systems up and running very fast. They don't necessarily need specialists.
But as the organization matures, then, you know, you start to see the need for folks who specialize in engineering or in data science, or even specific areas within data science. So there's an understanding of what roles are needed and how to look for the right people. Then once you have those people, it's about supporting them in the right way, providing them a clear career path, and providing them with the right mentorship so that they continue to grow and develop.
A lot of organizations today offer education stipends, which is a really great idea because data science is a field that's evolving at such a pace that you have to constantly be learning and keeping yourself updated. Within that people bucket, it's also about data literacy and, you know, having an organization where there are high levels of data literacy, where it's not just the data scientists and engineers, the data specialists, who are highly data literate. Data informs decisions. And there is this culture of challenging assumptions, views, and beliefs if they're not supported by data. People are not so attached to views and assumptions that, you know, if data challenges them, they struggle to change their minds.
And this is a really hard thing to accomplish because it's such a big cultural shift. Then there are the processes. This is what keeps everything running like a well-oiled machine. All three of these, data, processes, and people, are intricately connected, with people at the center of all of them. If you don't have good processes, then you may not be able to derive the value that you need out of your data science investments.
You may have a fantastic team of data scientists who are constantly innovating and coming up with new products or new insights. But if you don't have the processes to take that innovation from the lab to production, in a way that it impacts the bottom line and gives you that competitive advantage without burning out your team, then you've really failed at deriving value out of those investments.
So the processes are around adopting a lot of the best practices within data science, within engineering, within product development. It's about having that culture of experimentation, even at the enterprise level, and setting up the incentives correctly, so that it's okay to fail, as long as it's not catastrophic, and you're not incentivized to always look successful or to come up with positive kinds of insights.
So the processes are about putting the right things in place, starting with the leadership, and then it trickles all the way down to your individual data scientists. So this is how I think about maturity within a data science org or a data org.
Framework for Designing Data Systems for Business Impact
Adel Nehme: That is such a great holistic answer. And there's definitely a lot to unpack here.
So I wanna really focus on the people component throughout the remainder of the episode. You mentioned there's the data literacy aspect for the broader organization, and there's also the specialists within the organization, the data science team. So let's start with the data science team. One thing I've seen you break down really well is the importance of always connecting the data science team's priorities and the solutions they develop with the business objectives. And this has multiple dimensions, something that you title the three Rs of data science.
Do you mind sharing your thinking or framework on the considerations data teams should make when designing data systems that have business impact?
Anjali Samani: Yeah. I mean, in an ideal world, everything you build should have a business impact, right? Because otherwise, what's the point of doing it? But of course, that's not realistic, especially in the data science world where so much of it is very experimental. It's very iterative. It can be very time-consuming and there's always more you can do; you can spend forever working on a problem. So then what it comes down to is how is this thing going to be used? And what impact is it going to have on the business? Because that really determines where you land on what I call the risk versus time to ship curve. So, imagine a pair of axes where on the x-axis you have the time to ship and on the y-axis you have the risk that the product carries.
Then it's an upward sloping line going from left to right. In the bottom left corner, you have things that are very low risk and take very little time. And then in the top right corner, you have high-risk analyses or data science investments that also take a long time to develop. And then you have everything in between.
If you're in the bottom left corner, then it's analyses or data science work that is low cost and low risk, but typically low impact as well. So this could be things like engagement initiatives, right? Where it's all about volume. You're producing a new soundbite every other day. So even if you get a few things wrong, the impact of that is pretty low.
It may upset a few people. It may cause a bit of an uproar on one day, but tomorrow when you release the new thing, people will forget about it. Then they'll be talking about something else. There is very little rigor that you may want to inject. I mean, in an ideal world, you want to be rigorous about all the work that you do, right? All your analyses, you want to make them reproducible at the very least.
But if this is something that time and cost considerations don't allow, then in that bottom left corner, you don't have to worry too much about it. Then you might move a little bit further along to the right. And that's where you may have your insights that you might produce on some kind of a regular cadence. These take a little bit longer to produce because you need to think about the business context a little bit more. You need to do some deeper-level analyses and you may need to reproduce them every so often because you may want to check whether those insights still hold.
So there, you may want to think about the rigor a little bit more, definitely, and the reproducibility side of things. But if it's not something that you're expecting to run very frequently, then maybe repeatability and replicability are not big considerations. You move a little bit further along the curve, and then you have activities that are more medium risk, medium impact.
They're more mature products, dashboards, reports that are being generated on a regular basis. And there, reproducibility and repeatability become very important because it's all about efficiency. If it's something that you're doing regularly, it's not requiring any new thinking and the people who are working on it are not necessarily learning and growing from those activities.
But these are important because they keep the lights on. So with those things you want to automate it. You want to make sure that it's very efficient and that it takes very little time to run those. So that's where you need to make a little bit more investment. So you're moving to the right on that time to ship curve, and you're moving a little bit higher in terms of the risk.
You move a little bit further along and you have a lot of mission-critical activities where, you know, if you get things wrong, there could be huge implications. Maybe there's legal risk, maybe there's regulatory risk, maybe there's a huge reputational risk and brand risk that's associated with getting those things wrong.
There, you want to inject a lot more rigor, and leadership needs to understand that these are some pretty big risks and that pushing the team to ship very quickly when it may not be possible is maybe not the right thing to do. And then when you go to the very top, you have very low-volume activities, and it could be those one-off amazing products, you know, once-in-a-while kind of products that change the way things are done in a particular industry, that change the thinking in a particular industry.
And there the risks are super high, right? Those could be catastrophic risks where it could lead to loss of life, or it could lead to certain other major implications. That's how I think about it: your level of sophistication increases as you go up the curve. And so you need to make a lot more investments.
You need to think about, you know, how rigorous your analyses are and how you're persisting these things. If there are likely to be certain regulatory audits that are going to be conducted, how are you persisting your data, your models, your analyses, all of these things? I really think about it in those terms, and then there's also the timing aspect of it, right?
So there are times when something might take a long time, and if you need to extend that deadline by a week or two weeks or whatever it is, it may just mean that the launch of the product is delayed or shipping is delayed, and it's costly, but it's not an existential risk, almost. There are other situations where it's very binary, where if you don't release by a certain date, because there is a deadline, then you've missed that boat.
And then you have to, again, kind of balance this need for rigor and the time to ship very finely. So it's always a balancing act and it's about taking calculated risks. It's about thinking, what are the risks? How likely is this to occur? What is the impact? Are there any mitigations that I can put in place?
So that's really how I think about it. I think about it in terms of the risk and the impact versus the time and the cost and all of those.
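The questions in that answer (what are the risks, how likely are they, what is the impact, are there mitigations) can be turned into a rough expected-loss estimate. The following is a minimal sketch, not a method described in the episode: the `expected_loss` function is hypothetical, the numbers are made up, and multiplying probability by impact is a common risk heuristic used here as an assumption.

```python
def expected_loss(probability: float, impact: float, mitigation: float = 0.0) -> float:
    """Expected cost of a risk: probability x impact, reduced by mitigation.

    mitigation is the assumed fraction (0 to 1) of the impact that
    safeguards such as code review or testing would remove.
    """
    if not 0.0 <= probability <= 1.0 or not 0.0 <= mitigation <= 1.0:
        raise ValueError("probability and mitigation must be between 0 and 1")
    return probability * impact * (1.0 - mitigation)


# Hypothetical example: a 25% chance of a 200,000 loss if an analysis is wrong,
# with and without a review step assumed to halve the damage.
unreviewed = expected_loss(probability=0.25, impact=200_000)
reviewed = expected_loss(probability=0.25, impact=200_000, mitigation=0.5)
print(unreviewed, reviewed)
```

Comparing the difference between the two figures with the cost of the extra review time gives one rough way to decide how much rigor a piece of work warrants at its position on the curve.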
Managing Priorities in Business
Adel Nehme: That's really great. Following up here on rigor and how you approach prioritization: as a data leader, how do you ensure that you're baking rigor into the team's process, as you're describing here, while also maintaining business priorities and making sure that you're able to move fast and ship features fast?
Anjali Samani: In my view, there's a certain level of rigor that I think should be injected in all activities. A lot of it comes down to process. So to my mind, if you don't have rigor in your work and analyses, then at best, you may not get enough value out of your data science initiatives. At worst, you could make an incorrect decision, which can potentially be catastrophic if it puts the organization out of business. So my opinion is that rigor is very important, actually, to derive that value out of data science initiatives. And it's important to think about it again in terms of risks, right? So, all else being equal, you want both rigor and speed, because the business always needs speed.
They need everything yesterday because time is money. And since the clock is always ticking, if you have the right processes in place, then it becomes very easy. So it's a little bit like when you're learning to code. Initially, it's really hard because you have all these things to follow and you have to think differently, but once it becomes a habit to always make sure that your code is well tested and that your analyses are reviewed, it gets easier.
There's a tech review. There's a code review. You have these feedback loops in place. There are checks and balances along the way. Then rigor is just a very natural part of how data science is done. I think unless you're on the very left of that curve that I just described, where rigor isn't that important, then taking a very scientific approach is to my mind always the right answer.
And I think a lot of organizations forget that. There is a really strong bias towards speed, which is understandable, but it can come at the cost of making the right decision. And I have seen organizations spend millions upon millions pursuing things that were actually based on incorrect initial analysis.
And it was because nobody reviewed it, because there weren't the processes in place and nobody asked the right questions. And so there's always this tension between balancing the business priorities and needs with the need for rigor. And I think a lot of it comes down to asking the right questions. You need to ask things like, if I don't inject the right level of rigor into this analysis, what are the risks?
Right. So that whole risk analysis: what is the probability of the risk, what is the impact, and how can I mitigate it? And I think it's about helping the business leaders or stakeholders see that kind of analysis, to say, Hey, I understand the need for speed, but here's what can go wrong if we don't do this right.
I think that's usually a really critical piece of building those relationships, keeping those lines of communication open, and also, you know, as a data scientist sort of knowing when it's time to go really deep and when it's time to come up for air and kind of say, okay, this is the right place to stop given the risks and given the time sensitivities of these things.
So there's no short or easy answer to how you do these things. It's always a little bit of an art. It's a bit of a fine balancing act.
Adel Nehme: I love that answer. And especially showcasing the risk here to business stakeholders. We really value speed, but the costs of cutting corners because of speed can be so massive here when not applying rigor.
Anjali Samani: One of the really important things in this aspect is also setting up the right incentives for data scientists, because if there is always this bias towards speed and a bias towards producing what I call positive insights, it is hard to say, Hey, this big hypothesis that we have, or this big data set that we've just invested a lot of money in, actually there's very little value coming out of it.
That takes a lot of gumption, a lot of courage, a lot of integrity to go and say that to senior leadership. And I think the incentives are often set up so that people don't do that. Those data scientists will sacrifice rigor for speed because there is a culture in some organizations where, you know, somebody who can get the insights very quickly is always recognized and rewarded, rather than somebody who's done a more thorough analysis and actually comes back and says, Hey, I don't think that there's enough value in this. Something that's a positive insight is always far more exciting for the leadership. There's a little bit of re-educating that needs to happen on the leadership side as well, to set up the right culture and incentives.
How to Measure ROI for Data Science Initiatives
Adel Nehme: I couldn't agree more. We'll expand on that in the data literacy section. Something mature data teams often do really well is measuring the ROI of the data team's outputs. How do you measure the value and ROI for a specific data science initiative? And how does this go into your prioritization process?
Anjali Samani: Yeah, that's a really great question, isn't it? It's something that's very close to my heart. So I think that data science exists within a business, and like any other investment that a business might make, you need to do that analysis.
Is this investment giving me enough return? It's one of the most important considerations in my mind when you're thinking about whether or not to invest time, data science, and technology resources into any kind of data science initiative. And this isn't just a new initiative that you might be starting from scratch.
It could be something like, oh, model X has, I don't know, 80% accuracy, and stakeholders are asking for you to increase the accuracy from 80 to 85% or 90%, because that would really help them to increase sales or achieve some other cost savings. And in principle, that sounds like a great idea, because of course this is going to help us to improve the bottom line.
Why should we not do it? But going from that 80 to 85% accuracy might actually take an entire team of data scientists, and it may take six months to get there because it might be a very complex problem space. But when you calculate the cost of even something as simple as model improvement on an existing initiative, which people often don't do, you may realize that the improvement in sales or revenue that the model improvement will give doesn't actually offset the investment that's required, or it may offset it over a much longer period than what you were expecting.
Measuring that ROI is super important. So how do you do that? Right? There's the cost of the initiative, and then there's the value you get from it. The cost piece is relatively easy because compute costs are really easy to get, and the cost of human resources is, again, very easy to get: how many people are working on it, how much time is being spent on it, and what is the salary of those people? Right. So that's a very simple calculation. The value piece is much harder. And that's why a lot of organizations don't really do it. They really shy away from quantifying the value that they're getting out of a data science investment. But it's always possible to start with how the product is going to be used and tie it back to some kind of quantifiable metric or benefit.
The obvious ones are increased sales and revenue, costs saved, how much time is being saved, or some other kind of operational metric. Now, these things aren't always easy to measure. With a lot of the day-to-day processes, and even decision-making processes, so much of it is very subtle, a lot of it is muscle memory, and it's a lot of mental models. Getting people to stop and think about how long is it taking me versus how long it took me before I had this product?
Or how long would it take me if suddenly somebody switched the lights off on this product? These are really hard things to get people to do. And that's one of the reasons why, you know, the success of data science investments really depends on the culture of the organization. And these things have to come top down.
A data scientist who's developing these things may not be able to go to a senior decision maker and say, Hey, can you keep track of how long it takes you to do certain things? You can build your tools in a way that tracks the journey of the user through the product, or how much time they're spending navigating through your product.
But again, this requires a lot of thoughtfulness ahead of time. And this is often done for external products, but not for internal products. So internally, it's always very difficult to measure that value. But, you know, even if you can track things like the number of decisions that are powered by this product or this insight, or what kind of decision it is, then that, to my mind, is very valuable, and more so than just sending out a survey and getting that qualitative feedback.
That's very important, don't get me wrong, but it doesn't necessarily help you get to that ROI. A lot of it is about asking the right questions and really pushing the stakeholders on that. So an example might be, okay, this tool helps me to make better estimates of how much I'm going to sell this quarter.
Okay. Why is that important? Oh, because it may help me to understand where I need to focus. Okay. Why is that important, or what happens if you don't have that information? What is the cost of focusing on the wrong areas? You have to really keep digging a lot, and this isn't always easy, and it's not always welcome either, because people don't often have time. So it's a tricky one to navigate, but it's definitely possible to do it and it should be done.
And then coming to the prioritization part of it, oftentimes there is this kind of bias towards making things very sensational, right? So insights can be very interesting. They can be mind-blowing and you can really sensationalize them. But if there's no actionability coming out of it, then to my mind, it's just a fun fact and should not be prioritized over something that is maybe less sensational, but more impactful.
But again, this is a cultural thing and it requires a lot of push from the leadership to focus on the right things. In reality, a lot of other things come into play. Who's making the request. Is it coming from the CEO or is it coming from someone much further down the chain? What is the likely adoption of this product?
You know, who else in the company is it going to impact? Are those the people you want to get in front of, and get your product in front of? So there's all these other things, you know, sort of the human side of things, that go into the prioritization decision. But if you take that out, then to my mind, it's always about the impact and how much it's helping the business to achieve its goals and objectives.
Sensationalizing Data Insights
Adel Nehme: That's really great. And I love that last point on sensationalizing certain insights. It kind of mirrors how we think about data science and AI in the public as well. Like, for example, breathtaking research results that are really important, such as, you know, an AI playing Go, right? I would argue that a customer churn model is much more useful for an organization, for example, than something as sophisticated as that.
Anjali Samani: Yeah. And that whole culture of sensationalization also incentivizes people in the wrong way. Like I was saying earlier, people may not want to follow all the best practices when it comes to how you're training your model. So how do you even select the data for your training, validation, or testing?
And at one level, it can lead the organization to make suboptimal decisions, but at another level, it also sets the wrong expectations of what data science or AI or machine learning can do, and not just within the organization, but also more broadly among the general public. And that can go two ways, right?
That can cause a lot of excitement, but it can also cause a lot of fear. And I think as a community we may often do ourselves a disservice by sensationalizing the wrong things.
Adel Nehme: You see it a lot, especially today in the AGI talk with GPT-3, DALL-E 2, et cetera. Like, these are really great models, right?
But at the same time, we need to have tempered expectations to a certain extent on where the field is headed and how we think about AI in the future. And of course, having a high-impact data science team also means scaling and organizing data teams. I'd love to segue into what is seen as a great way to organize data teams for impact.
Some organizations have a centralized data team. Other organizations have more of an embedded model. Which model did you find most effective for building high-impact data teams?
Anjali Samani: It really depends on where your organization is in terms of maturity and its needs. There's no one-size-fits-all kind of solution.
So small organizations, startups, they often start, and if the organization is small, then stay, with very centralized teams that do everything. So they touch all products, all systems, and they do end-to-end development. They're very close to the customers' front and back end. And in terms of the skillset, you are looking very much for breadth rather than depth, because often they're not required to, or they don't really have the time to, go into a lot of depth.
As the organization grows and matures, the centralized team tends to start splitting into squads or sub-teams that focus on a very particular product or area of the business. So they begin to specialize and at that point, they become embedded within a business function, and that embedded model makes a lot more sense.
So most medium to large organizations that engage in meaningful data science work will tend more towards this model. Personally, I believe that after a certain point, a hybrid model works really well. And so you want a combination of both a central data team and then a number of embedded teams. So the central data science team would own all things related to the actual data.
So data cataloging, stewardship, some level of logging the metadata and analyzing some of that. They would own a lot of the data quality and observability initiatives, documentation, and data lineage tracking. They also do a very different kind of data science work.
So they might look at things like entity resolution, because they've got a lot of different data sets coming from all over the place, internally and externally. They may be engaged in building knowledge graphs, which these more functional teams can then leverage to draw insights. They may be working on things like anomaly detection to understand what kinds of data cleaning might need to be applied.
What is the right kind of transformation to apply? What's the right level of aggregation? Things like that. And they need to be able to do all of these at scale, because they look after the entire organization's data. They maintain that inventory. They also own things like when is it time to deprecate a certain data set, or what will replace it?
Or how do you persist some of these things? Then you have these more embedded teams, and they do very different kinds of data science work. They leverage or build on what these centralized teams are doing. So, you know, they would specialize in specific areas of the business or support different business areas.
It could be something like sales or marketing or finance. It could be in customer success. So these teams, you know, they don't really need to know how their particular subset of data connects, maybe, to everything in the organization. They have a narrower focus, and the business context and the specific problem that they're trying to solve might be a lot more important.
So much of their work may be a bit more tactical than strategic, with the strategic work owned by this more centralized team. Which model works well for the organization? Again, it depends on the size, depends on the resources, and also on whether there is enough drive from the leadership to create this more centralized organization, because that is a huge investment in terms of data, in terms of people, in terms of technology.
And if you're not already moving towards that model, then there's a lot of political and people-related matters that also need to be considered. But personally, I think that beyond a certain size of an organization, that is the model that works best.
Developing Trust with Stakeholders in Data
Adel Nehme: That's really great. And I completely agree with the hybrid model approach.
We've seen it a lot as well in really mature data teams across the spectrum. So we talked a lot throughout the episode about the data team itself from the people component. I wanna expand it to outside of the data team. And I wanna start off with the interaction between the data team and the rest of the organization. Collaboration with business teams is super important because ultimately, data science teams serve the business teams themselves. What are ways you've been able to develop trust in the data with your business partners?
Anjali Samani: It's all about relationships, right? Whenever there's people involved, it's all about the relationships. You may build the most amazing, cool data science product.
But if you don't have the right relationships, if you're not speaking to the right people, it may never get adopted. And it may never see the light of day. So it's really about the relationships you build. It's about having that empathy for your users, understanding what their pain points are and solving for those, rather than doing it the other way around, which I also see a lot of teams and organizations do, where they'll go and build this amazing product and then try and sell it to the customers and say, hey, you know, you should really do this.
And they're like, we don't need it. You know, what we have works absolutely fine. We don't need this very cool product that is very complex to understand and will take us a long time to onboard. And there's a lot of friction there. Really understanding what your users need, what their pain points are, is super important.
It's also about how responsive you are to their needs. Now, by that I don't mean that you give them everything they want, you know, because they may go away and read about this very sensational, cool thing that somebody's done. And they say, oh, I want one of those, but that's not what they need. It's going to be a huge burden on the data scientists.
And it's going to drive very little or no value, potentially. It's about understanding what it is that they're trying to accomplish by getting that really cool thing. What is the actual job to be done, the problem to be solved? And addressing those needs. And it's enabling your users to get the most value out of what you're giving them.
A lot of trust also comes from educating your stakeholders, speaking their language and not really trying to drown them in all the data science jargon and all the very technical language and complexities of your models and analyses. If they're interested in that thing, and if they're technically very capable, sure, go down that road. But if they're not, then be mindful of that. Speak their language, show them how it's going to improve their processes or how it's going to make their lives better. It's about setting and managing the expectations as well. You know, don't promise them that you can solve all their problems or get a very high accuracy on your predictions if you're not going to be able to do that because you don't have the right data or you don't have enough data.
So, you know, really being very transparent about these things and saying, hey, here's what is possible and here's what isn't. This is what you will be able to do, and here's what you won't be able to do.
But we can give you some of these other things that can make your life better. It may not be a cool AI solution, but a dashboard might be all you need. It's about having those open and honest conversations. It's about a lot of transparency as well. You know, really helping them to understand where some of these numbers are coming from.
So, you know, a lot of explainability within your models and outputs. Say, hey, this is how much I forecast the sales are going to be this month, and these are the main drivers behind it, so that they know what's driving those numbers, but also what they can do to change them.
If you know what the drivers are, if you know what levers you can pull, then that drives a lot more actionability. And when they see those results and when they see how the product is actually impacting the business, then I think that can help you to earn a lot of that trust.
Adel Nehme: That's really great. And speaking here on the communication between both stakeholders, how important do you find data culture or data literacy when enabling business stakeholders to become more effective partners for the data team, and should the data team invest in the data culture of the remainder of the organization?
Anjali Samani: Data culture is everything, right? You know, if there isn't enough data culture or literacy, then you're talking two different languages, and no matter how well-intentioned, things may not land in the way that you hope. If you have a data-literate stakeholder, then, you know, they will be a far more engaged, informed partner versus somebody who is very passive or overwhelmed by what you're giving them, because they're not accustomed to working in that way.
They're not accustomed to making their decisions in that way. And then at the other end of the spectrum is a very disengaged, uninterested stakeholder. They won't say, no, this is really terrible, I don't like it, I don't want it. They will also not say, yeah, this is wonderful, I'm going to use it and all that. They'll just be like, yeah, sure, whatever. And then, you know, it goes into this black hole, never to see the light of day.
That's what the difference is between having a very engaged and educated, data-literate stakeholder versus not. When you have a data-literate business partner, you can actually co-create. They will come to you with problems. And it creates interesting work for the data scientists and it also addresses their pain points. And there is this really great feedback loop. There is a really great virtuous circle where the business is getting value and data scientists are engaged. They're doing amazing work. And those synergies are super important. With keeping everyone engaged and interested, that burden shouldn't be on the data scientists alone. They cannot change the culture of the organization by themselves. There needs to be a lot of executive sponsorship.
A lot of these things need to start at the top. The data scientists need to invest that time in building that data science brand, engaging with their stakeholders, and helping to educate them. But it cannot be the responsibility of those data scientists alone. It has to come from the leadership.
Call to Action
Adel Nehme: I couldn't agree more. Finally, as we close out, Anjali, do you have any final call to action before we wrap up?
Anjali Samani: Yes. So if you are a non-data professional, definitely become data literate, teach your kids how to be data literate and ask important questions of the data and the world around them. Back in the day, you could say, seeing is believing.
You can no longer say that. There are deepfakes. There's all kinds of stuff. And if you're not aware of these things, if you're not asking the right questions of the data and of what the media are feeding you, which could be over-sensationalized, then you're doing yourself and the future generations a disservice.
And if you are a data professional or a data scientist, then make sure that you are picking up all the right skills. And by that I don't mean the latest and greatest deep learning models and everything, you know? Sure, do those things. But if your fundamentals are not in place, you are going to struggle to add value. So get your basics right.
Pick up engineering skills, because those are super important, particularly if you want to go into roles that are not R&D kind of roles, roles that have business impact. Because if you don't have the right engineering skills, you're not going to take your ideas and models into production and drive that value. So learn what the best practices are from software engineering, product development, and data science, and really build those skills in addition to your technical skills. Those would be my two calls to action.
Adel Nehme: That's awesome. Thank you so much, Anjali, for coming on DataFramed.
Anjali Samani: Of course, thank you so much for having me. It's been a lot of fun.