How Data Science Enables Better Decisions at Merck

Suman Giri shares how Merck is using data to improve organizational decision-making, medical research outcomes, and how data science is transforming the pharmaceutical industry at scale.
Jul 2022  · 39 min read

Adel Nehme, the host of DataFramed, the DataCamp podcast, recently interviewed Suman Giri, Global Head of Data Science of the Human Health Division at Merck.

Introducing Suman Giri

Adel: Hello everyone. This is Adel, data science evangelist and educator at DataCamp. A few episodes back we had Curren Katz on the podcast to discuss how data science is transforming healthcare. One of the themes that emerged in that episode is that while incredible gains are happening on the research side, there are many ways data science is moving the needle and improving health outcomes today.


This is just as much the case in pharmaceuticals today, and this is why I'm so excited to chat with Suman Giri on today's podcast. Suman Giri is the Global Head of Data Science of the Human Health Division at Merck. He's held various data leadership roles throughout his career and has a Ph.D. in Advanced Infrastructure Systems from Carnegie Mellon University. Throughout the episode, he displayed incredible insights when it comes to the state of data science in pharmaceuticals.


Throughout our chat, we talked about how data science is transforming the pharmaceuticals industry today, the main data science challenges facing pharma organizations, data interoperability, and data ethics, how to approach data culture, the right skills for data teams and more. If you enjoy this episode, make sure to rate, subscribe, and comment, but only if you liked it. Now let's dive right in.


Suman, it’s great to have you on the show.


Suman: Thanks for having me.

Adel: I am excited to talk to you about data science and machine learning in pharmaceuticals, your experience leading data science at Merck, and more. But before that, I'd love to learn about your background and what got you into the data space.

Suman: Yeah, sure. So my name is Suman Giri. I head data science at Merck Human Health. Merck, as you know, is one of the world's largest pharmaceutical companies. We have world-leading products in the oncology, vaccines, and cardiovascular space. So in my role at Merck, I'm responsible for all of the commercial analytics, data science, and ML ops that take place for our early-stage, advanced-stage, and inline products.

And my background is in mathematics and engineering. I did my PhD at Carnegie Mellon, where I researched machine learning algorithms in the energy efficiency space. And since then, most of my career has been spent in different healthcare-related companies. So I've worked in payer organizations, payer providers, and health-tech, and as of a year ago I started my new role at Merck. I landed in data science just by virtue of education, which is what I studied, so it was easy for me to just do that as a profession.

I landed in healthcare a little bit by accident. It was one of the areas I was curious about, and somehow one thing led to the other, and I was fully immersed in the healthcare experience.

The Current State of Data in Pharma & Healthcare

Adel: So to start our conversation, given your experience as a data leader in pharmaceuticals and healthcare, I'd love to understand the current state of data science and machine learning in pharmaceuticals. Arguably, pharmaceuticals and healthcare overall is the most exciting space for data science because of the potential value that data science and machine learning applications can provide in this space.

Given your experience as a data leader in pharmaceuticals, I'd love to understand how you would describe the landscape of data science in pharmaceuticals. What does it look like today, and how has it evolved over the past few years?

Suman: I think the major fields within pharma where data science gets used are drug development and discovery, diagnostics, clinical trials, manufacturing and supply chain, and commercial and regulatory processes. So these are the major areas, but to give you a sense of where the impact is, we can start with COVID, especially around the mRNA vaccines.

There was a strong role that AI played in accelerating the discovery and deployment of COVID vaccines. Then there was the AlphaFold announcement this past year from DeepMind, which basically solved the problem of protein folding and is going to accelerate drug discovery significantly. We're also seeing some exciting use cases in clinical operations, like the selection of patients for clinical trials and compound screening to test in preclinical trials.

And then there's the commercial space, which is where I sit, where we're seeing a lot of advanced machine learning being applied for effective engagement and commercial marketing for inline products. Finally, there are also some neat applications and potential on the reimbursement side, with partnerships with payers for value-based outcomes and similar things.

So a lot of interesting things are happening in the space.

How is data science pushing the envelope in Pharmaceuticals?

Adel: And to get a sense, there seem to be developments within the research space that you see today, like AlphaFold, as you mentioned, and a lot of the different innovations that you see, but also developments on the commercial side and applications of machine learning and data science.

What are some of the main areas of value you've seen in data science and machine learning pushing the envelope forward for organizations working in the pharmaceutical space today?

Suman: Yeah, this is a tough one for me to pick, because I think the envelope is being pushed in all directions. Everywhere you look, people are doing amazing stuff using applications of data science and machine learning. But if I had to pick one, I would go back to the drug trials space, because that has been eye-opening for me as someone relatively new to pharma, as the implications for patient safety are huge.

So, in drug trials, we are seeing intelligent patient selection based on multiple data sources, and more targeted criteria, like sometimes even using biomarkers or genetic information. We're also seeing automation of a bunch of preclinical quality control steps using AI and machine learning, which again we saw during the COVID vaccines. We're also seeing applications of the internet of things/IoT and real-time patient monitoring for patients in active trials, which helps avoid adverse events proactively.

This area of event adjudication, as it's called in clinical trials, can significantly reduce the time to market for a drug. So it has huge implications just from an innovation standpoint. There's a lot of interesting work around simulations on pretrial compounds, using data from molecules and effects to understand adverse implications before the molecule even reaches trial.

Right? So this enhances patient safety significantly. And finally, I'd be remiss if I didn't mention how quickly the pharma industry was able to manufacture and distribute the vaccines for COVID. And a lot of that logistics and supply was enabled by data science and machine learning. So I think this is the space that I would pick to answer your question, just because you forced me to pick one, because I could just as easily go on about the other areas where something like this is happening as well.

The Role of Data Science beyond drug discovery in Pharma

Adel: That's really awesome. And I definitely want to expand into the research areas you discussed here, especially with the COVID vaccine. That's a fascinating topic to expand upon, but first, I'd like to follow up on your last point here around supply chain optimization and the kind of innovation there.

When we hear about use cases of data science and machine learning in pharmaceuticals in the media, especially in popular discourse, we always talk about drug discovery. These awe-inspiring use cases are related to drug discovery algorithms, right? But there are also a lot of operational aspects where data science can have a lot of value, accelerate the value for patients, and improve patient quality of life.

You mentioned here supply chain and clinical trials operations. Do you mind expanding on that area, digging deeper into these use cases and your experience, and how you've seen that value manifest for patients?

Suman: So you're absolutely right. Most people, when they hear, let's say, machine learning in pharma, think of drug discovery. Maybe one day, we'll get to a stage where algorithms can predict the efficacy of a molecule in humans without having to go to trials. But until then, a lot happens in between where algorithms come in to enhance both efficiency and productivity. So I already talked about clinical trials and how machine learning enhances patient safety.

Data science also plays a role in understanding the diseases we want to treat. That's the first step, right? There's a huge play there in understanding the prevalence of such diseases, just to ensure a financially viable model for drug development and distribution. There is also a strong role for data science in creating personalized medicine strategies and accelerating the way we design and develop new drugs.

For inline products in the commercial space, there's a lot of sophisticated data science today, especially as it pertains to forecasting, calculating the effect of promotional marketing, and reviewing promotional content for compliance. And then competitive threat analysis is a huge one, right? From a commercial standpoint, it helps to understand, let's say, where the sales and marketing effort should be focused, and to create personalized engagement strategies for healthcare professionals to make them aware of the drug and its benefits.

Then there are also use cases like AI-driven planograms that help improve productivity, automated data matching, promotional response modeling, rare disease patient finding, et cetera, where machine learning is heavily leveraged. So basically, that's a quick preview of areas, apart from just outright clinical drug discovery, where machine learning plays an important role.

How is Machine Learning operationalized in the Pharmaceutical Industry?

Adel: And in terms of operationalization, given that machine learning in this area is still very much in the research phase, a lot of this is still, in some sense, ideation. Do you see that a lot of these use cases are actually operationalized today within the pharmaceutical space, and that they're actually delivering value for pharma companies today?

Suman: That's a great question. So this is the framework that I look at it through, right? There are companies where data science is the core product. Uber is an example. Yes, it's an app, but everything you do in it has been facilitated by some algorithm that works on data, which inevitably becomes the product that we use. So in those companies, this relentless push for productization comes into play. Then there's a second tier of companies, where you are dealing with real-time data. So maybe companies like Walmart or Target, where data is coming in every second, and somehow you need to make intelligent decisions in real time.

Right? And then there's a third category of companies, which I think is where we sit, which is where data science is not part of your core product, but it's decision support. Right. So our core product is obviously like the medical drugs that we manufacture, but design discovery, ideation, and distribution of it are enabled by data science.

Now, in this context and in this framework, there are maybe models that don't necessarily meet the criteria of full-blown, let's say, production, but they're just, let's say, dashboards that have some intelligent component to them that is helping somebody make a quick decision. Or maybe there are questions that somebody has where AI or machine learning helps them get to some sort of strategy around it. What I'm trying to say is I think these kinds of models and algorithms also have a place, even though they might not be in what would be considered, let's say, production in the traditional tech sense. In pharma, I've seen a lot of these get leveraged, primarily because we have the luxury of making decisions in batch mode, right? We don't have to make decisions in real time all the time.

But having said that there's obviously a class of models, especially in the commercial space where perhaps the considerations of safety and efficacy are a little bit nuanced. That's where a lot of the models are in production. So areas like next best action, right? Where we are enabling sales reps and marketers to come up with the optimal engagement strategy.

These are models that I've seen in full production mode using pretty sophisticated architecture. And I'm sure there are parallels around media mix modeling, sales force optimization, et cetera. So there are other models that are in production as well. So it's a good mix: models that are maybe slightly ad hoc or one-time in nature versus models that are in full-blown production.

What are the biggest Data Science challenges in Pharma?

Adel: I love how you cross-section the nature of the product you serve with the degree of operationalization needed for it. Segueing here, I'd love to dive much deeper into the challenges of working in data science and machine learning in the pharmaceutical space. What would you say are the biggest challenges specific to data science in the pharma industry?

Suman: So the biggest challenge is around data, which worsens as you move outside of the US. Because we are a global company, we operate in more than 60 countries, and data gets increasingly sparse and hard to access as you go outside of the US. And without data, you have no way to identify the prevalence of a disease, no way to know whether a molecule is financially viable, and no way to even market it effectively.

Today, the pharma industry relies on a lot of syndicated data, but the inability to bring together simple data sources like claims and EHR, as a simple example, holds us back. Forget about biomarker and genetic information; that's a whole new level of complexity when we are struggling to do even the basic things. Over the past few years, there's been some interesting innovation in this. Companies like Datavant are trying to bridge this gap, but I think it's still early days in this space. I believe there's a real opportunity here for solutions in data de-identification, synthetic data generation, and federated learning to accelerate data science in pharma significantly.

Having said that, I do think that the regulatory infrastructure needs to evolve in tandem as well to allow for this kind of innovation. Obviously, since we're talking about challenges from the pharma side, we also have a hard time recruiting the right talent for data science in pharma. I think there's also a third challenge, which is mindset, right?

So data science, by definition, has the word science in it. So it requires a little bit of a cultural shift in how you think about our processes. And this is true for healthcare in general, I would say. I've found it difficult to do effective change management with the consumers of data science. So just to quickly summarize, I think data, talent, and culture are how I would describe the bigger challenges in data science in pharma.

What needs to change to unlock the next level of Data Science in Pharma?

Adel: That's perfect. So let's expand on these one by one. When thinking about some of these obstacles, let's say, for example, data, right? Collection, interoperability, access. What needs to change so that data science innovation here in pharmaceuticals accelerates? Is it regulatory innovation, as you mentioned, as a key part of it, or industry standards that need to evolve? What do you think needs to be unlocked here to be able to push the envelope forward when it comes to data?

Suman: So, great question again. I obviously talked about data access and interoperability. It's an active area of innovation in pharma, is how I would characterize it, because everybody knows it's a problem, and everybody knows that that's where the bottleneck lies. But I've seen a few major efforts in this field that I'm personally excited by.

Right? So pharma companies are now beginning to collaborate on sharing anonymized clinical trial information. Some pharma companies have platforms where researchers can go in and submit their molecules, and then there are algorithms that score them on their potential, in the interest of decentralized data sharing and collaboration.

So basically, we need more of this type of collaboration between companies, CROs (clinical research organizations), academia, and the government. I think regulation and innovation are obviously two opposing forces by design. So there will always be push and pull. And issues like data privacy are extremely important.

But I think there's still a wide gap between what we should be able to do to improve lives and what we are able to do today just cause our regulatory infrastructure hasn't caught up. So there definitely needs to be a case for a close examination of what are the hurdles from a regulatory side that are preventing us from doing what we supposedly should be able to do.

And there are probably some startups in this field, or maybe some changes in this field, that we'll see crop up in the next few years. And again, from a data side, I mentioned Datavant, but there's probably a bunch of other space here that can be taken by innovative companies who can enable easier access to data, not just pharma data, but also, let's say, social determinants of health and publicly available data that can also guide sound decision-making.

Especially in a timeframe like right now, where there's a bunch of environmental factors, right? COVID is still a thing. There are geopolitical considerations today with all of the wars going on, et cetera. So all of these data will somehow inform some sort of strategy. And I think just having some way to access that in an easier fashion, so that research and innovation can take place, is going to be key.

Minimizing the harmful outcomes of AI in Pharma

Adel: You mentioned data privacy here. When it comes to the applications of data science in pharmaceuticals and healthcare, another major obstacle is bias and the ethical use of AI. I'd love to hear how you evaluate the risk of harmful outcomes of machine learning and AI in pharmaceuticals, and how you minimize it, especially when having this regulatory discussion to create that data access.

Suman: So this is something I spend a lot of my time thinking about, right? So maybe I just want to quickly share three examples with you that I learned about recently and have been thinking about a lot, and this is all data science work related to COVID. And the reason I'm sharing this is just to highlight how big of an issue this is and how underreported and under-thought this area is.

So number one, for COVID, a group of researchers used chest scans of children who did not have COVID as examples of non-COVID cases. And their intent was probably to identify COVID using chest scans. What the algorithm learned was how to tell children from adults, not how to identify COVID, but these are models that made it to the publication stage.

Because there was no framework for ethical use or bias measurement in place, this was able to slip through the cracks, so to speak. There was another piece of research where they used chest scans taken while patients were lying down and while they were standing. Right? Now, patients lying down are more likely to be sick. So what the AI in turn learned was to predict risk from their position and not their actual risk. So again, another example where maybe the intention was right, but because the framework wasn't there to look through the downstream consequences, it ended up doing the wrong thing.

And then the third example I'll share with you is an algorithm that was found to pick up on the text font that certain hospitals used to label the scans. Right? Because they were probably doing OCR and then doing a bunch of things around it, like recognition and whatnot. But at the end of the day, hospitals with more serious caseloads, and the fonts associated with them, became predictors of COVID risk rather than actual COVID risk. So these are three examples that just elucidate how there are real issues with relying exclusively on algorithms without considering the biases in data.
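Editor's note: the three failures above share a pattern in which a piece of metadata (age group, patient position, hospital font) predicts the label almost as well as the scan itself. One rough way to screen for this before training is to measure how accurately the metadata alone predicts the label. The sketch below is a minimal illustration under stated assumptions, not the method used in any of the studies mentioned; the function names, the `position` field, and the toy records are all invented for illustration.

```python
from collections import Counter, defaultdict

def baseline_accuracy(records, label="label"):
    """Accuracy of always predicting the most common label."""
    counts = Counter(r[label] for r in records)
    return counts.most_common(1)[0][1] / len(records)

def metadata_only_accuracy(records, key, label="label"):
    """Accuracy of predicting the label from a single metadata field,
    by guessing the most common label within each metadata value."""
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[key]][r[label]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(records)

# Invented toy data: most "lying" scans are positive, most "standing" are not.
scans = (
    [{"position": "lying", "label": 1}] * 45
    + [{"position": "lying", "label": 0}] * 5
    + [{"position": "standing", "label": 1}] * 5
    + [{"position": "standing", "label": 0}] * 45
)

baseline = baseline_accuracy(scans)
shortcut = metadata_only_accuracy(scans, "position")
print(f"baseline={baseline:.2f}, position-only={shortcut:.2f}")
```

If a field that should be clinically irrelevant beats the baseline by a wide margin, as `position` does in this toy data, a model trained on the scans has an easy shortcut available, and the dataset deserves a closer look before any modeling.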

Right. So to mitigate this at Merck, we closely tie ourselves to what we call the good governance framework. Before we push a model into production, we check for explainability, fairness, robustness, transparency, and privacy, which we believe to be the major pillars of ethical use of data and algorithms. In tactical terms, we have a de-biasing layer that gets applied throughout the model life cycle, from data to model to post-model, so that we're not causing any inadvertent consequences.

But this is obviously not the final form of it. We have an ongoing partnership with Carnegie Mellon University, where we continue researching ways to understand the downstream implications of heterogeneity in our data and models. So all of this is just something we think about very seriously.

And we are continuing to iterate our approach to ensure that we don't end up being the fourth example on this list that I just shared with you.
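Editor's note: the interview does not describe how the fairness check is implemented. As a purely illustrative sketch, and not Merck's framework, one common pre-production check compares a model's positive-prediction rate across groups (often called the demographic parity gap). The function names, the group labels, and the 0.2 threshold below are assumptions made for the example.

```python
def positive_rate_by_group(predictions, groups):
    """Share of positive (1) predictions within each group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}

def parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Invented toy predictions for two hypothetical groups, "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

MAX_GAP = 0.2  # hypothetical threshold; a real one is a policy decision
gap = parity_gap(preds, groups)
passes = gap <= MAX_GAP
print(f"parity gap={gap:.2f}, passes={passes}")
```

A single number like this is only a starting point. Which fairness metric applies, and what gap is acceptable, depends on the use case and is exactly the kind of question a governance framework has to settle.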

How do you remove bias from Data?

Adel: That's really awesome. And it really elucidates how a lot of this research and a lot of these applications that are exposed as having this bias and these issues stem from a great place, right?

These are use cases with great intentions, but they can be very harmful downstream, especially if a lot of the bias in your data comes from gender or racial attributes or that type of demographic data. And I'd love to unpack this even further.

What do you think needs to change on the data pre-processing side and kind of the data collection side to be able to unbias a lot of this data?

Suman: So there's always going to be bias in the data. As long as there is some sort of, let's say, variance in your data, there's always going to be bias. So bias is just part of the experience, if you will. Now, I think the right way to do this is to think about this from the get-go, right?

What the implications of said bias are, and what frameworks there are for us to go out and measure it, should be part of a data scientist's toolkit from the get-go. A lot of the time, the data that we deal with has already been collected. So we don't get a voice at the input stage. We're working with, let's say, third-party data or syndicated data that we purchase.

So we have very limited input into how it gets collected. But that doesn't mean that once we have it, we don't get to evaluate it for inherent heterogeneity in the system and what the downstream implications could be. Right.

So I think part of it is that it's a new field; it's not even part of our vocabulary. Most data scientists haven't taken a class on this, or maybe even heard about it as part of their education. So education and a solid framework, I think, is the way to solve this, along with constant iteration. Right? I think experimentation is part of data science; that's what makes it a science. So experimenting and understanding what the implications are of a model that is in production and touches certain lives, and what bias is inherently built into the whole system, needs to be an ongoing conversation. To answer your question, I don't think I have a good answer for what needs to change on the data collection side, apart from this: once the data is collected, people shouldn't just push it into a modeling exercise. There needs to be a pause to think before pushing it into a feature engineering and modeling framework.

The ideal Talent Profile for a Data Scientist in Pharmaceuticals

Adel: That's really great. And circling back to the other challenges you mentioned around data science in pharmaceuticals, I'd love to come back to that talent component. And you mentioned education here. So what has been the most challenging aspect of finding the right talent within data science in pharmaceuticals, and what does a great talent profile look like within data science in pharmaceuticals?

Suman: So it's a two-part question, right? So let me answer the second one, because that's the easy one, which is what a great talent profile looks like for, let's say, data science in the pharma industry. So I think the biggest asset a data scientist can have is good problem-solving skills, right? Forget about data science or the technical aspects. A lot of the time, what I find to be the true value of data scientists, again in the third category of companies I mentioned earlier, where data science is primarily used as a decision support tool, is understanding the context in which the decision is being made, and then formulating that into some sort of framework that can be improved upon by the use of an algorithm or some sort of intelligent automation. Right.

So I think problem-solving is a key component of a high-performing data scientist. And then there are aspects of collaboration, because usually data science doesn't happen in a vacuum, right? It's not a backend job, so to speak. You have to be continuously iterating with your stakeholders, pushing back on things that don't make sense and maybe giving in on certain things that are just required to drive things forward.

So that level of collaboration and communication, I think, is a second key component. And third is, I would say, the foundational aspects of data science and machine learning, right? Things like bias and variance, things like the assumptions behind linear regression. Because the problem I see in today's talent is they're so enamored by the fancy stuff, let's say federated learning or deep learning or this or that, that they overlook the fundamentals.

And that's, again, another thing that further perpetuates the biases that we have. What we look for is somebody who has the fundamentals down, because the days of a data scientist who just imports, like, X model and then just applies it are numbered, right?

Especially with AutoML and just the ease of use of certain tools I think the true differentiator is going to be a data scientist who can frame a business problem in a context that makes sense and drives value, and can execute in a collaborative fashion. 

Now, the fourth thing we look for, and this is not true for all data scientists, but depending on our need, is somebody heavy on the ops side, right?

So the ML ops side. Again, as I said, model building is an increasingly productized skill, right? Today, you could just go and get DataRobot, Driverless AI, or Dataiku, and they will run through all of the models for you and build a thousand different features for you. And it's probably going to be better in a few cases than what data scientists can do with the limited set of experiments they can run.

But where a data scientist will be needed is to take that model and put it into some sort of form that makes sense for the business, right? So this will include components like model governance. Are you checking for drift? Is this integrated into whatever APIs the end-user platforms use?
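Editor's note: "checking for drift" can take many forms, and the interview does not say which one is used at Merck. One widely used heuristic is the Population Stability Index (PSI), which compares how a feature is distributed in training data versus live data. The sketch below is a minimal standard-library version; the bin edges and toy values are invented, and the 0.2 alert level is a common rule of thumb rather than a standard.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index between two samples over fixed bin edges.
    Values outside the outermost edges are ignored in this simple sketch."""
    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                is_last_bin = i == len(edges) - 2
                # Bins are half-open [lo, hi); the last bin also includes hi.
                if edges[i] <= v < edges[i + 1] or (is_last_bin and v == edges[-1]):
                    counts[i] += 1
                    break
        eps = 1e-6  # floor to avoid log(0) on empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Invented toy feature values: training sample vs. a shifted live sample.
edges = [0.0, 0.5, 1.0]
training = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
live = [0.6, 0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.1]

score = psi(training, live, edges)
drifted = score > 0.2  # rule-of-thumb alert level, an assumption here
print(f"PSI={score:.3f}, drifted={drifted}")
```

In practice a check like this would run on a schedule for each important feature and for the model's output scores, with alerts feeding back into retraining decisions.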

And then, does this have components of CI/CD built into it? These are the things, maybe best practices from the software engineering and DevOps world that are slowly transitioning over to the machine learning side, that I think will be an increasingly rare skill. So we filter for that when we look to hire data scientists these days.

Again, to summarize, a high-performing data scientist for us has good problem-solving, good collaboration and communication, and good foundational skills, especially as it pertains to statistics and machine learning, plus an engineering component to their skill set, or at least the aptitude to pick it up. That's what we look for.

Now, I know you had a question before this. Today, as you see, this is the age of the Great Resignation, so obviously there is a lot of talent mobility. The biggest challenge I see is just a career pathway for data scientists, where they can feel like they are being productive, they have autonomy, and they have a sense of community. I think creating that environment is the biggest challenge.

Not a lot of companies do this natively. And at Merck, we're trying to solve this by creating a separate role just for data science community leaders, where they will be in charge of upskilling pathways, talent growth pathways, and a sense of community where data scientists can learn and grow.

But again, it's an experiment we have in progress, and it is a challenge to retain high performers, just because it's a relatively new field, especially in healthcare. So just creating that career mobility and those growth pathways is an ongoing challenge.

The evolution of a new role- Machine learning Ops Engineer

Adel: That's awesome. And thank you so much for this really holistic answer. Harping on the ops skills for a data scientist: do you think that a standard data scientist will need ops skills in the future? Or do you think that a new role will emerge, like machine learning engineer or machine learning ops engineer? Do you see the data science role staying generalist to some extent, or do you see it specializing more and more over time?

Suman: Again, I think it goes back to the industry, right? So if we limit ourselves to the three kinds of industries, this answer will differ based on what industry you're talking about. If data science is a core component, if it's basically your product, then there needs to be a strong ops component. Right?

So I think in those kinds of settings, you will increasingly find that your data scientist profile closely resembles an ML engineering profile. And that's probably true for the second category of companies as well, where data science is not their core product, but they still have to make decisions in real time, because a lot of the decisions need to be integrated into their systems. I think it's the third profile of companies, where it's primarily a decision support tool, where there will still be room for statisticians and data scientists who can inform, let's say, decision-making without necessarily having to go into full ops mode.

But that room, I think, will get increasingly smaller with time. I'm neglecting one large group of data scientists, which is people in research roles within organizations, right? Like Google labs or maybe Facebook labs, where perhaps there is still room for folks who are not ops-heavy but fundamentally want to focus on theoretical algorithms. But again, that is what I would consider an increasingly shrinking profile.

Practical Examples that make you tick as a Data Leader

Adel: That's really great. I'd love to pivot to also discuss your work leading data at Merck as a data and AI leader. What are some of the exciting use cases you've seen or worked on at Merck that really excited you as a data leader?

Suman: So when I was not at Merck and when I was reading about Merck and considering it as a potential place of employment, right? So there were some publicly available kinda use cases that I ran across that had me really excited. Right. So there was a lot of work done in continuous drug manufacturing. So we were basically revamping how we do manufacturing within our branches to facilitate, let's say intelligent automation and continuous drug manufacturing. 

There's a lot of work that we were doing in our supply chain and logistics as well, that involved data science and machine learning. So that was exciting too. And then in the commercial side, we did a lot of work in intelligent and effective engagement.

Right. So how do you figure out what the right message, channel, content, and cadence is to engage your customers so that they see the value and benefit of the life-saving products we generate? That's where a lot of machine learning and data science comes in, because it is fundamentally an intractable problem if you try to do it by brute force, right?

So somehow, you need to have this predictive and intelligent component. So I think these are some use cases I was aware of even before I joined. And as I have entered this, there's a lot of interesting work happening in, let's say, natural language processing to look at, let's say, the promotional content that we send out and the engagement that happens with that, to understand what it is that is resonating in our messages and what is not resonating so effectively, right? So that we can be more curated in our engagement efforts.

So that's a huge area of focus for us. There are other areas around, let's say, next-gen and advanced predictive modeling and forecasting to understand, like, what are the implications of certain decisions that we make today five years down the line.

So that kind of interesting work is also happening in the commercial space, which I find extremely exciting.

Priorities for Data Science Change Management

Adel: That’s really awesome. And, you know, as a data leader, one of the challenges you mentioned is change management and data culture. You're not only tasked with operationalizing data science use cases that have an impact in the short term, but you're also focused on long-term transformational projects, like change management, enabling data culture, and even research and development to drive long-term use cases.

How do you balance these different priorities and these initiatives and how do you allocate your team's time and resources?

Suman: So that’s a loaded question. So I think this balance between business as usual and innovation, right, is something we continue to strive for. Like, I'll share with you an interesting experiment we're doing. And I keep using the word experiment within our org structure, cause we are constantly undergoing transformations and we have strong leaders who believe in being agile and adaptable. So there's a bunch of experiments that we have already in flight, both from a ways of working and culture perspective. So that's why I keep on going back to the term experiment. So today we have this org structure where most of our data scientists sit in a flex pool. As a result, they get to work on different kinds of projects. So they're not tied to one thing that they do. And throughout the year they can work on, let's say one franchise or one business unit or one type of problem.

Right? So that helps maintain a good balance between innovation and execution. We also have a dedicated kind of research and innovation function within my team, and they focus exclusively on external partnerships, branding innovation, and best practice. Right. And so they provide that extra bandwidth for innovation, if you will, that kinda permeates throughout the organization.

And then within the larger CDO organization, we have, as I alluded to earlier, data science and community champions, right? Like those are dedicated roles that we have who are constantly iterating on our culture through events and activities to create that sense of community. We follow constructs like objectives and key results, like OKRs, to track our goals.

And then there's this concept that I think Paul Graham from Y Combinator pioneered, which is maker time versus manager time. It's very easy as a data scientist to be in meetings all day, because you're just a glue that connects everything.

So everybody would want a data scientist in their meeting, which we call, let's say loosely, manager time. So how do you kind of find the right balance between heads-down problem-solving mode, which is maker time, versus, let's say, more managing people or managing stakeholders time? And loosely we try to strive for maybe a 60/40, where 60% of our time we spend on problem solving and 40% we spend on, let's say, the quote unquote managerial stuff as a group.

So obviously, this looks different from individual to individual, but that's, let's say, a loose compass that we drive towards. So these are some guard rails that help us point in the right direction.

Driving the needle on Data Culture

Adel: That's really great. And I love this analogy from Paul Graham. I think that's a really great way to think about it. Now we mentioned as well, data culture here and change management. How do you view the importance and the challenge of data culture when enabling the adoption of the solutions you create, and what have been some of the ways you've been able to move the needle on data culture?

Suman: Data culture is huge, right? So, I mean, this is what I refer to as change management because there's two kinds of culture, right? Culture that is for data scientists. So that obviously needs to be there so that we can be happy, productive, and retain them. And the major component there for effective data culture is autonomy.

Like a sense of autonomy in their work. A sense of community, right? So that they feel like they're part of something larger and just a sense of growth. Right? So they feel like they are learning either from a technical side or a domain side and improving consistently. So I think those are three key pillars we anchor around, but there's a different side of data culture, which is data culture in the organization.

Right? So a data-driven mindset, as this is commonly referred to. And unless we are asking the right questions, we will never be working on the right problem. Right. So to solve this at Merck, what we have is basically an analytics translator role, you know, so within the larger organization, which is the CDO organization, we have dedicated data professionals. The right way to think about them is maybe they have a major in data science and a minor in business.

Right. So they will sit very closely with our business stakeholders and they act as thought partners, you know? So every time there is a question, there's a certain process that they follow, right? For instance, what is the action that this question is gonna drive? Right? Say the answer to the question is a hundred, or let's say the answer is zero.

Like, how does your decision-making change? Is this like an ad hoc thing? Or is there a larger problem that you're trying to get to, which has more of a predictive or a product-type component to it? So just kinda having that level of dialog over time, I think, is going to generate maybe more advanced data-driven thinking and a change in data culture.

So again, another experiment that we have in progress, but we take this aspect of culture very seriously.

Adel: That's really great. And I completely agree with you. I think having someone in the room that can speak both languages, the business language and the data science language, will elevate everyone's skill set, whether that's the business folks getting more of a data language or the data folks getting more of a commercial acumen. Now, Suman, as we close out, I'd love to look into the future and see what you think are the data trends and innovations that you're particularly looking forward to seeing within the pharmaceutical and healthcare space in the next few years.

Suman: Yeah. So maybe I'll limit it to three things cuz I am looking forward to a lot of things like everybody else. I'm looking to see how AlphaFold and the major announcement last year is gonna accelerate drug discovery. Right? So that I'll be watching very closely.

Second, I think there's a lot of work being done on NLP and conversational AI outside of healthcare. Right. And I'm looking very closely to see how that translates into the pharmaceutical industry, cause we could definitely use some advanced kind of methods on structured data that we deal with regularly.

And third is a little bit out there, but it's a space I'm watching closely around, let's say, web 3.0 and blockchains and how it will affect marketing. Especially this concept of, let's say, privacy-first, first-party data where end-users have control over their data. And there's no kinda middleman in between.

So there's no Google or Facebook that is kinda trying to track you. And companies like, let's say, Merck will have direct access to your data with your consent, and you get to monetize it. That will change the commercial landscape fundamentally. Right? Cause today the data we have is reliant on what, let's say, data aggregators provide. Tomorrow it will be all first-party data, or data you provide to us directly. And this concept of linking across multiple, let's say, web experiences of yours is going to be easy because it's just one ID you have throughout your web three experience. So very early days for web three, obviously. It's still kinda in the ideation phase in many ways, but it's a space I'm watching very closely because that's gonna have huge implications on how we do sales and marketing in the future.

Call to Action

Adel: That's really awesome. And really exciting. Now Suman, do you have any final call to action for today's listeners as we close out?

Suman: First of all, stay safe. I know it doesn't feel like COVID is still around, but it is. So continue following guidelines. Second, I would just say follow the life sciences data science space very closely because most of the disruptions will happen in this space.

And then third, I will say we are hiring. So if you are looking for opportunities, please feel free to reach out.

Adel: That's awesome. Thank you so much, Suman, for coming on the podcast.

Suman: Of course. Thank you for having me. I had a great time.
