Brian Campbell is currently an Engineering Manager at Lucid Software, where he leads the company’s Data Science and Engineering efforts. From helping a 3-person marketing department configure Google Analytics to being the tech lead on custom streaming-analytics solutions, he’s built up Lucid’s data ecosystem through many stages of growth. He’s grateful for the many friends he’s made in every department along the way. When not behind a keyboard, he can be found digging in his garden or at the library with his daughter.
Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.
Succeeding in data projects comes from alignment: When working on large-scale data projects, it's absolutely paramount to get all your stakeholders aligned. Whether it's engineering, infrastructure, or product, ensure you have common goals and aligned roadmaps.
Try to break data silos: One of the biggest blockers when procuring the data you need at the beginning of projects is data silos. For example, marketing data tends to be centralized in a marketing stack owned by functional teams. To avoid this in the future, think about how you can centralize your data.
Prototyping is key: It's always important to start with an MVP to determine the viability of a project. Avoid trying to create the perfect project, or the perfect thing—and focus on iteration.
The most important thing I've learned from applying agile to data science is to start with an MVP and to build on the MVP quickly. I find a lot of data science projects fall into the waterfall trap of... We're going to create the perfect project, the perfect thing, and get that out in front of customers. When really you can start with something simple and naive and start learning about how your customers are going to interact with it.
Success comes a lot from alignment. The more that stakeholders, the people that want the problem solved, the data science team and the team that's implementing the project are aligned, the better the outcomes will be. So make sure you're all on the same page about requirements and timelines.
Adel Nehme: Hello. This is Adel Nehme from DataCamp, and welcome to DataFramed, a podcast covering all things data and its impact on organizations across the world. A key aspect of developing a successful data science team is strong data science project management. Since data science teams are consistently pushing out solutions that provide value for the entire organization, collaboration is such an important skill and competency to foster. This is why I'm so excited to have Brian Campbell on today's podcast.
Adel Nehme: Brian Campbell is currently an engineering manager at Lucid Software, where he leads the company's data science and engineering efforts. From helping a three-person marketing department configure Google Analytics to being the tech lead on custom streaming-analytics solutions, he's built up Lucid's data ecosystem throughout many stages of growth. All of that was not possible without strong collaboration.
Adel Nehme: Throughout the episode, Brian discusses his background, how data leaders can become better collaborators, data science project management best practices, the type of collaborators data teams should seek out, the latest innovations in the data engineering tooling space and more.
Adel Nehme: Brian, it's great to have you on the show. I'm excited to speak with you about successful data science project management and how data teams can ensure they're maximizing the value of their work. But before we begin, can you give a brief background about how you got into the data space?
Brian Campbell: Yeah, sure. So I starte...
Brian Campbell: And I felt that I really enjoyed doing that work because I enjoyed talking to my customers. People that were affected by my work were the people that I sat next to every day, and seeing their lives get better, their jobs get easier, was very fulfilling. Then as the company grew from 50 to now 800 employees, we had more and more advanced data needs. We needed data warehouses. We needed a clickstream analytics system, and since I knew the people, I knew the problem space, I got to take over and lead and now manage that. Of course, then data scientists came into play too, and I got to work with them on their projects.
Adel Nehme: That's very exciting. You're someone who's led the successful deployment of quite a few data projects throughout your career. While no two data science projects are the same, I'd love to really discuss the anatomy of a data science project. Do you find that there are common characteristics of a successful data science project within an organization?
Brian Campbell: So in my mind, success comes a lot from alignment. The stakeholders, the people that want the problem solved, the data science team, and the team that's bringing the project into the world, whoever's going to be implementing it so that the stakeholders can use it, are all going to need to understand everything. Be on the same page about requirements, be on the same page about timelines, so that they can make that successful. And I found doing this that no two data science projects are the same, and the quicker I realized that things are going to be different, the quicker we were able to align.
Brian Campbell: I've come into projects before where we come in and say, "Hey, we did a project just like this a month ago. Let's just do it again." We don't really talk. We don't really do the coordination we did on the first project, and then we find out something's different. The new model has an output that's too big for the message bus. The new client has a timeout that's too short for the existing system. And now we're delayed by weeks trying to figure out what was different, when if we had talked about what was different out of the gate, we'd be in a much better place.
Adel Nehme: I'd love to expand on that but just before... Being laser focused on value is obviously a massively important component of a data project. What do you think are the pitfalls data teams face when they're prioritizing projects that lead them to lose sight of the high value projects?
Brian Campbell: So one thing I've seen pretty often is teams starting from their data. They look at the data that they have and say, "Hey, I could probably do this cool thing with it. There's this interesting dimension that I could predict. I can do this classification." And then they do it and find out that nobody actually needed that done. It looks cool. It feels good, but it ends up not being that useful.
Adel Nehme: What do you think are some of the best practices data leaders can adopt when scoping data projects within their organization and what are some of the key considerations they should make?
Brian Campbell: So I always encourage teams to start from problems, go out to product managers, go out to your executives or directors, see what problems they're having, find what problems a data scientist can solve and then start getting your data.
Adel Nehme: So a lot of the time, the data solution that a data team is developing is a solution to a business problem as you just said, by working with the problem experts, for example. Do you find that the data solution can often evolve throughout the project life cycle, as in you start with a very specific solution in mind, like a machine learning model and then end up building something completely different but that solves the business problem at hand?
Brian Campbell: Absolutely. A lot of the problems that my team has set out to solve with an advanced technique have just ended up as a visualization or dashboard, some way to make the data more accessible to whoever the stakeholders are. They don't necessarily need these big, fancy prediction models out of the gate. They just needed to see their data in a new way, in a more accessible way. And by doing that sort of 20% of the work, you get 80% of the value, but then you can start building on it.
Brian Campbell: A great example we have at Lucid is with our anomaly detection work. Anomaly detection libraries these days have huge numbers of bells and whistles that you can use to tune your anomalies, but we found that instead of trying to get the perfect curve and finding the right anomalies, it was easier to get something out with the default models and start seeing how people would use the anomaly detection, to find which anomalies they cared about and which ones they didn't. And from there, we started doing the math to add custom regressors and get the right seasonality.
Brian Campbell: Yeah. It turned out we would have done all that work for something people didn't care about: when we were getting too many payments. It was great, but nobody needed to respond to that. We didn't need to monitor that. We just needed to know when payments sank.
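To make the "start with a default model, then learn which anomalies matter" idea concrete, here is a minimal sketch. It is not Lucid's implementation: the trailing-window z-score, the function name `flag_drops`, the window size, the threshold, and the payment counts are all assumptions chosen for illustration. The one design choice taken from the conversation is that only drops are flagged, since unusually high payment counts did not need a response.

```python
from statistics import mean, stdev


def flag_drops(values, window=7, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations BELOW the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: no meaningful z-score
        z = (values[i] - mean(history)) / sigma
        if z < -threshold:  # spikes (z > 0) are deliberately ignored
            flagged.append(i)
    return flagged


# Made-up daily payment counts: one spike (index 7) and one drop (index 15).
daily_payments = [100, 102, 98, 101, 99, 103, 100, 180,
                  101, 99, 100, 102, 98, 101, 100, 40]
print(flag_drops(daily_payments))  # [15]
```

A naive baseline like this is easy to put in front of stakeholders; seasonality and extra regressors can be added once it is clear which alerts people actually act on.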
Adel Nehme: Okay. That's awesome. And what do you think are some of the lessons maybe you can share or best practices that data leaders can adopt to ensure that they are always finding the best solution to a problem?
Brian Campbell: Yeah. So always start small. Always start by getting something in front of a person as quickly as possible. If there's a baseline model you can use, if there's a simple visualization you can build to try to take care of this problem, you start there, see if it helps, and then go to the next bigger thing, the next more sophisticated thing. Don't try to start from your sophisticated answer; try to build towards it and validate it along the way.
Metrics for Maximizing Value
Adel Nehme: Keeping on the same theme, oftentimes with any data science project, specifically a machine learning project, the common evaluation metrics used are metrics like accuracy or recall. However, in the real world, algorithms are evaluated not only on accuracy, but on robustness and the value they generate. So what do you think are important metrics to track throughout a data project's life cycle to ensure teams are maximizing value?
Brian Campbell: The metrics of the business are the ones that should always be at the forefront for the team. If you're working with a product team, you need to care about product metrics the most. You've got to care if users are coming back to the product, if people are purchasing, if they're having an enjoyable experience in the product. And if you're working on a business issue, marketing cares about CPC and sales is going to care about closed deals. While data tends to be one or two steps away from that, that should be your north star metric. Then you can build down to see what you're changing and how your accuracy affects those metrics, and while you're doing that, you can find interesting secondary metrics.
Brian Campbell: We set up a system recently where we had a document clustering problem. We'd get some data from users and we'd cluster it for them. And we built the best model we could. We had a loss function and we optimized for it and we found out that it... It took 30 seconds. We thought that was fine. We put it in front of customers. They were not willing to wait 30 seconds for their clusters to come back. They would bounce from the site before we were done.
Brian Campbell: So we had to come back through, change the algorithm and while we were not as good at having the best clusters, now I've seen a lot more users happy because they were taking just a couple seconds to get the experience they wanted.
Brian Campbell: And another interesting example I have here is from the e-commerce space. There's a team working on making product recommendations. So you're in a store. You get recommendations on Amazon all the time. So they were always trying to increase the relevance of the recommendations. They wanted people to purchase whatever they were showing them, but they found that there were some trade-offs there. Say they'd recommend pens to people, and they increased the number of people buying pens by 10X. That's great. Recommendations are more relevant. But if that model suddenly was now selling expensive electronics less, the company was losing money. So they had to pay attention to what they were recommending and make sure the best products for the company were also being recommended.
Adel Nehme: And how do you go balancing out any potential friction, maybe between the metrics the business cares about and the metrics the data team may care about?
Brian Campbell: The best method I have found is to encode what the business needs into some sort of tests, so that when you are out there doing cool stuff to optimize your model, you or your scientists can focus on optimizing the model, but then they can come back and check it against the tests that you've written. Hey, for this input, it is very important that we get this set of outputs. Like in the laptop example... Hey, we could be optimizing our model and then check it against common use cases, in an automated fashion, to make sure that things that are important to recommend are still being recommended.
Brian Campbell: So it comes from there. You have to just have the right partnership to understand the business, what the business needs and a lot of that is learned by hard experience. You won't necessarily think about these side effects in the first iteration, but as long as you're able to iterate quickly, you can fix it and move on in a second, third, fourth iteration.
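The "encode what the business needs into tests" approach described above can be sketched as a small regression check that runs against every candidate model. Everything here is hypothetical: `make_model` is a toy stand-in for a trained recommender, and the items, scores, and the single business case are invented for the example.

```python
def make_model(scores):
    """Toy stand-in for a trained recommender: ranks items by a fixed score
    and ignores the user's history (a real model would not)."""
    def recommend(history, top_k=3):
        return sorted(scores, key=scores.get, reverse=True)[:top_k]
    return recommend


# Hypothetical business expectation: for this input, these items must appear.
BUSINESS_CASES = [
    {"history": ["laptop bag"], "must_include": {"laptop"}},
]


def failed_business_cases(model, cases, top_k=3):
    """Return (history, missing_items) for every case the model violates."""
    failures = []
    for case in cases:
        recommended = set(model(case["history"], top_k))
        missing = case["must_include"] - recommended
        if missing:
            failures.append((case["history"], sorted(missing)))
    return failures


old_model = make_model({"laptop": 0.9, "pen": 0.8, "notebook": 0.5, "stapler": 0.2})
new_model = make_model({"laptop": 0.1, "pen": 0.9, "notebook": 0.8, "stapler": 0.7})

print(failed_business_cases(old_model, BUSINESS_CASES))  # []
print(failed_business_cases(new_model, BUSINESS_CASES))  # [(['laptop bag'], ['laptop'])]
```

The data scientists stay free to optimize relevance however they like; the check only reports when a change pushes a business-critical recommendation out of the results.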
Adel Nehme: So given your engineering background, I actually love your perspective on quality and availability. So data availability and data quality are also super important factors when deciding on which projects are high impact enough to go on a data team's roadmap. What are the baseline characteristics of data for a given project that give you confidence that this is something that can go on your roadmap?
Brian Campbell: So when it comes to putting something on the roadmap, what's really important to me is... Well, when I lead my team, my engineering team, we get requests for data. We don't say, "Hey, is this available?" We say, "Okay. Here's when we can get it available by."
Brian Campbell: Almost all the data you need can be gathered, so having an engineering team think about how they're going to gather it is what's important. That's why you've got to get them in alignment quickly, because they might come back and say, "We can get the data, but it'll take a year to build a new pipeline, to build a new control." So maybe this data isn't going to be available in the time we want. Then that's a choice you have to make.
Brian Campbell: If you're a data engineering leader, try to focus the team on making anything you can available, so that when a data scientist comes in with, "Hey, we've got this high value project and we need something new," you can go make it happen.
What are some of the common blockers data engineering teams go through in trying to procure data?
Adel Nehme: And what do you think are some of the common blockers data engineering teams go through in trying to procure data?
Brian Campbell: Silos are a big one. Hey, we've got data in some third party. That third party belongs to the marketing team. You've got to go figure out who on the marketing team's going to give you the access you need, and then figure out how we're going to interact with that third party without disrupting the marketing team.
Brian Campbell: We use Marketo at Lucid. Marketo, of course, has a certain number of API calls. We purchased an API call limit from them. And if we're pulling data too fast, suddenly the marketing team can't do what they do. They don't have enough API calls. So we have to find the right trade-off between getting the data we need and not affecting the actual business users. Then there are just novel formats. I think a common one everybody has is, "Hey, you get data in a format you've never seen before. It's going to take a while to figure it out."
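One way to picture the API-limit trade-off above is a shared daily budget in which the data pipeline can never eat into a slice reserved for the business users. This is only a sketch under assumptions: the class name, the daily limit, and the reserved amount are invented, and no real Marketo API is called.

```python
class ApiCallBudget:
    """Tracks a shared daily API quota, keeping a reserved share for business users."""

    def __init__(self, daily_limit, reserved_for_business):
        if reserved_for_business > daily_limit:
            raise ValueError("reservation cannot exceed the daily limit")
        self.daily_limit = daily_limit
        self.reserved = reserved_for_business
        self.used_by_pipeline = 0
        self.used_by_business = 0

    def record_business_calls(self, n):
        """Business users are never throttled here; their usage is only recorded."""
        self.used_by_business += n

    def pipeline_allowance(self):
        """Calls the pipeline may still make today without touching the reservation."""
        business_claim = max(self.reserved, self.used_by_business)
        return max(0, self.daily_limit - business_claim - self.used_by_pipeline)

    def try_pipeline_calls(self, n):
        """Grant up to `n` calls to the pipeline and return how many were granted."""
        granted = min(n, self.pipeline_allowance())
        self.used_by_pipeline += granted
        return granted


budget = ApiCallBudget(daily_limit=10_000, reserved_for_business=6_000)
print(budget.try_pipeline_calls(3_000))  # 3000
print(budget.try_pipeline_calls(3_000))  # 1000 (only 4,000 are open to the pipeline)
print(budget.try_pipeline_calls(500))    # 0
```

An extraction job would ask the budget before each batch and defer the remainder to the next day when it is granted fewer calls than it requested.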
Adel Nehme: And how agile and iterative should data teams be? What are some of the tactics you found effective when working in an agile, iterative manner? I'd love it if you can anchor some of those tactics in some real world examples.
Brian Campbell: So the most important thing I've learned from applying agile to data science is to start with an MVP and to build on the MVP quickly. I find a lot of data science projects fall into the waterfall trap of... We're going to create the perfect project, the perfect thing, and get that out in front of customers. When really you can start with something simple and naive and start learning about how your customers are going to interact with it.
Brian Campbell: So if we go back to the example from before about document clustering. To give some more context there, this was... Our customers could come into our app and do brainstorming sessions. They put stickies onto a digital whiteboard, put ideas on it, and then we clustered those ideas into kind of topic groups for them, so that whoever's moderating the brainstorm doesn't have to lay it out themselves, because that can take 10, 20 minutes.
Brian Campbell: So the first thing we did was we created a model that just returned random clusters, and that allowed the product development team that was building the user experience around this feature to start getting a sense of what the user experience should feel like. They could say, "Hey, we're going to send stickies this way. The API's going to return the clusters in this format," and they could start building. So we didn't end up in a place where they could only start building once we were finally done, making it take that much longer to get in front of customers.
Brian Campbell: And then once we had the algorithm we wanted in place, we set it up behind the API that the product development team had already built and the product manager started playing with it, but they were getting weird results. As I mentioned, it was taking a long time and they were not getting the results that were expected, but we didn't have enough data points from just the one product manager to see what was going wrong. So that product manager suggested an internal alpha. Okay, let's take this clustering feature, let's put it in front of other people in the company. See what's going on. See if it's just me or if there's something wrong.
Brian Campbell: And it turned out there were some things that were really wrong with our algorithm. There were lots of weird edge cases with the language processing we hadn't considered. People were submitting stickies that only had stop words in them, or there would be stickies with essays on them. In a real-life brainstorming session that we got to participate in, they brought us a document that had hundreds and hundreds of sticky notes, while we were expecting, in all of our test cases on the data science team, no more than 100. So we were running out of memory. We were breaking down really quickly.
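A placeholder model of the kind described above might look like the following sketch. The request and response shapes (`id`, `cluster_id`, `sticky_ids`) are assumptions made up for the example, not the actual API contract; the point is that the shape is fixed early so the real algorithm can later be swapped in behind it.

```python
import random


def cluster_stickies_stub(stickies, n_clusters=3, seed=None):
    """Placeholder 'model': assigns every sticky to a random cluster.

    `stickies` is a list of dicts with an "id" key. The return value has the
    same shape the real clustering model is expected to produce later.
    """
    rng = random.Random(seed)
    n = min(n_clusters, len(stickies))
    clusters = [{"cluster_id": i, "sticky_ids": []} for i in range(n)]
    for sticky in stickies:
        rng.choice(clusters)["sticky_ids"].append(sticky["id"])
    return {"clusters": clusters}


stickies = [{"id": i, "text": f"idea {i}"} for i in range(6)]
response = cluster_stickies_stub(stickies, n_clusters=3, seed=42)
print(response)
```

Because the assignments are random, the output carries no meaning, but the team building the user experience can already develop and test against the response format.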
Brian Campbell: So being able to go to a customer that's around us, see their problem, fix it really fast, put it in front of them again, we're able to iterate really quickly and then get to something better, something better for our customers, faster than if we had tried to fix everything up front. And of course, there were things we never thought about that came out of this internal alpha and then actual customers would give us feedback too, and we'd fix it and we'd continue to iterate that way.
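The edge cases that surfaced in the internal alpha (stickies containing only stop words, essay-length stickies, and far more stickies than expected) suggest a defensive preprocessing step in front of the model. The sketch below is an assumption about how such a guard could look, not a description of what the team built; the stop-word list, the limits, and the choice to truncate rather than reject are all illustrative.

```python
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "is", "it", "in"}  # tiny illustrative list


def prepare_stickies(texts, max_stickies=100, max_chars=280):
    """Drop stickies with no content words, trim long ones, cap the total count."""
    usable, skipped = [], []
    for index, text in enumerate(texts):
        words = [w.strip(".,!?").lower() for w in text.split()]
        content_words = [w for w in words if w and w not in STOP_WORDS]
        if not content_words:
            skipped.append(index)  # empty or stop-words-only sticky
            continue
        usable.append((index, text[:max_chars]))
    return {
        "usable": usable[:max_stickies],
        "skipped": skipped,
        "truncated": len(usable) > max_stickies,
    }


result = prepare_stickies(
    ["the and of", "Improve onboarding flow", "", "x" * 1000],
    max_chars=10,
)
print(result["skipped"])    # [0, 2]
print(result["usable"])     # [(1, 'Improve on'), (3, 'xxxxxxxxxx')]
print(result["truncated"])  # False
```

Returning the skipped indices and a `truncated` flag lets the calling application tell the user what was left out instead of failing silently or running out of memory.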
Monitoring and MLOps
Adel Nehme: That's great. And I'd love to dive deeper into this and discuss the monitoring aspect of a project. So a lot of the times, data teams work on a model, deploy it and then problems arise in production. So how important do you view monitoring and MLOps and who do you think should own it?
Brian Campbell: All right. So monitoring is incredibly important. When we create a model and release it into the world, our intent is to change the world in some way and so our model by definition becomes inaccurate very quickly. But we also have the world around the model changing as well. We frequently see marketing changing who it's targeting and so our customer mix is going to change. We're going to see third party APIs change the way they work.
Brian Campbell: We have one model that's based on Zendesk. Zendesk changed their API and we didn't have the right monitoring in place to detect that we weren't getting the same data format as we were before and thus, we weren't getting the results we were getting before until somebody brought it up much too late that they had stopped seeing what they had expected. That's really a situation you don't want to be in. So getting that monitoring in place around your accuracy, around your business metrics is very important.
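A lightweight guard against the kind of silent format change described above is to validate incoming records against the fields the model depends on and alert when the failure rate jumps. The field names below are hypothetical and are not Zendesk's actual schema; the sketch only shows the shape of the check.

```python
# Hypothetical fields the downstream model relies on (not a real vendor schema).
EXPECTED_FIELDS = {"id": int, "subject": str, "created_at": str}


def find_schema_problems(record, expected=EXPECTED_FIELDS):
    """Return a list of human-readable problems for one incoming record."""
    problems = []
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


def batch_failure_rate(records, expected=EXPECTED_FIELDS):
    """Fraction of records in a batch with at least one schema problem."""
    if not records:
        return 0.0
    bad = sum(1 for record in records if find_schema_problems(record, expected))
    return bad / len(records)


old_format = {"id": 1, "subject": "Login issue", "created_at": "2021-06-01"}
new_format = {"ticket_id": 1, "subject": "Login issue", "created_at": "2021-06-01"}

print(find_schema_problems(old_format))               # []
print(find_schema_problems(new_format))               # ['missing field: id']
print(batch_failure_rate([old_format, new_format]))   # 0.5
```

Feeding the failure rate into whatever alerting is already in place turns an upstream API change into an immediate signal rather than something a stakeholder notices weeks later.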
Brian Campbell: In terms of who should own it, that's a difficult question, because you need someone... You need a part of your organization that is an expert in your monitoring infrastructure, an expert in the data science side, and an expert in the product side. We've built up a data infrastructure team that combines the need for data engineering in these data pipelines with support for data science, and that's been the team that's owned monitoring so far, and that's worked for us. It's an engineering team that kind of cares about everything, so they can put that monitoring in place, but it's a difficult skill set to find and a difficult skill set to train. So we're all working towards it together.
Adel Nehme: Given this is a relatively nascent space, what do you think are some of the tools that kind of help teams monitor models in production?
Brian Campbell: So we inherited Datadog from our SRE DevOps team, which is mostly a tool where you can throw numbers at it and then monitor how those numbers go up and down. It seems like a simple thing to do, but doing that at scale is very difficult, so having a tool that handles that for us is nice. It notifies us via Slack or, for very important things, we can set up an alert through something like PagerDuty to tell us that something is broken and needs to get changed.
Brian Campbell: But honestly, we've had to build a lot of stuff ourselves as well. It's not as mature a space as DevOps, so there's going to be a lot of hacking things together yourself. We write Airflow DAGs that monitor systems and then do Slack notifications or kick off retrainings.
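The core of a home-grown monitoring job like the one described above can be very small. The sketch below is plain Python rather than an Airflow DAG, and the thresholds, function names, and the `notify` and `retrain` callables are all hypothetical; in practice a scheduler would run the check, and the callables would post to a chat channel or start a training job.

```python
def decide_action(current_value, baseline, warn_drop=0.05, retrain_drop=0.15):
    """Map the relative drop of a metric against its baseline to an action."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    relative_drop = (baseline - current_value) / baseline
    if relative_drop >= retrain_drop:
        return "retrain"
    if relative_drop >= warn_drop:
        return "notify"
    return "ok"


def run_check(current_value, baseline, notify, retrain):
    """Run one monitoring check and trigger the matching side effects."""
    action = decide_action(current_value, baseline)
    if action in ("notify", "retrain"):
        notify(f"Metric at {current_value} against baseline {baseline}: {action}")
    if action == "retrain":
        retrain()
    return action


messages = []
retrain_runs = []
print(run_check(0.90, 0.92, messages.append, lambda: retrain_runs.append(1)))  # ok
print(run_check(0.85, 0.92, messages.append, lambda: retrain_runs.append(1)))  # notify
print(run_check(0.70, 0.92, messages.append, lambda: retrain_runs.append(1)))  # retrain
```

Keeping the decision logic separate from the side effects makes it easy to unit test the thresholds without sending real notifications.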
Adel Nehme: Are there any particular tools that you're excited about?
Brian Campbell: One thing I've been impressed by are tools like... They're on the kind of the input side rather than the output side. There's tools like Great Expectations or Monte Carlo, who I know was on the show a couple weeks ago, that let you keep a really good eye on the data that's coming in and making sure it doesn't change.
Brian Campbell: One can tell you that you need to retrain if it changes too much or at least can tell you something has changed and you need to go figure it out. I really am looking for tools that keep an eye on output as well, making sure that what your model's outputting is still what you expect from it.
Adel Nehme: I'd love to segue here more to discuss how the data team fits within the organization and how important for data leaders is it to build relationships with the rest of the organization so that they're consistently iterating on high value projects.
Adel Nehme: Obviously, a lot of the high value problems waiting to be solved by the data team sit within functional business units. So how can the data team establish trust and friendship with these business units so that these business units become strong customers of the data team solutions?
Brian Campbell: I'll be honest up front, I don't have a good strong process for this, and it's maybe because I haven't worked it out enough, maybe because it's just... It's a much more human problem than what I'm used to dealing with in engineering and data science. But there are some things I found that work.
Brian Campbell: So the first thing I found is that people like talking about the problems they're facing, as long as they're interesting. So taking that time to take organizational leaders to lunch or to a coffee and just letting them open up about what they're dealing with will give you a lot of insight into what's going on and build a lot of trust by showing that you care about what's going on throughout the company. Once you have that catalog of problems in your company, then you can come back to your data science team or analytics team and say, "Hey, which of these problems would be good for us to solve? Which of these are inherently data problems?" From there, you can come back and say, "Hey, I thought about your problem. I think we can put a solution together. Who's the right person in your organization to own creating a solution with us?" And now you have a relationship that's more formalized and you can start going towards building your solutions.
Adel Nehme: And how do you go about prioritization based on the problems? Do you integrate the functional leaders in the prioritization process or is this owned by the analytics team entirely?
Brian Campbell: Yeah, the data team tends to own the prioritization side because we might have to do some extra footwork to figure out, well, how valuable would it actually be to solve this problem? But that's only one axis of the prioritization. The other one being, how feasible is this? Can we do this? Would it be worth doing? That's where you really need the data team to be involved.
Adel Nehme: The power of collaboration here also extends to implementing a data solution for any specific business problem. Can you comment on the steps it takes to take a data product from scoping to actually being embedded within a business process and the folks data leaders are expected to work with to make this a reality?
Brian Campbell: So I'm assuming you've already kind of done the steps I've talked about. You've found your problem and you've found the expert that's going to be the person you're validating your solutions with. Now you need to figure out your data. What data do I need to create solutions for this problem? You need to start working quickly with whoever's going to get you that data, whether it's an engineering team, whether your team's going to have to work with a security team to get permissions to access something, or whether you're going to have to work with some other team to get into some data silo. You've got to start making those friends quickly and start ingesting that data quickly, because data's what makes this whole system work. Otherwise, you wouldn't be having a data problem.
Brian Campbell: From there, if you have enough scientists, and there's usually more work to go around than scientists, then while one person is playing with the data, another person is exploring solutions and finding different things we could do once we have all the data. On my data engineering team, we try to get at least some of the data quickly, some small subset, so that data scientists can take a look at it and explore the format, what dimensions they have available, and how they're going to need to clean it before they have the full data set, which could take days or weeks to get in place.
Brian Campbell: Then, once you start kind of forming around certain solutions and you have a good idea of what you're going to do, you need to find the implementation partners. Data science teams rarely have the full skill set to take their model and get it in front of a customer. They need help from someone. If you say, "Hey, we're going to create an API," you're going to need someone from your ops team or your platform teams to create that API.
Brian Campbell: If you want to add a feature to an app, you need a product development team that's going to [inaudible] own that UX. And these teams all have their own roadmaps. They have their own goals and you have to find a place where you fit within those goals. So making sure you have your solution pretty clearly delineated so they're not going to have to do a lot of fact finding for you and helping them understand that what you're doing is very valuable and thus it is good for them... It is just as good for them to help you as it is for you needing the help. Everybody's going to win if we put out something really cool.
Brian Campbell: That's the point where you understand who you're going to be working with. You've started to build solutions. It's mostly now about keeping everyone aligned, keeping everyone in the know about what's happening. With data science projects, the timelines are messy. I worked in engineering for years before I got into data science and those timelines are already messy, and data science just has an extra level of complexity on top of that, which makes it harder to pin down, "Hey, we're starting this on this day and we're going to be done on this day." Model selection, model building, those always take longer than you expect.
Brian Campbell: But you still need to communicate with all these partners about what's going on. So I find it helpful to set up clear milestones and to say at what milestone we're going to need our partners to get involved, so they know, "Hey, we're at this point. It might be a week. It might be two weeks until we're at this next point, but that's when we're going to need you to start helping out." So they know at least a few weeks ahead that they're going to be involved in this project.
Managing Data Science Projects
Adel Nehme: So you mentioned here the timelines of the data science project are especially messy. Do you find that there are some best practices data teams can adopt to gain some automation on common data science tasks to reduce the workflow time?
Brian Campbell: Yeah. We're starting to see really cool stuff in cleaning automation. AWS put out their cleaning tools. We've only had a chance to play with them. We haven't productized them yet, put them into a production project, but it's a space that's coming and it's going to hopefully speed things up and make things more reliable, which is really what data science needs to become more valuable in organizations.
Adel Nehme: What do you think data science needs in order to be more valuable throughout the organization?
Brian Campbell: In my own organization, I find us dealing a lot with skepticism. Because we're pushing boundaries about what's possible, people have a hard time believing that we're actually going to do it. We set out on this topic modeling adventure of, hey, we can take the things people wrote and put them into groups based on themes. That is something so new to computers that, for people that have been working in computers for... it doesn't even have to be that long, it feels like a limitation, like something you shouldn't be able to do. So getting cool things in front of people is how you get them to believe that you can do valuable things, and as you get people to believe in you, they trust you with bigger and bigger projects.
Adel Nehme: And do you find that high levels of data literacy are important to foster within an organization in order to have a fruitful collaboration with functional leaders? What is the data team's role in educating the remainder of the organization?
Brian Campbell: To the extent that an organization is data literate, you're going to have these better collaborations, but I found that it's hard to put the responsibility of data literacy on other teams. It's really up to the data team to make data as clear as possible so we can communicate about it as clearly as possible.
Brian Campbell: Yeah. So that comes down to... One thing we've seen a lot of success with in our team is in the visualization space. Our sales team and our executive team don't have time to be playing with spreadsheets of data. We need to find what they need to know and turn that into a visualization in Tableau or a custom application for them to explore this data in a way that is accessible to them.
Brian Campbell: Yeah. So I would say that the data organization isn't about making the organization more data literate, but making data more accessible to the organization so you can talk about it so they can make the best decisions they can.
Adel Nehme: What do you think are some of the other tactics data teams can employ to further enable access? Something that comes to mind is data discovery tools, for example, or metadata management tools that let everyone search for the data that they need.
Brian Campbell: Absolutely. Our analytics team, who I'm not as involved with, but I've been... They're just a rock on team, they put together... They call it their metrics catalog. It's a pretty simple webpage but it lays out, hey, here are all of the metrics that we track as a company. Here's how we calculate each one, so if you want to explore that calculation, here are visualizations. Here are SQL queries if you want to jump into the data warehouse and start exploring that metric.
Brian Campbell: They have this piece of metadata that's in a very accessible format. It's not just in a table, but a step beyond that, that gives people the ability to jump in at the level they're comfortable with. I think that's helped to increase visibility of our data and made it more widely used throughout our company.
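A metrics catalog like the one described here could be modeled roughly as follows. This is only an illustrative sketch: the `MetricEntry` fields, the `search` helper, the example metrics, the SQL, and the dashboard URL are all hypothetical and are not taken from Lucid's actual catalog.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricEntry:
    """One catalog entry; every field name here is hypothetical."""
    name: str
    calculation: str  # plain-language description of how the metric is computed
    sql: str          # starter query for exploring the metric in the warehouse
    dashboards: List[str] = field(default_factory=list)  # links to visualizations


def search(catalog: List[MetricEntry], term: str) -> List[MetricEntry]:
    """Return entries whose name or calculation mentions the term (case-insensitive)."""
    needle = term.lower()
    return [
        m for m in catalog
        if needle in m.name.lower() or needle in m.calculation.lower()
    ]


# Made-up example entries, for illustration only.
CATALOG = [
    MetricEntry(
        name="Weekly active users",
        calculation="Distinct users with at least one session in the trailing 7 days.",
        sql="SELECT COUNT(DISTINCT user_id) FROM sessions WHERE started_at >= CURRENT_DATE - 7;",
        dashboards=["https://example.com/dashboards/wau"],
    ),
    MetricEntry(
        name="Trial conversion rate",
        calculation="Paid signups divided by trials started in the same month.",
        sql="SELECT SUM(converted) * 1.0 / COUNT(*) FROM trials;",
    ),
]

if __name__ == "__main__":
    for metric in search(CATALOG, "users"):
        print(metric.name, "->", metric.sql)
```

Keeping the plain-language definition, the query, and the dashboard links in one record is what would let a reader enter at whichever level they are comfortable with.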
Adel Nehme: So data teams are often organized differently from one organization to another. You have centralized center of excellence style data teams, embedded data teams where data scientists are embedded in functional groups or something in between like a hybrid model. What do you think is the best organizational model that promotes alignment between the data team and functional experts and business leaders?
Brian Campbell: Yeah. So my data science and data engineering teams are currently a centralized team because it's very small. We have three engineers, three scientists. And we can have a whole other conversation... How do you get anything done with so few people? It's been an adventure. But I can say looking at our analytics team, who I just talked about, the director, Tim Jenkins, has come up with this awesome model for getting his analysts into the organization. It's a hybrid between the centralization and the functional groups where there's a central team that's focused on improving the data quality overall. It's interested in improving the systems that we have in place as well as handling requests that fall through.
Brian Campbell: Early in his organization's life, the HR team, the People Ops team, didn't have a dedicated analyst, but they had an interesting request come through so it came through to the central team. But he also has teams embedded in marketing and sales and product development that are able to meet those teams' needs specifically.
Brian Campbell: But he doesn't go to the level that we see at Google, where you have data people on the individual teams themselves. These analysts are working on more of the organizational level and there's a lot of value there because their customers then are the organizational leaders and so they're able to get involved at the division level and the roadmap level more.
Adel Nehme: This is really interesting and given your experience, working in a lean startup, what are some of the best practices you've adopted to enable the data team to stay hyper focused and laser focused on value while also keeping a sense of work-life balance within the team?
Brian Campbell: Yeah. So the one thing that's been game changing for us is [inaudible] solutions. The one thing that's let us put out models, put out visualizations, maintain pipelines with just a few people has been that we don't need to maintain our servers. We can put something in Lambda. We can write a container and put it in AWS Fargate, and then not have to worry about the maintenance.
Brian Campbell: I was recently talking to another company that was hiring a senior data engineer about what they were looking for in the role, and they were looking for someone who was able to keep their Spark cluster running. And I was like, "I just pay Databricks to do that. I don't have to worry about bringing on a whole other head to my engineering team to take care of that when I can outsource it to an organization that's doing an excellent job."
Adel Nehme: And do you find that the tooling space is making data scientists more effective and productive today? What are the tools that you're excited most about that will make the data science workflow much more efficient?
Brian Campbell: Thinking through where my team spends time now that I wish they didn't: we're seeing developments in the automation of model deployments. So right now, we have a fairly manual model deployment step where we're running scripts ourselves on our laptops on [inaudible] but we're seeing a lot of...
Brian Campbell: As well as having to build something to deploy, we have to have a platform team build an API. But we see stuff like SageMaker that's push button. You get a container. That container's running. It has an API. You just call that endpoint with your data and you're getting your results and that could make a huge difference for a small team that doesn't have the expertise in setting up servers, setting up APIs. And data pipelines are also getting lower and lower maintenance. We've got...
Brian Campbell: I'm at a [inaudible] company so those are the tools I know the best, but we have AWS Glue as a serverless ETL solution, so that's now another layer of maintenance you don't have to worry about.
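To make the "just call that endpoint with your data" step concrete, here is a minimal sketch of calling a deployed SageMaker endpoint from Python. It assumes the boto3 `sagemaker-runtime` client and its `invoke_endpoint` call; the `predict` helper, the endpoint name, and the `{"instances": ...}` JSON payload shape are hypothetical, since the format a given model container expects varies.

```python
import json


def predict(runtime_client, endpoint_name, rows):
    """Send rows to a deployed endpoint and return the decoded JSON reply.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    The {"instances": ...} payload shape is an assumption for illustration.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"instances": rows}).encode("utf-8"),
    )
    # The response body is a stream; read it fully, then parse the JSON.
    return json.loads(response["Body"].read())


# Usage against a real endpoint (needs boto3, AWS credentials, and a live endpoint;
# "topic-model-endpoint" is a made-up name):
#
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   print(predict(client, "topic-model-endpoint", [{"text": "example comment"}]))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object, without touching AWS.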
Adel Nehme: It's really exciting to see a lot of new tools like centralized data warehouses, data platforms and all of that fun stuff that really reduce the barrier to working with data and make the data science workflow much more effective.
Brian Campbell: Yeah. And I know a lot of these tools are using Spark or the principles of Spark under the hood and it's so nice to not manage that myself. Spark is an incredibly difficult tool to keep running on giant clusters so having somebody else do that makes it... So my team can focus on what my company needs rather than just this maintenance.
How to become better collaborators?
Adel Nehme: So given that successful data science within an organization goes beyond hard or technical data skills, what are some of the most important skills practitioners and leaders should cultivate to become better collaborators?
Brian Campbell: Yeah. A theme through this has been about communicating with your collaborators. So making sure that you're communicating in such a way that you are understood. And even when dealing with somebody who may not be as good a communicator, being able to still work to understand them is very important.
Brian Campbell: And for me, the key text there has been Crucial Conversations. It lays out dealing with difficult conversations, places where the stakes are high, but the principles it has about finding shared meaning, making sure we understand each other are so broadly applicable. I definitely recommend it.
Brian Campbell: And then on the other side, another aspect here is making sure that you're running your projects smoothly. My data science team doesn't have a trained project manager. I imagine very few of them do. So just understanding the basics of how to take a project, get its requirements, and break those down into a timeline with milestones so that you can communicate about that is very important to maintaining a relationship with trust throughout the project.
Adel Nehme: And how often do you think teams should be communicating with stakeholders?
Brian Campbell: Kind of being born from an engineering organization, we have two-week Scrum sprints and at the end of our sprints, we have a review and acceptance meeting. I think that's pretty standard in Scrum, to go over what we've done and where we're going next with our stakeholders. You've got to find the right cadence for your organization but I find throughout the industry two weeks is very common.
Call to Action
Adel Nehme: Finally, Brian, do you have any call to action before we wrap up today's episode?
Brian Campbell: Yeah. So like I said, the theme through a lot of this has been about collaboration, building those relationships, and so I would like to challenge all of you to go and find the collaborators that you're missing, who you're going to need help from in validating a solution, who you're going to need help from in releasing a solution, and start building that relationship. Go get a coffee or take them to lunch. See what's going on in their world and start expressing that you're going to need their help finding that common ground.
Adel Nehme: Thanks so much, Brian, for sharing your insights and for coming on the show.
Brian Campbell: Yeah. Thank you, Adel. It's been great to talk to you today.
Adel Nehme: That's it for today's episode of DataFramed. Thanks for being with us. I really enjoyed Brian's insights on the importance of collaboration and data science project management. If you enjoy this podcast, make sure to leave a review on iTunes. Our next episode will be with Andy Cotgreave, technical evangelist at Tableau. I hope it'll be useful for you and we hope to catch you next time on DataFramed.