
Creating Smart Cities With Data Science

Amen Ra Mashariki deep dives into the state of data literacy in government agencies, the use cases he worked on to make the City of New York smarter, and more!


Adel Nehme, the host of DataFramed, the DataCamp podcast, recently interviewed Amen Ra Mashariki, Principal Scientist at Nvidia and the former Chief Analytics Officer of the City of New York.

Introducing Amen Ra Mashariki

Adel Nehme: Hello. This is Adel Nehme from DataCamp, and welcome to DataFramed. A podcast covering all things data and its impact on organizations across the world. Throughout the years we have definitely seen a rise in data science and analytics in a city and government agency context. Whether that's governments investing in smart cities, hiring chief data officers, opening data up to citizens, or improving operational efficiency within the government itself. However, that does not come without its unique opportunities, challenges, and concerns.

Adel Nehme: This is why I'm excited to have Amen Ra Mashariki for today's episode. Amen is a principal scientist at Nvidia and is the Global Director of the Data Lab of the World Resources Institute. Previously, Amen was the head of machine learning at Urbint and served as adjunct faculty at NYU Center for Urban Science and Progress. Amen also served as a fellow at the Harvard Ash Center for Democratic Governance and Innovation.

Adel Nehme: He was also the head of Urban Analytics at Esri and was Chief Analytics Officer for the city of New York and the Director of the Mayor's Office of Data Analytics. In 2012, Amen was one of the 11 individuals appointed by President Obama to the 2012-2013 Class of White House Fellows. Immediately after the fellowship, he was appointed the Chief Technology Officer for the Office of Personnel Management. Amen earned a doctorate in engineering from Morgan State University, as well as master's and bachelor of science degrees in computer science from Howard and Lincoln University respectively.

Adel Nehme: In this episode, we talk about the distinction between chief data officers and chief analytics officers, how he defines smart cities, and some of the use cases he worked on to make the city of New York more data-driven. We also talk about the unique challenges in ethics and privacy when working on data projects in a government setting, and discuss the state and importance of data literacy in government to drive efficiency and the responsible use of data. If you want to check out previous episodes of the podcast and show notes, make sure to go to www.datacamp.com/community/podcast. Amen, it's great to have you on the show.

Amen Ra Mashariki: Adel, it's my pleasure. Thank you so much for inviting me to be a part of this conversation. Really looking forward to this discussion.

How did you get into the data space?

Adel Nehme: I am really excited to talk to you about data science in government agencies, the importance of data drills, and how to leverage data for better disaster response. But before that, can you tell me a bit about your background and how you got into the data space?

Amen Ra Mashariki: Yeah. I think the way I would talk about how I got into the data science space is to sort of give you a sense of my journey. As a computer scientist, my undergraduate, master's and doctorate degrees were in computer science and engineering. And so I really started off as a computer scientist and I was an engineer at Motorola for a long period of time. And then I got my doctorate, moved into academia, and I really focused on sort of the bioinformatics domain at Johns Hopkins. And then from there I went into the federal government, where I focused on tech and policy.

Amen Ra Mashariki: And really it was the larger concept of tech, technology in all facets. But I was a political appointee in the Obama Administration, and the Obama Administration really focused on this relatively new concept, specifically for the federal government, of big data, big data concepts, and big data technology. And that's where I really began to cut my teeth in the data science space, really thinking about how big data can be used in the federal government space.

Amen Ra Mashariki: And so I spent a lot of time working with agencies on big data, big data concepts, big data technologies, investing in cloud computing and so on and so forth. And then I got the opportunity of a lifetime to become the chief analytics officer for the city of New York. And that's where I really jumped into the deep end with both feet and grew and perfected sort of my ability to use data science to solve really complex problems, specifically in an urban context. So for me, data science was never conceptual. I didn't learn it in academia. It was always understanding it from the applied perspective.

Adel Nehme: That is awesome. And I would love to zero in on your experiences in the public sector especially. So, we often hear of cities employing a lot of chief data officers, and organizations as well. Can you define the difference between a chief analytics officer and a chief data officer, specifically in a government context?

Amen Ra Mashariki: Yeah. Yeah. I'll start by saying that, one, I think chief data officers themselves differ from one another. So in the United States and globally, municipalities are adopting this concept of chief data officers because city leaders, urban leaders, want to be data-driven and so on and so forth. And so they're really adopting this concept of hiring chief data officers.

Amen Ra Mashariki: But I think there isn't one clear definition of a chief data officer. So what you'll find is that depending on the city, that chief data officer role will be different, depending on the background of the person hired into the role, depending on where that role actually sits. In some cities, the chief data officer reports into the CIO, the chief information officer. In some cities, it reports into the Mayor or under the Mayor's Office structure. In some cities, it can report into the office of management and budget.

Amen Ra Mashariki: And so it just differs. Even the concept of chief data officer itself is different depending on where you go. What you won't find is a chief analytics officer. The chief analytics officer in New York City is definitely the only such title across the United States. I can't speak intelligently for international, but I know within the United States, there is no other chief analytics officer in a city.

Amen Ra Mashariki: And so I differentiate this role in a couple of ways. One is I reported into a deputy mayor. This role was created by Mayor Bloomberg back in 2013, and I took the reins in 2014. The executive order that was written to create this role was written to say that, "This person needs to have an executive leadership stance within city government. We have to put this person as high as possible so that they can maximize their impact." So that's one differentiator, and then the other is that we worked on analytics projects that helped to drive operational efficiencies within the city.

Amen Ra Mashariki: So we didn't do performance management. We didn't do sort of data services. We didn't do data management. I didn't have to think about databases and cloud infrastructure. All I thought about was a problem and how to take the data scientists that reported into me and use that to solve that problem. And so I would differentiate chief analytics officer and chief data officer that way.

Adel Nehme: So in summary, if I'm capturing this correctly, a chief data officer would often be someone who is tasked with managing the infrastructure, scaling the data science practice, and in some sense granting data access. Whereas the chief analytics officer is someone who's just looking at data science problems and trying to operationalize them within a government setting. Would that be correct?

Amen Ra Mashariki: I could not have said it better myself, Adel.

Importance of Smart Cities

Adel Nehme: Then given your experiences and expertise creating smarter cities with data, can you share your thoughts on the importance of smart cities and how that enables government agencies to be more efficient and some of the work you've done to make New York City a smarter city?

Amen Ra Mashariki: Yeah. So in general, this term smart cities has gone the way of the term big data. It's used a lot and the meaning is sort of malleable. In many instances it doesn't mean much; it rings hollow. What I would say is when I think about "smart city," I think about a city that can be reactive to things that it hasn't been able to be reactive to before.

Amen Ra Mashariki: Let me pull back. When you think about cities, if you think simply about cities, there are two things that city leadership does. The first one is strategize, in terms of urban planning, in terms of strategic initiatives. So you put these strategic initiatives into play. When you think about strategic initiatives, you want to think about, "Oh, how do we lower greenhouse gas emissions within our city over the next 30 years?" And so what are the strategic mechanisms and levers that the city can pull in order to meet a goal of reducing greenhouse gas emissions by 50% over the next 30 years? That's a strategic effort.

Amen Ra Mashariki: Then, cities respond to things that happen to them. No case is better than COVID. No one expected this to happen. It happened. Cities have to do things, city leadership, citizens, residents have to do things that they've never done before. And especially city workers have to react in a way. And so the way you want to think about... how I think about a smart city is one, is a smart city that has the ability to react in such an efficient, timely, and precise way that it minimizes any damage to infrastructure and any damage to the residents of that city. That's one, reaction.

Amen Ra Mashariki: Two is being proactive. A city that is proactive around its infrastructure and understands how to get out in front of any possible challenges. So the way I think about it is there are known knowns, known unknowns, and unknown unknowns. A smart city is one that has its arms wrapped around the known knowns. So we know what we know, and we can solve those problems quickly and efficiently. The known unknowns: hey, there are things that we know that we don't have our arms wrapped around, but we can be thoughtful around ways to invest, to get our arms wrapped around these challenges. And then for the unknown unknowns, we say, "Look, there are things that we just don't know." In 2019, no one was preparing for a pandemic. So we didn't know that was coming down the pipe, but when it hits, we have things in place to be able to proactively get out in front of big problems coming down the pipe. And so that's the way I think of a smart city.

Amen Ra Mashariki: A smart city is one that can be reactive where it hasn't been reactive before and one that can be proactive, all in the pursuit of protecting residents and citizens and growing their quality of life. I'll give you one reactive example and one quick proactive one. The reactive one was tenant harassment. The New York City Council found that landlords were harassing tenants in order to get them to leave their apartments, such that the landlords could raise the rent. And harassing tenants in order to force them to leave is illegal. It goes against human rights policy and civil rights policy and legislation that was set up in New York City.

Amen Ra Mashariki: And so City Council and the mayor, they pulled together a task force and said, "Hey, we need to get out in front of this. Too many people are complaining about being harassed and ultimately being kicked out of their apartments illegally." And so then they came to my office and they said, "Can you help us use data to find landlords who are participating in illegal tenant harassment?" And what we did was we looked at data and what I refer to as the timeline of harassment. So we looked at many different things that gave us early indicators of where tenant harassment is likely happening. So we weren't able to put together a list and say, "These are seven landlords who are conducting harassment." What we were able to do with data was to identify a number of names based on occurrences, to identify where tenant harassment was likely happening. And so we took that list and we would give it to inspectors, and inspectors would go knocking on doors.

Amen Ra Mashariki: And so the way I think about New York City is, finding things in New York City is like looking for a needle in a haystack. My office's job was to use data to burn down the haystack, to make it easier to find the needles. So our job wasn't to find the needle. Instead of having inspectors go to 400,000 homes, which would take forever and which would be fairly cumbersome, we say, "Here's a list of 843 landlords that you should reach out to."
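
To make the "burn down the haystack" idea concrete, here is a minimal sketch of narrowing a large population to a short inspection list by counting early indicators per landlord. It is an illustration only, not the method the office actually used: the record layout, the indicator names, and the `min_indicators` threshold are all assumptions.

```python
from collections import Counter

# Hypothetical indicator records: (landlord_id, indicator_type).
# Both the IDs and the indicator names are invented for illustration.
records = [
    ("landlord_a", "heat_complaint"),
    ("landlord_a", "construction_without_permit"),
    ("landlord_a", "eviction_filing"),
    ("landlord_b", "heat_complaint"),
    ("landlord_c", "eviction_filing"),
    ("landlord_c", "eviction_filing"),
]


def shortlist(records, min_indicators=2):
    """Return (landlord, count) pairs with at least min_indicators records,
    ordered from most to fewest occurrences."""
    counts = Counter(landlord for landlord, _ in records)
    return [(name, n) for name, n in counts.most_common() if n >= min_indicators]


# The short list is a set of leads for inspectors, not a finding of harassment.
inspection_list = shortlist(records)
print(inspection_list)
```

The output is a prioritized list for people to follow up on, which matches the framing above: the data shrinks the haystack, and inspectors still find the needles.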

Amen Ra Mashariki: And so that was reactive. We were being reactive to the need to stem the tide of tenant harassment. And a more proactive solution that we put together was we worked with a small business agency to build this tool called Business Atlas. Now, with Business Atlas, we tied together data across the city to provide sort of free market research information on the city, such that small business owners could actually use this tool in order to identify places where they should likely open up businesses.

Amen Ra Mashariki: So for instance, we took crime data. We took park data: how many parks are in the neighborhood? We took education data: how many schools, and so on and so forth. We took data from the census. We took data on where businesses were opening up. We took data around where businesses and what types of businesses were closing down, to give you and the users of the Business Atlas insight into what was happening in certain neighborhoods to be proactive around, "Oh, I would like to."

Amen Ra Mashariki: So if you put in an address into this tool, you got all of this information that we connected through fuzzy logic to basically say, "If you're opening up a toy store, you've got a lot of grandmothers, older people who live in this community. You've got a lot of green space, a lot of parks. You've got a lot of open space. And so, yeah, there's going to be a lot of kids around, so it's best to open a toy store in this neighborhood." And so we built this tool called Business Atlas, where we connected all of this data from the city to give insights. That's an example of a reactive solution in tenant harassment, and then an example of a proactive solution in terms of helping the city drive its strategic goal of growing small businesses.
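
Connecting data sets keyed by street address usually means coping with inconsistent spellings of the same address. The sketch below shows one common way to do that, approximate string matching with Python's standard `difflib` module. Whether Business Atlas worked this way is not stated in the interview; the sample addresses, the counts, and the 0.8 similarity cutoff are assumptions made for illustration.

```python
import difflib

# Hypothetical per-address data from two sources that spell addresses differently.
parks_by_address = {"123 Main Street": 2, "45 Ocean Avenue": 0}
schools_by_address = {"123 main street.": 3, "45 Ocean Ave": 1}


def normalize(address):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = address.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def link(query, candidates, cutoff=0.8):
    """Return the candidate address closest to query, or None if nothing
    is at least `cutoff` similar after normalization."""
    by_normalized = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(
        normalize(query), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[hits[0]] if hits else None


# Build one combined profile per address from both sources.
profile = {}
for address, schools in schools_by_address.items():
    match = link(address, parks_by_address)
    if match is not None:
        profile[match] = {"parks": parks_by_address[match], "schools": schools}

print(profile)
```

A cutoff that is too low links unrelated addresses and one that is too high misses abbreviations, so in practice the threshold would need checking against known matches.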

What challenges do data teams face?

Adel Nehme: That's an awesome definition of a smart city. And taking that definition of a smart city as a city that is both reactive and proactive about solving problems with data, especially taking into account these use cases that you've led and helped operationalize, what do you think are the unique challenges data teams face when working in a government agency setting?

Amen Ra Mashariki: Yeah. Yeah. The challenges are unique for these data entities that work within cities. One is, people often assume that in cities there's just tons of data sitting around, waiting for some smart data scientists to just use it and find insights about that city that have never been surfaced before. And that's just not true. Oftentimes a lot of the systems that cities have invested in were not brought online for the express purpose of data scientists using them 15, 20, sometimes 30 years down the line. So one, a lot of this data is just locked away and unable to be used in any forceful way unless you invest in modernizing and transforming some of these legacy systems. So there's tons of data that we just don't have access to, just by virtue of the fact that it wasn't built for us to have access to. That's one.

Amen Ra Mashariki: Two is the quality of that data degrades over time, depending on how it's managed and who manages it. As you know, in city governments, people work in positions for three, four years, and then they leave. They may not leave city government, they may move to another agency, but then they've left behind this system. And then next thing you know, 10 years later, it's degraded over time. The data in it is of low quality. Oftentimes I've executed projects and thought I'd hit the jackpot with a data set, and then when you dig into the data set, you realize you really just can't use it because it's just degraded. So there's limited data there.

Amen Ra Mashariki: Also, there are so many regulatory and legal issues to getting access to data. Yes, there's some really rich data about residents, but there are all sorts of thoughtful and purposeful regulations around using that data. FERPA, with education data. HIPAA, with health data, so on and so forth. There are all sorts of challenges with using a lot of the data.

Amen Ra Mashariki: So, oftentimes you'll actually find yourself as a data entity in a city with access to very little data that's usable. And then you have to build relationships with the private sector, build relationships with agencies. And so oftentimes it's not just a data science problem, it's actually a relationship and data sleuthing problem. Where can you find data sets that can substitute for a data set that you think should exist? And so you spin a lot of your wheels just digging for data in cities.

Adel Nehme: Yeah. So there's definitely an infrastructure and data access problem and a data quality problem that are unique to government agencies and to a government use case. Now, given the diverse nature of the data being used in a lot of data projects in a government setting, there are a lot of social, economic, and political dimensions to the data and the use cases you've worked on. What do you think are some of the best practices you've developed to ensure the successful and responsible delivery of these use cases?

Amen Ra Mashariki: That's a great question, Adel. One best practice is I always exercise the ability to embed data scientists within the executing agency. So what that means is, if the Department of Buildings in New York City reaches out and says, "Hey, we would really like to use your team's expertise to help us identify illegally convert..." This is a true problem. "Illegally converted buildings." So an illegally converted building in New York City is a building that was zoned and licensed and permitted for eight units. But the landlord says, "Hey, I can make more money if I can turn this into a building for 12 units. I'm permitted and licensed for eight units, but I'm going to do some illegal things and convert it into a 12-unit building."

Amen Ra Mashariki: And those illegal things, splicing wires to get electricity and gas pipes and so on and so forth, often cause fires. So it's a high predictor of where fires would likely happen. And so the Department of Buildings says, "Hey, we want to minimize fires in the city. We want to identify buildings that are illegally converted." So they came to my office and they said, "Hey, can you help us build a data science model to identify buildings? Because there are over a million buildings in New York City. You can't just send inspectors to every building. So can you help us?" We said, "Yeah."

Amen Ra Mashariki: But we did not do the project. I took a person or two from the data team and embedded them within the Department of Buildings team. Because the insight and the knowledge that is needed, the types of data sources that could be used, that should be used, where they are, all of that sits within the Department of Buildings agency.

Amen Ra Mashariki: So instead of sort of trying to be this all encompassing group of data scientists, "Hey, we are going to be the domain experts. We're going to be the data science experts. We're going to solve this problem. We're going to be the superheroes of this." Actually the superheroes sit with the agencies and so really your job is just to add data-driven value to their work. So embedding data scientists with these agencies is the best way that I saw to solve really complex operational challenges.

Adel Nehme: It's ultimately about combining the subject matter expertise with the data-driven expertise of your team in that sense. So then, going a step beyond that, and you've mentioned regulation and data quality as being unique challenges here, how do you view the friction between using data to, for example, help disenfranchised communities while also maintaining data ethics, privacy, and the responsible use of data?

Amen Ra Mashariki: It's non-trivial. It's non-trivial and I think we need to see it as such. So this challenge of juxtaposing solving really big, hairy problems in cities with respecting the ethical concerns and privacy concerns of citizens, there's no easy way around that. And so we need to tackle these non-trivial challenges with really complex solutions and really thoughtful solutions. So one, that's never going to go away. You're never going to have citizens or residents who say, "Hey, listen, do whatever you want with my data. It doesn't matter." Actually that's growing. The concern of citizens about its use has grown. So there's always going to be friction there.

Amen Ra Mashariki: There are technical ways that you can begin to move the ball in terms of ameliorating any challenges there or removing some friction. But I think a lot of it has to sit on two things. One is success in your execution. When citizens and residents understand that there are going to be times when you need to use their data to successfully solve a problem, and you show a record of successfully solving these problems, then that goes a long way.

Amen Ra Mashariki: The second is be responsive to their concerns. So whenever they bring up any type of concerns around privacy, don't dismiss it, engage them. Don't sort of fall back and get defensive and dismissive. Actually lean in, actually turn into that challenge and engage them to make folks feel more comfortable. There's no way you get around solving problems without using what could be considered sensitive data. I haven't seen that as a possibility in cities. But what you can do is deploy some technologies like obfuscating data and so on and so forth. There are technical solutions, but then there's also the non-technical side, and that's the engagement side.

What is a data drill?

Adel Nehme: Yeah, that's definitely spot on. And one thing pivoting away slightly from the operationalization of use cases in government, and just thinking about data maturity in government agencies as well. One thing I've seen you speak about in the context of data and government agencies is data drills, and how it helps cities better prepare and better be reactive to emergencies, and how it helps them understand their own data capacities as well. Can you define what a data drill is and some of the best practices you've learned organizing them?

Amen Ra Mashariki: Yeah, absolutely. Data drills were a great... they were very interesting, very helpful and really fun in executing them, I must say. So data drills came about... when I became the chief analytics officer for the city of New York, this was right after Sandy, Mayor Bloomberg did an after action report. In that Sandy after action report, and that after action report for any of your listeners, it's still available online. If you do Sandy AAR or Sandy after action report New York City it'll come up. And in that after action report, it mentioned multiple places where the Mayor's Office of Data Analytics needed to play a role in galvanizing agencies around data, using data, getting access to data, all of these things, the city did not have the capacity to do during Sandy. So that after action report said moving forward, you need to kind of get out in front.

Amen Ra Mashariki: And so my office got pulled into a couple of emergencies, and what I noticed was, agencies are fine in terms of managing their own data. But when emergencies happen where agencies need to share data, and let's keep in mind sort of the city's COVID response as I'm talking about this, whenever there's a need to share data, everyone sort of freezes and goes to the lowest common denominator, which is Excel spreadsheets. "Let's just dump data in Excel spreadsheets. Let's fat finger data into Excel spreadsheets, and let's just send them via email and thumb drives all over the place." You can only imagine how big of a mess that can be and how imprecise your data response would be if you're just emailing Excel spreadsheets all over the city and so on and so forth.

Amen Ra Mashariki: So I realized we just do not have the capacity as a city to respond to catastrophes and challenges from the data perspective. NYPD knows how to work well with FDNY, with the Fire Department in New York City, which knows how to work well with the Sanitation Department, who knows how to work well with Department of Buildings and so on and so forth. These agencies have been doing this for decades upon decades. They know how to work with each other. They know who to call, what the protocol has to be, where to call, where to go. But when it came to sharing data, we just didn't know anything.

Amen Ra Mashariki: And so data drills were essentially an effort to say, "Let's start practicing how to share and use data during emergencies before an emergency happens, such that when an emergency does happen we've actually practiced this and we understand what we need to do." So it's not only practicing process and protocol, but it's also practicing the use of tools. Because one of the things we ran into was even if agencies wanted to use tools and software that they had purchased to share data and integrate data, what we found out was that, hey, some agencies invested in this company's tools. Other agencies invested in another company's tools and they just didn't talk. They weren't interoperable.

Amen Ra Mashariki: And so one of the things about data drills was understanding what tools we need to use and invest in, in order to make sure that at the time of emergencies our data systems are interoperable, if they need to be. And so it was all about training, doing drills on how to share and use data. So literally we would create a scenario, a real life scenario in which something bad happened in a city. And we would get data leaders and data members, that could be data scientists, it could be data analysts, it could be database managers, from across all of these agencies and literally sit at a table and present them with this emergency. And then everyone would have to start sharing: how do we think about working with each other at the table in order to share data and share insights in order to solve the problem? That really went a long way in getting the city to be more thoughtful about responding to data emergencies.

The State of Data Skills in Government

Adel Nehme: Now, one other thing that I'd like to pivot to and discuss is the state of data skills in government. I'd love to deep dive into the skills you think are needed to become more successful while working with data in a government setting. So broadly, there are two components to this. The first is data experts honing specific skillsets that will help them navigate the government landscape per se. And the second is government employees developing data literacy to use data on a daily basis in their work. If we want to expand on the first component, what do you think are the skills data experts need to hone, specifically in government, to be able to drive more impact and accountability in their projects and work?

Amen Ra Mashariki: Yeah. I think data experts need to really understand the definitions of data quality, where the concept of data quality is useful, and how to identify when a dataset is appropriate to be used or not. And so understanding data quality is going to be key for data experts. Then there's understanding data provenance. One of the things I used to say when I was in New York City was that data are like waves. You never know where they began.

Amen Ra Mashariki: And so it's like, you get this dataset and that data could have been around for 15, 20, 30 years and changed hands and changed owners and been manipulated and modified over time to the point where the definition of the dataset, the description of the dataset makes it seem like it's going to be helpful, but really, if you actually understood the provenance of that data set, you would say, "Well, there's no way I could use this here. It just went through the wringer and the quality of this is probably going to be very low." So you need to have data experts who build out systems and mechanisms to understand data provenance.

Amen Ra Mashariki: And then I think there's a term, it's a little tongue in cheek and it's a little kitschy, but the terminology that we used to use was people who were sort of data sleuths or data curious. So you really had to have people who had the ability to track down data sets in the city.

Amen Ra Mashariki: We were working on one project, actually the tenant harassment project, and the knowledge of which units are rent stabilized under law is managed by the state. And the state does not share this list of rent stabilized units with the city. So what my team found out was that during the Bloomberg Administration... and I was at the beginning of the de Blasio Administration.

Amen Ra Mashariki: During the Bloomberg Administration, some smart bureaucrat figured out a way to get a proxy list of every building that has a rent stabilized unit. What he did was charge the landlord of a rent stabilized building a nominal tax. And this is important, that they call it a tax. A nominal yearly tax of 10 bucks. So every year they had to just pay $10. If you own a building, you're like, "I'll pay that $10." But because you pay that $10, New York City's Department of Finance tracks that tax payment. And so that tax payment now gets triggered with the building address in the Department of Finance building list. So now the Department of Finance has an authoritative list of every rent-stabilized building in New York City, because they are required to pay taxes.
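
The proxy-list idea described here amounts to a filter and a join: keep the payments for the one fee that only rent-stabilized buildings pay, then look up those buildings' addresses. The sketch below illustrates that logic under stated assumptions. The field names, the `RENT_STAB_FEE` tax code, and the sample rows are hypothetical and do not reflect the Department of Finance's real schema.

```python
# Hypothetical tax payment records; every field name and code is invented.
tax_payments = [
    {"building_id": "B1", "tax_code": "RENT_STAB_FEE", "year": 2014},
    {"building_id": "B2", "tax_code": "PROPERTY", "year": 2014},
    {"building_id": "B3", "tax_code": "RENT_STAB_FEE", "year": 2014},
    {"building_id": "B1", "tax_code": "PROPERTY", "year": 2014},
]

# Hypothetical building list mapping IDs to addresses.
buildings = {"B1": "10 First Ave", "B2": "22 Second St", "B3": "35 Third Blvd"}


def proxy_list(payments, buildings, tax_code):
    """Return sorted addresses of buildings with at least one payment of
    `tax_code`, skipping IDs that are missing from the building list."""
    ids = {p["building_id"] for p in payments if p["tax_code"] == tax_code}
    return sorted(buildings[i] for i in ids if i in buildings)


rent_stabilized = proxy_list(tax_payments, buildings, "RENT_STAB_FEE")
print(rent_stabilized)
```

The list is only as good as the assumption behind it, namely that every rent-stabilized building pays the fee and no other building does, which is why knowing the history of a data set matters.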

Amen Ra Mashariki: Now, when you transfer from the Bloomberg Administration to the de Blasio Administration, things like that don't really get shared at the water cooler. These things get lost. They just stay with one administration. They never get transferred over to another administration. Why would they? Who would know to ask? Who would remember to share?

Amen Ra Mashariki: And so you needed to have people who were focused on finding datasets and understanding those data sets and where they came from and the history of those datasets, and understanding the context so that you can understand how useful they will be. I mean, it was a big find for us to dig in and ask enough questions to find that particular data set at the Department of Finance. So you need people who are very curious about data and who ask tons of questions.

Building Data Literacy

Adel Nehme: So definitely that ability to operate and navigate in a government setting is so unique to government agencies. So diving into the second component, how do you view the importance then of building data literacy for non-technical stakeholders across agencies?

Amen Ra Mashariki: Well, I think data literacy is so incredibly important because at the core of it all, most people within government use and manage data. Whether that's Excel spreadsheets, whether that's an Access database, whether that's punching some information into a content management system or into a case management system, everybody engages with data in some capacity. And so democratizing, or streamlining, or ensuring that everyone or the majority of people within city government have these core basic data literacy skills maximizes your ability for data-driven government to be executed in a really efficient and precise way.

Amen Ra Mashariki: And so I certainly have seen and experienced government in a way to be able to forthrightly say that you cannot have a data-driven government if only a small... and it seems intuitive and logical, but you cannot truly have a data-driven government if only a small privileged set of your government employees are data experts, are data literate. You have to ensure all are data literate, and not only data literate in general, but consistent across that data literacy in terms of shared knowledge and shared understanding. And so people get taught the same things to be at the same level, to have the same understanding of vocabulary, verbiage, concepts, and processes. So data literacy is going to be key.

Amen Ra Mashariki: We think that we can hire a team of four or five people, have a leader, give them a fancy title like chief data officer, and say, "All right, now the city is data-driven." No. The whole group of government employees needs to have some minimum data literacy capability. I'll even go one step further and say that community members who often engage with the government need to have mechanisms through which the government can invest in their data literacy as well.

Adel Nehme: What do you think are then the main data skills needed for government agencies to get started? Especially on the data literacy side.

Amen Ra Mashariki: So I think a key skill that could be taught across the government for data literacy is exploratory data analysis. Just from a pure standpoint of understanding: I've been collecting data for decades, what does this data say? What is this data representative of? What that does is two things. One is, we're always so excited about insights and we always believe there's this dataset that's sitting here, and if I probe it, it is going to tell me the world's secrets or, in this context, the city's secrets and all these things we didn't know about the city. That can happen. It's very rare.

Amen Ra Mashariki: But having the ability to do exploratory data analysis gives you the ability to baseline and understand the dataset that you're working with. And here's why that's important. It gives you the sense of, well, this is not a high-quality dataset. So you can immediately tag your dataset. If you as a government employee understand exploratory data analysis, oftentimes you know, hey, this is not a bar... Let me give you an example. I'll leave out the name of the agency to protect people. But one agency asked my office to work with them to do an analysis for them. A really, really, really time-sensitive and very important analysis. As a matter of fact, the federal Department of Justice was looking to sue New York City around a number of things. And so this agency said, "Hey, we've got two weeks to respond. Can you help us do this data analysis?"

Amen Ra Mashariki: So we said, "Sure, give us all of your data." They gave us all of this data and my team dug into it with just exploratory data analysis. Just simple bar graphs and line plots and so on and so forth, violin plots. They said, "This is interesting. The data that this agency gave us seems to be orders of magnitude different than the national dataset." So we went to the federal government and downloaded the federal government's national view of this sort of local dataset. When you looked at the two datasets, it was like New York City's just seems to be way off from the national average en bloc.

Amen Ra Mashariki: And sure enough you go back to this agency and you say, "Hey, there's something up with this data. We don't think that it's good data through exploratory data analysis." And they were like, "Oh yeah. You know what? Our bad, we forgot to give you this other data set. And we forgot to give you the more complete data set." And so just having simple exploratory analysis skills allowed for us to understand if we were looking at quality data.
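
The kind of baseline check described in this example can be sketched in a few lines. This is a minimal illustration, not the analysis the team actually ran: the numbers are made up, the function name `magnitude_gap` is hypothetical, and comparing medians is just one reasonable way to spot a dataset that sits on a different scale from a reference.

```python
import math
import statistics


def magnitude_gap(local_values, reference_values):
    """Return log10 of the ratio of the two medians.

    A result near 0 means both datasets sit on the same scale; a result
    near 1 or 2 means roughly one or two orders of magnitude apart.
    Assumes both medians are positive.
    """
    local_median = statistics.median(local_values)
    reference_median = statistics.median(reference_values)
    return math.log10(local_median / reference_median)


# Made-up values purely for illustration.
local = [1200, 1500, 900, 1100]
national = [10, 14, 9, 12, 11]

gap = magnitude_gap(local, national)
if abs(gap) >= 1:
    print(f"Medians differ by about {abs(gap):.1f} orders of magnitude; "
          "check whether the local dataset is complete.")
```

A gap like this does not say which dataset is wrong. It only says a question needs to go back to the data owner, which is what happened in the story above.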

Amen Ra Mashariki: What it also does, and this is a big conversation, and you began to allude to this a little bit, Adel, in terms of services, in terms of equity, in terms of engaging underserved communities, is allow you to understand the biases that exist in the dataset that you have. So, many people want to take a dataset, train on that dataset, build a model, and start identifying insights from that dataset without even beginning to understand whether there are biases in the dataset and what they are.

Amen Ra Mashariki: And having exploratory data analysis skills lets every... Imagine every person at every agency has the ability to look at a dataset that they're working with and better understand the biases that are being introduced by that dataset. So I think exploratory... I think EDA is a good sort of core capability for government employees to have.
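
One simple way to start looking for the kind of bias mentioned here is to compare how often each group appears in a dataset with its share of the underlying population. The sketch below assumes hypothetical districts, made-up population shares, and an illustrative function name, `representation_gaps`. It is one possible first check, not a complete bias audit.

```python
from collections import Counter


def representation_gaps(records, group_key, population_share):
    """Return dataset share minus population share for each group.

    Positive values mean the group is over-represented in the dataset,
    negative values mean it is under-represented.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_share.items()
    }


# Hypothetical service requests: 6 from district A, 3 from B, 1 from C.
requests = (
    [{"district": "A"}] * 6 + [{"district": "B"}] * 3 + [{"district": "C"}] * 1
)
# Made-up population shares for the same districts.
population = {"A": 0.4, "B": 0.3, "C": 0.3}

gaps = representation_gaps(requests, "district", population)
for district, gap in sorted(gaps.items()):
    print(f"District {district}: {gap:+.2f}")
```

A model trained on these requests would see district C far less often than its population share suggests, which is worth knowing before drawing any conclusions about where services are needed.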

Adel Nehme: Then given the importance of democratizing some of the data skills, especially exploratory data analysis in the government setting, what do you think is the data team's responsibility when equipping non-technical stakeholders with these skills?

Amen Ra Mashariki: Well, I think identifying quality standards. So what ultimately should happen for a city data team is that the team should get out of the role of doing all of the work and get into the space of setting standards, setting best practices, and setting processes and workflows that can be used by everyone who is looking to do some core data work within their agencies. So training can be a part of that, but training with respect to standards, best practices, strategies, tactics, and so on and so forth.

Infrastructure for Scalable Data Science

Adel Nehme: So ultimately it's about creating the infrastructure for scalable data science throughout agencies. Now, Amen, before we wrap up, I've also seen that you recently joined Nvidia as a principal scientist. Can you walk us through the work that you've been doing there?

Amen Ra Mashariki: Yeah. I was excited to recently join Nvidia as a principal scientist, primarily working with AI Nations and helping to build our capacity to use AI to help cities solve some of their more complex challenges. My key role, when I think about it, is this: if you look at a world-class AI company like Nvidia that has high-quality, best-in-class AI infrastructure and technology, and then you look at cities, there are all of these use cases, these complex, challenging use cases. How do we bring these two things together in order to help solve these city problems?

Amen Ra Mashariki: Because there are a lot of use cases in cities that should not, and do not, require AI to solve them. But then there are those use cases where, quite frankly, if we don't bring accelerated computing and AI to the table, we probably will never get to solve some of these really sticky city problems. And so my job is really bringing these two things together, the use cases and the world-class AI compute, and seeing where we can really accelerate solutions for cities.

Call to Action

Adel Nehme: That is very exciting. And finally, Amen, before we let you go, do you have any calls to action for listeners?

Amen Ra Mashariki: A couple of things. The first one is, when I grew up I grew up with this term, each one, teach one. I love YouTube and going on YouTube and you see a lot of data scientists teaching on YouTube and sharing insights and best practices and tactics on how to do data science. I would say we need more of that. We need more of that in terms of what we talked about, Adel, in terms of training up city personnel and public sector personnel. So where you can, share best practices and capabilities that are specifically focused on growing the capability within city government, that would be amazing.

Amen Ra Mashariki: And then I think the other is share solutions. Folks who are listening, who are students, undergrad or graduate, or who are just students of life, who are growing their GitHub platforms with projects and initiatives: share those and make those more public so that we can reuse those where needed as well.

Adel Nehme: All right. Thank you so much for the insights, Amen. It was a pleasure to have you on the show.

Amen Ra Mashariki: It was my pleasure. This was fun. I expected it to be fun, and it was absolutely fun. I really appreciate it, Adel. Thank you for having me.

Adel Nehme: That's it for today's episode of DataFramed. Thanks for being with us. I really enjoyed this conversation with Amen, and how he thinks about the use of data science in government. It will be fascinating to follow how data science can help governments become more reactive to crises and how it helps them become more proactive to achieve their strategic goals.

Adel Nehme: If you enjoyed this episode, make sure to leave a review on iTunes. Our next episode will be with Dan Becker, CEO of decision.ai. In it, we will talk about the intersection of decision sciences and AI, and how to align machine learning to business value. I hope it will be useful for you and we hope to catch you next time on DataFramed.