Scaling Data Science With Data Governance

The immense potential of data science and analytics is well recognized by businesses across all industries. But for these data science initiatives to succeed and scale, the data must first be relevant, accessible, and of high quality. This is where data governance comes in to set the right foundations for scalable data science.

What is Data Governance?

In a recent webinar, Aaren Stubberfield, Data Scientist at Microsoft, described how data governance enables scalable data science. He started by addressing the basic question of what data governance is.

Data governance, as defined by the Data Management Association (DAMA), is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.



(1) Planning: Creating rules that datasets need to conform to (e.g. standardizing the format of date fields to YYYY-MM-DD)
(2) Monitoring: Measuring compliance with the rules defined in the Planning phase (e.g. tracking the percentage of ‘Date’ fields with missing values)
(3) Enforcement: Taking remediation actions when data rules are breached (e.g. immediate corrective action if critical data fields such as social security numbers are found to contain major discrepancies)
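
To make the three activities concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not something presented in the webinar: the function names (`is_valid_date`, `missing_rate`, `find_violations`, `enforce`), the `date` field name, the sample records, and the 10% threshold are all assumptions chosen for the example.

```python
import re
from datetime import datetime

# Planning: the rule itself. Dates must be written as YYYY-MM-DD.
# (Hypothetical rule definition; a real program would keep rules in a catalog.)
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_valid_date(value):
    """Return True if value is a string in YYYY-MM-DD form and a real calendar date."""
    if not isinstance(value, str) or not DATE_PATTERN.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # e.g. 2024-02-30 has the right shape but does not exist
    return True


def missing_rate(records, field):
    """Monitoring: fraction of records where the field is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def find_violations(records, field):
    """Monitoring: records whose field is present but breaks the date rule."""
    return [
        r for r in records
        if r.get(field) not in (None, "") and not is_valid_date(r[field])
    ]


def enforce(records, field, max_missing_rate=0.10):
    """Enforcement: stop the pipeline if too many values are missing.

    The 10% default is an assumed threshold, not a recommended value.
    """
    rate = missing_rate(records, field)
    if rate > max_missing_rate:
        raise ValueError(
            f"{field!r} missing in {rate:.0%} of records "
            f"(allowed: {max_missing_rate:.0%})"
        )


# Made-up sample data for illustration only.
records = [
    {"id": 1, "date": "2024-01-15"},
    {"id": 2, "date": "15/01/2024"},   # wrong format
    {"id": 3, "date": ""},             # missing
    {"id": 4, "date": "2024-02-30"},   # right format, impossible date
]

print(missing_rate(records, "date"))                          # 0.25
print([r["id"] for r in find_violations(records, "date")])    # [2, 4]
```

In practice the enforcement step could just as well quarantine the offending rows or notify a data steward. Raising an error is only the simplest option to show.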

While there are different data governance frameworks available, they all revolve around the same key elements of people (i.e. data stewards and stakeholders overseeing data governance programs), processes (i.e. data policies and procedures that make up the rules datasets need to conform to), the datasets in question, and tools (i.e. technology and infrastructure that allow the monitoring and enforcement of policies).



Why is Data Governance Important?

For data to be useful as a strategic asset, it must be kept compliant, clean, and accessible through formal procedures and accountability. The purpose of data governance is to establish the methods, processes, and responsibilities that enable organizations to ensure the quality, compliance, and usability of their data.

Data quality is crucial for the democratization of data science because organization-wide trust in data is of utmost importance. Just as consumers would not purchase or use products from retailers they do not trust, business stakeholders and data practitioners will be hesitant to use data (or insights gleaned from data) for decision-making if they lack trust in it.

Ultimately, the most successful businesses are the ones that can effectively manage data as a trusted asset and marry the best data science capabilities with the best data quality.

What does Good Data Quality look like?

Data governance programs ensure that data quality can be maintained at scale, but what exactly does good data quality look like? Atlan summarized it with the following seven characteristics:

  1. Accurate
  2. Available and accessible
  3. Complete
  4. Relevant and reliable
  5. Timely
  6. Gives the right degree of granularity
  7. Helps you in decision-making

What are the benefits of data governance?

When properly implemented, a solid data governance framework can help companies reap numerous significant benefits:

  • Common Data Language - Consistent terminology around metrics and datasets creates a common data language that is easily used and understood across the organization.

  • Increased confidence in data quality - Improved data quality from good data governance promotes trust and confidence in the data and its corresponding documentation, thereby leading to greater use of the data in decision-making.

  • Improved reusability of data - Data governance creates standardization and mapping of enterprise-wide data. This greatly enhances efficiency through the reuse of data across different business units and varying use cases within the enterprise.

  • Better data management mechanisms - Central control mechanisms establish clear policies and rules that guide best practices and codes of conduct around data usage. This provides a bird’s eye view for the consistent management of key areas like compliance and security, while also minimizing costs in overall data management.

  • Scalable compliance - A strong data governance program helps organizations remain compliant with regulations (e.g. California Consumer Privacy Act (CCPA) and General Data Protection Regulation (GDPR)) on how to properly manage the privacy and confidentiality of personal data.

  • Single Source of Truth - Data governance facilitates the integration of different data sources such that data is no longer siloed. This establishes a consistent and complete single source of truth of the customer profiles (or profiles of other key entities) that business stakeholders can agree upon to make better cooperative decisions.

  • Increased Agility from Data Democratization - The high quality of data created by good governance makes it readily available even for non-technical units to utilize for their business goals. This democratization of data will then enable business units to be more agile through self-service analytics.

All these benefits will in turn enable companies to reduce costs through improved efficiency, minimize risks through prudent use of data, stay compliant with data regulations, and maximize business performance through consistent use of data-driven decision-making.



How to Operationalize Data Governance?

A recent KPMG study mentioned that only 35% of data leaders trust their data, while a Gartner survey revealed that a staggering 42% of data leaders do not assess or monitor their companies’ data governance structures.

This shows that many organizations are still behind in their data quality journeys. In the webinar, Aaren mentioned numerous best practices for organizations to get started on their path towards operationalizing data governance.

  • Start with the right people - Successful data governance is built on the agenda set by the right people. It is important to start by enrolling the best talents and key stakeholders to form the core team that drives data governance implementation.

  • Track your metrics - Data quality metrics need to be identified and monitored right from the start to track the progress and value of data governance systems. Some examples include the percentage of the data glossary that has been completed, and the percentage of contact details that are complete, valid, and accurate.

  • Communication is key - Continuous support and sponsorship of data governance programs rely on frequent and constructive communication of the successes, setbacks, and progress to the key business stakeholders.

  • Be in it for the long haul - Implementing a data governance program is a long-term investment that requires consistent work. This means that the operationalization of data governance should be framed as an iterative process towards continuous improvement, rather than an ad-hoc initiative.

  • Invest in data governance tools - The ever-expanding volumes of data render it impossible for data governance activities to be done manually. This is where data governance tools come in to serve as vital enablers in the automation of governance operations and data stewardship efforts.

  • Avoid starting from scratch - There are many resources documenting successful use cases, frameworks, and examples of effective data governance, so there is no need to start from scratch. For example, DAMA, EDM Council, and PwC have frameworks that can be leveraged and repurposed by organizations.



Example of a comprehensive data governance framework | Source: PwC
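
The two example metrics from the "Track your metrics" practice above can be computed with a few lines of Python. This is a rough sketch under stated assumptions: the field names (`name`, `email`, `phone`), the function names, and the deliberately simplistic email pattern are hypothetical, and the sample data is made up. Accuracy is left out because it cannot be checked without a trusted reference source to compare against.

```python
import re

# Deliberately simple email shape check (assumption for illustration;
# real validation is stricter and usually handled by a dedicated tool).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pct(part, whole):
    """Percentage helper that avoids dividing by zero on empty inputs."""
    return 100.0 * part / whole if whole else 0.0


def contact_quality(contacts, required=("name", "email", "phone")):
    """Share of contact records that are complete, and complete and valid."""
    total = len(contacts)
    # Complete: every required field is present and non-empty.
    complete = [c for c in contacts if all(c.get(f) for f in required)]
    # Valid: complete records whose email matches the expected shape.
    valid = [c for c in complete if EMAIL_PATTERN.match(c["email"])]
    return {
        "pct_complete": pct(len(complete), total),
        "pct_complete_and_valid": pct(len(valid), total),
    }


def glossary_completion(glossary):
    """Share of glossary terms that have a non-empty definition.

    `glossary` maps each term to its definition (empty or None if undocumented).
    """
    defined = sum(1 for definition in glossary.values() if definition)
    return pct(defined, len(glossary))


# Made-up sample data for illustration only.
contacts = [
    {"name": "Ana", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "Bo", "email": "bo-at-example.com", "phone": "555-0101"},  # invalid email
    {"name": "Cy", "email": "cy@example.com", "phone": ""},             # missing phone
    {"name": "Di", "email": None, "phone": "555-0103"},                 # missing email
]
glossary = {"churn": "Customers lost in a period", "ARR": "", "MAU": "Monthly active users", "NPS": None}

print(contact_quality(contacts))      # {'pct_complete': 50.0, 'pct_complete_and_valid': 25.0}
print(glossary_completion(glossary))  # 50.0
```

Recomputing these figures on a schedule and plotting them over time gives the core team a simple way to show stakeholders whether governance efforts are moving the numbers.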

Building the Business Case

To drive long-term change and maintain executive support for data governance initiatives, it is important to present a strong business case to stakeholders. This can be done by first identifying pain points where low data quality is the problem. For example, incomplete customer contact details could be resulting in a significant loss of sales opportunities.

From there, the business benefits of improved data quality resulting from data governance endeavors (e.g. increased revenue and improved customer experience) can be clearly highlighted and communicated.


Data skills enable successful data governance

A BCG study showed that more than 60% of companies assessed their data governance capabilities as ‘underdeveloped’, leading to an inability to leverage advanced analytics in their business. This points to the many challenges that stand in the way of successful data governance implementation, including inadequate data steward representation, misaligned ownership of data governance elements, and low organizational priority.

A common theme in these challenges is that people are at the center of making data governance happen. To achieve the desired governance outcomes, the people in the organization must be empowered with the right skills to handle the governance tasks involved. Organizations should therefore look to upskill their people in a range of data skills, from data literacy to data cleaning and exploration.

Ultimately, data governance is a team effort built on a common data language across the enterprise.