The concept of data-driven organizations has been a staple of the technology industry. Tech giants like Amazon, Google, and Netflix operate with data at the core of their business model, and they epitomize how data can be leveraged to drive decision-making towards successful business growth.
With growing data availability and significant decreases in costs for data processing and storage, opportunities to harness data to solve business challenges have extended beyond the technology industry. Finance, healthcare, and insurance companies have recognized this and have embarked on data transformation journeys to establish competitive business advantages.
While leveraging cutting-edge machine learning techniques and bridging the data science talent gap remain top of mind, an equally crucial aspect of a data strategy is the creation of a data governance plan. This article defines data governance and looks at the tools and technology that can drive successful governance programs.
What is Data Governance?
In a recent webinar, Aaren Stubberfield, Data Scientist at Microsoft, outlined the opportunities and best practices of data governance, where he started by addressing the fundamental question of what data governance is. Data governance, as defined by the Data Management Association (DAMA), is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.
(1) Planning: Creating rules that datasets need to conform to (e.g. standardizing the format of date fields to YYYY-MM-DD)
(2) Monitoring: Measuring compliance with the rules set in place in the Planning phase (e.g. tracking the percentage of ‘Date’ fields with missing values)
(3) Enforcement: Taking remediation actions when data rules are breached, based on the level of urgency (e.g. immediate corrective action if critical data fields such as social security numbers are found to contain major discrepancies)
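To make the three activities concrete, here is a minimal Python sketch of a date-format rule (planning), a compliance measurement (monitoring), and a choice of remediation action (enforcement). The `Rule` class, the 95% threshold, and the action names are illustrative assumptions, not part of the DAMA definition or of any particular governance tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Planning: a rule that every value of a given field must satisfy.
@dataclass
class Rule:
    field: str
    check: Callable[[Optional[str]], bool]
    critical: bool  # hypothetical urgency flag used by enforcement

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_iso_date(value):
    """Check the YYYY-MM-DD shape only (not whether the date exists)."""
    return value is not None and ISO_DATE.fullmatch(value) is not None

def compliance_rate(records, rule):
    """Monitoring: share of records whose field satisfies the rule."""
    if not records:
        return 1.0
    passed = sum(1 for r in records if rule.check(r.get(rule.field)))
    return passed / len(records)

def enforce(rate, rule, threshold=0.95):
    """Enforcement: pick a remediation action based on urgency."""
    if rate >= threshold:
        return "ok"
    return "immediate_fix" if rule.critical else "schedule_review"

# Made-up sample records for illustration.
records = [
    {"date": "2021-03-01"},
    {"date": "03/02/2021"},  # wrong format
    {"date": None},          # missing value
    {"date": "2021-03-04"},
]
date_rule = Rule(field="date", check=is_iso_date, critical=False)
rate = compliance_rate(records, date_rule)
action = enforce(rate, date_rule)
print(f"compliance={rate:.0%}, action={action}")
```

In practice a governance tool would run checks like these on a schedule and route the resulting actions to the responsible data steward.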
What does a Data Governance Framework look like?
There are different types of data governance frameworks out there, but they tend to have these four common elements: Data Policies and Procedures, Datasets, Data Stewards and Stakeholders, and Technology.
Data stewards are the people who develop governance policies and procedures for the datasets in the organization. Data governance tools are vital enablers for the stewards to do their best work, and they form a key part of the governance framework.
What are the Tools and Technology used for Data Governance?
Data governance tools are necessary to automate governance operations and data stewardship efforts, since the large volumes of data make it impossible to carry out these activities manually. They can also integrate with different IT products and datasets and provide comprehensive data cataloging, thereby extending their capabilities across an organization’s entire data management system. To meet the burgeoning demand for these tools, there has been a proliferation of data quality solutions in the market. A review by Gartner analyzed the data governance market and produced Magic Quadrants that classify the vendors offering data governance solutions.
Given the growing importance of data governance, numerous data management tools offer governance capabilities as part of their systems. SolutionsReview released the Data Management Vendor Map, where they classified data management solutions into three categories: Data Quality Tools, Master Data Management, and Data Management for Analytics.
(1) Data Quality Tools
Data quality is defined as the overall utility of the data and its ability to be easily processed and analyzed for other purposes. In order to achieve and sustain good data quality, the governance tools need to perform key functions such as standardizing, profiling, parsing, cleansing, and monitoring. Besides keeping the data clean and well-organized, these tools are also expected to support data processing across the organization’s entire data system.
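As a rough illustration of two of these functions, the Python sketch below standardizes a date column and then profiles the result. The list of accepted input formats (including the day-first reading of `01/03/2021`) and the function names are assumptions made for this example; commercial data quality tools expose their own interfaces.

```python
from datetime import datetime

# Hypothetical input formats accepted for a date field.
# Slash-separated dates are assumed to be day-first.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")

def standardize_date(raw):
    """Parse a raw date string and return it as YYYY-MM-DD, or None."""
    if raw is None:
        return None
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    return None  # unparseable values are treated as missing

def profile(values):
    """Profile a column: total, missing, and distinct value counts."""
    present = [v for v in values if v is not None]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }

# Made-up sample column for illustration.
raw_dates = ["2021-03-01", " 01/03/2021", "2021/03/04", "unknown", None]
clean_dates = [standardize_date(v) for v in raw_dates]
report = profile(clean_dates)
print(clean_dates)
print(report)
```

Treating unparseable values as missing keeps the standardized column uniform, at the cost of hiding the original text; a real pipeline would likely log those values for a steward to review.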
Talend’s flagship product is the Talend Data Fabric, which includes data integrity and governance capabilities as part of a single unified cloud platform for trusted data. These capabilities range from metadata management to data lineage, and include collaborative data stewardship solutions.
The SAS Data Quality product provides data quality management capabilities across different kinds of databases and data architecture deployments. It includes the essential functions of data cleansing and entity resolution, as well as a unified web-based console to monitor data quality jobs.
Informatica offers an expansive portfolio of data tools in various deployments. Their Informatica Data Quality product provides users with a rich set of data transformation capabilities, while also allowing them to build and review business rules without relying on IT.
(2) Master Data Management
Master data refers to the consistent and uniform collection of core company-wide data points and comprises key components such as customers, leads, suppliers, employees, accounts, and more. Underpinning an effective data quality control strategy is the setup of solid master data management. It is only then that the enterprise’s growing data assets (along with its metadata) can be properly structured and consolidated in a central repository to support optimal decision making.
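The idea of consolidating records into a central repository can be sketched in a few lines of Python. This is a simplified illustration only: matching on a normalized email address and letting the first non-empty value win are assumptions chosen for brevity, and the system names and records are made up. Real MDM platforms use far richer matching and survivorship rules.

```python
def normalize_key(email):
    """Normalize the matching key so the same customer lines up across systems."""
    return email.strip().lower()

def consolidate(sources):
    """Merge customer records from several systems into one record per key.

    Assumed survivorship rule: the first non-empty value seen for a
    field wins; later systems only fill in fields that are still empty.
    """
    master = {}
    for system, records in sources.items():
        for rec in records:
            key = normalize_key(rec["email"])
            golden = master.setdefault(key, {"email": key, "sources": []})
            golden["sources"].append(system)
            for field, value in rec.items():
                if field != "email" and value and not golden.get(field):
                    golden[field] = value
    return master

# Made-up records from two hypothetical source systems.
sources = {
    "crm": [
        {"email": "Ana@Example.com", "name": "Ana Diaz", "phone": ""},
    ],
    "billing": [
        {"email": "ana@example.com ", "name": "A. Diaz", "phone": "555-0100"},
        {"email": "bo@example.com", "name": "Bo Lee", "phone": ""},
    ],
}
master = consolidate(sources)
for record in master.values():
    print(record)
```

Keeping the list of contributing systems on each consolidated record is one simple way to preserve lineage for later governance checks.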
The Ataccama Platform product serves as an augmented data management platform that includes MDM as one of its modules. It also includes other relevant modules like data quality and metadata management, and is fully integrated for different types of deployment.
The EnterWorks Platform is a multi-domain platform that creates a central repository of reliable, up-to-date master data consolidated across all enterprise applications with powerful tools to improve data quality and governance. Furthermore, all administrative and governance functions across the multiple domains can be achieved from a single user interface.
Riversand’s Master Data Experience Management platform provides a multi-domain cloud-native, unified software-as-a-service (SaaS) platform for all MDM use cases. It aims to eliminate data silos and support compliance with data governance rules by creating a single, accurate, trustworthy source of master data, along with comprehensive views of business-critical data.
(3) Data Management for Analytics
With the multitude of governance activities involved in maintaining data quality, enterprises may want to look for comprehensive integrated data platforms instead of standalone solutions. Data management for analytics solutions are comprehensive systems that integrate with analytics software to oversee data analytics such as relational and non-relational analytical processing, business intelligence, and machine learning.
(i) Collibra: The Collibra Data Governance product is a cloud-based platform that helps enterprises establish a common understanding of their data assets and collaborate in a central location. It includes a suite of services such as a policy manager, reference data, and business glossaries. Together, these help to generate trusted data on which powerful business analytics can be built.
(ii) erwin: The erwin Data Intelligence Suite (erwin DI) serves as a unified software platform that lets users create automated and curated enterprise data catalogs complete with data models and on-demand lineage. This drives agile and well-governed data preparation and analytics, with integrated business glossaries for organization-wide data literacy.
(iii) Alation: The Alation Platform empowers analysts and business users with an open and intelligent platform that supports a wide variety of metadata management applications, from data cataloging to governance. Furthermore, its integrated Analytics product serves as a one-stop shop for productive self-service analytics.
How can you maximize the value of these tools?
It is important to remember that tools are just one lever in the strategy towards data democratization. Making full use of these data governance tools depends on the people in the organization. Good data governance is a team effort: humans design and implement the agenda and policies, and then leverage technology to automate and monitor these procedures. To achieve this, organizations need to start focusing on data literacy upskilling so that everyone has the skills to work with data and do their best work.