
A Survey Into Data Governance Tools

In a recent webinar, we discussed how data governance is a key component of an organization’s data strategy and enables it to harness the full value of data. For a data governance plan to succeed, it is important that the right tools and technology are employed.
Jun 11, 2021  · 8 min read

The concept of data-driven organizations has been a staple of the technology industry. Tech giants like Amazon, Google, and Netflix operate with data at the core of their business model, and they epitomize how data can be leveraged to drive decision-making towards successful business growth.

With growing data availability and significant decreases in costs for data processing and storage, opportunities to harness data to solve business challenges have extended beyond the technology industry. Finance, healthcare, and insurance companies have recognized this and have embarked on data transformation journeys to establish competitive business advantages.

While leveraging cutting-edge machine learning techniques and bridging the data science talent gap remain top of mind, an equally crucial aspect of a data strategy is the creation of a data governance plan. This article dives into the definition of data governance, as well as the tools and technology that can drive successful governance programs.


What is Data Governance?

In a recent webinar, Aaren Stubberfield, Data Scientist at Microsoft, outlined the opportunities and best practices of data governance, where he started by addressing the fundamental question of what data governance is. Data governance, as defined by the Data Management Association (DAMA), is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.

(1) Planning: Creating rules that datasets need to conform to (e.g. standardizing the format of date fields to YYYY-MM-DD)

(2) Monitoring: Measuring compliance with the rules set in the Planning phase (e.g. tracking the percentage of ‘Date’ fields with missing values)

(3) Enforcement: Taking remediation actions when data rules are breached, based on the level of urgency (e.g. immediate corrective action if critical data fields such as social security numbers are found to contain major discrepancies)
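To make the three activities concrete, here is a minimal Python sketch built on the date-field example above. It is an illustration only and not taken from the webinar: the function names, the 5% threshold, and the remediation labels are all assumptions.

```python
import re
from typing import List, Optional

# Planning: a hypothetical rule that date fields must follow YYYY-MM-DD.
DATE_RULE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def missing_rate(values: List[Optional[str]]) -> float:
    """Monitoring: share of values that are missing (None or blank)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or not v.strip())
    return missing / len(values)


def format_violations(values: List[Optional[str]]) -> List[str]:
    """Monitoring: non-missing values that break the date rule."""
    return [v for v in values if v and v.strip() and not DATE_RULE.match(v)]


def enforcement_action(rate: float, critical: bool, threshold: float = 0.05) -> str:
    """Enforcement: pick a remediation based on urgency (assumed labels)."""
    if rate <= threshold:
        return "none"
    return "immediate correction" if critical else "schedule review"


dates = ["2021-06-11", "11/06/2021", None, "2021-06-12"]
rate = missing_rate(dates)                          # 1 of 4 values is missing
violations = format_violations(dates)               # values not in YYYY-MM-DD
action = enforcement_action(rate, critical=False)   # non-critical field
```

In practice the rules would live in a shared policy catalog and the checks would run on a schedule, but the split into rule, measurement, and response stays the same.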

What does a Data Governance Framework look like?

There are different types of data governance frameworks out there, but they tend to have these four common elements: Data Policies and Procedures, Datasets, Data Stewards and Stakeholders, and Technology.

Data stewards are the people who develop governance policies and procedures for the datasets in the organization. Data governance tools are vital enablers for the stewards to do their best work, and they form a key part of the governance framework.

What are the Tools and Technology used for Data Governance?

Data governance tools are necessary to automate governance operations and data stewardship efforts, since the large volumes of data make it impossible to perform these activities manually. They can also integrate with different IT products and datasets and provide comprehensive data cataloging, which extends their capabilities across an organization’s entire data management system. To meet the burgeoning demand for these tools, data quality solutions have proliferated in the market. A review by Gartner analyzed the data governance market and produced Magic Quadrants that classify vendors offering data governance solutions.

Given the growing importance of data governance, numerous data management tools offer governance capabilities as part of their systems. SolutionsReview released the Data Management Vendor Map, where they classified data management solutions into three categories: Data Quality Tools, Master Data Management, and Data Management for Analytics.

(1) Data Quality Tools

Data quality is defined as the overall utility of the data and its ability to be easily processed and analyzed for other purposes. In order to achieve and sustain good data quality, the governance tools need to perform key functions such as standardizing, profiling, parsing, cleansing, and monitoring. Besides keeping the data clean and well-organized, these tools are also expected to support data processing across the organization’s entire data system.
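As a rough illustration of what parsing, standardizing, cleansing, and profiling mean in practice, the following Python sketch applies them to a tiny customer list. It uses only the standard library. The field names (`name`, `signup`) and the accepted input date formats are assumptions for this example and do not reflect how any of the vendors below implement these functions.

```python
from collections import Counter
from datetime import datetime

# Assumed input formats; a real deployment would configure these per source.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(raw):
    """Parsing and standardizing: return YYYY-MM-DD, or None if unparseable."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def cleanse(records):
    """Cleansing: trim and title-case names, standardize dates, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = (rec["name"].strip().title(), parse_date(rec["signup"]))
        if row not in seen:
            seen.add(row)
            cleaned.append({"name": row[0], "signup": row[1]})
    return cleaned


def profile(records, field):
    """Profiling: count how often each value occurs in a field."""
    return Counter(rec[field] for rec in records)


raw_records = [
    {"name": " alice ", "signup": "11/06/2021"},
    {"name": "Alice", "signup": "2021-06-11"},
    {"name": "bob", "signup": "20210611"},
    {"name": "carol", "signup": "unknown"},
]
cleaned = cleanse(raw_records)
signup_profile = profile(cleaned, "signup")
```

Here the two Alice rows collapse into one after standardization, and the unparseable date surfaces as a missing value in the profile, which is the kind of signal a monitoring job would track over time.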

Examples:
(i) Talend
Talend’s flagship product is the Talend Data Fabric, which includes its data integrity and governance capabilities as part of a single unified cloud platform for trusted data. These capabilities range from metadata management to data lineage, and include collaborative data stewardship solutions.

(ii) SAS
The SAS Data Quality product provides data quality management capabilities across different kinds of databases and data architecture deployments. It includes the essential functions of data cleansing, entity resolution, as well as a unified web-based console to monitor data quality jobs.

(iii) Informatica
Informatica offers an expansive portfolio of data tools in various deployments. Their Informatica Data Quality product provides users with a rich set of data transformation capabilities, while also allowing them to build and review business rules without relying on IT.

(2) Master Data Management

Master data refers to the consistent and uniform collection of core company-wide data points and comprises key components such as customers, leads, suppliers, employees, accounts, and more. Underpinning an effective data quality control strategy is the setup of solid master data management. It is only then that the enterprise’s growing data assets (along with its metadata) can be properly structured and consolidated in a central repository to support optimal decision making.
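The idea of consolidating records into a central repository can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any MDM product works: records are matched on a normalized email address, and the survivorship rule is that the most recently updated non-empty value wins. The source names and fields are hypothetical.

```python
def build_master(sources):
    """Merge records by normalized email; later-updated non-empty values win."""
    master = {}
    # Process oldest records first so newer values overwrite older ones.
    records = sorted(
        (rec for recs in sources.values() for rec in recs),
        key=lambda r: r["updated"],  # ISO dates sort correctly as strings
    )
    for rec in records:
        key = rec["email"].strip().lower()
        golden = master.setdefault(key, {"email": key})
        for field, value in rec.items():
            if field not in ("email", "updated") and value:
                golden[field] = value
    return master


sources = {
    "crm": [
        {"email": "Ann@Example.com", "name": "Ann Lee", "phone": "", "updated": "2021-01-05"},
    ],
    "billing": [
        {"email": "ann@example.com ", "name": "Ann L.", "phone": "555-0100", "updated": "2020-11-20"},
    ],
}
master = build_master(sources)
```

The resulting record takes the newer name from the CRM system and keeps the phone number that only the billing system had. Real MDM platforms add fuzzy matching, audit trails, and steward review on top of this basic merge.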

Examples:
(i) Ataccama
The Ataccama Platform product serves as an augmented data management platform that includes MDM as one of its modules. It also includes other relevant modules like data quality and metadata management, and is fully integrated for different types of deployment.

(ii) EnterWorks
The EnterWorks Platform is a multi-domain platform that creates a central repository of reliable, up-to-date master data consolidated across all enterprise applications with powerful tools to improve data quality and governance. Furthermore, all administrative and governance functions across the multiple domains can be achieved from a single user interface.

(iii) Riversand
Riversand’s Master Data Experience Management platform provides a multi-domain cloud-native, unified software-as-a-service (SaaS) platform for all MDM use cases. It aims to eliminate data silos and support compliance with data governance rules by creating a single, accurate, trustworthy source of master data, along with comprehensive views of business-critical data.

(3) Data Management for Analytics

With the multitude of governance activities involved in maintaining data quality, enterprises may want to look for comprehensive integrated data platforms instead of standalone solutions. Data management for analytics solutions are comprehensive systems that integrate with analytics software to oversee data analytics such as relational and non-relational analytical processing, business intelligence, and machine learning.

Examples:
(i) Collibra
The Collibra Data Governance product is a cloud-based platform that helps enterprises establish a common data asset understanding and collaborate in a central location. This includes a suite of services such as policy manager, reference data and business glossaries. All these help to generate trusted data for powerful business analytics to be built upon.

(ii) erwin
The erwin Data Intelligence Suite (erwin DI) serves as a unified software platform that lets users create automated and curated enterprise data catalogs complete with data models and on-demand lineage. This drives agile and well-governed data preparation and analytics, with integrated business glossaries for organization-wide data literacy.

(iii) Alation
The Alation Platform gives analysts and business users an open and intelligent platform that supports a wide variety of metadata management applications, from data cataloging to governance. Furthermore, its integrated Analytics product serves as a one-stop shop for productive self-service analytics.

How to maximize the value of these tools?

It is important to remember that tools are just one lever in the strategy towards data democratization. Making full use of these data governance tools depends on the people in the organization, because good data governance is a team effort: people design and implement the agenda and policies, and then leverage technology to automate and monitor these procedures. To achieve this, organizations need to focus on data literacy upskilling so that everyone has the skills to work with data and do their best work.

Learn more about DataCamp for Business:
DataCamp for Business provides an interactive learning platform for companies that need to upskill and reskill their people on data skills—on everything from data literacy and data science to data engineering and machine learning. Join 1,600+ companies and 80% of the Fortune 1000 who use DataCamp to upskill their teams.
