One of the core tasks in Master Data Management is keeping your data under control at the right level of quality.
Many companies have gone through several IT implementations over time, and as more and more systems have been added, they have lost track of where and how data is located in the various systems. This means that they are struggling with poor data quality and do not have the right tools to create an overview - or they lack the right people in the organization to continuously combat the bad data.
It is a growing concern for most companies' data teams because quality is key to building trust in the data and analysis available to the business.
Ensuring data quality is an ongoing effort and requires that those responsible have access to KPIs and overviews that help prevent and maintain data. Data quality can be measured in many ways; here are some of the most commonly used:
The accuracy score reflects whether the data content conforms to agreed standards; rules are often established to enforce name standards, formats, etc. An example of a rule is: "Person names may only contain letters and may have a few special characters."
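An accuracy rule like this can be sketched as a simple pattern check. The exact set of allowed characters below is an assumption for illustration, not a standard:

```python
import re

# Hypothetical accuracy rule: person names may only contain letters,
# plus a few special characters (space, hyphen, apostrophe).
NAME_PATTERN = re.compile(r"^[A-Za-zÀ-ÖØ-öø-ÿ]+(?:[ '\-][A-Za-zÀ-ÖØ-öø-ÿ]+)*$")

def name_is_accurate(name: str) -> bool:
    """Return True if the name passes the accuracy rule."""
    return bool(NAME_PATTERN.fullmatch(name))
```

In practice a real MDM tool would hold such patterns as configuration rather than code, but the principle is the same.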
The completeness score shows where data is missing. Some master data is essential for the company's internal processes to function, making it a good idea to set up rules and measure where data is missing. An example of a rule is: "Person addresses must contain street name, street number, postal code, city, country, and must not be blank or have NULL values."
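A completeness rule like the address example can be sketched as a check for blank or NULL values in the required fields. Field names here are illustrative:

```python
REQUIRED_ADDRESS_FIELDS = ("street_name", "street_number", "postal_code", "city", "country")

def missing_address_fields(record: dict) -> list:
    """Return the required address fields that are blank or NULL (None)."""
    return [f for f in REQUIRED_ADDRESS_FIELDS
            if record.get(f) is None or str(record[f]).strip() == ""]

def completeness_score(records) -> float:
    """Share of records with all required address fields filled in."""
    if not records:
        return 1.0
    complete = sum(1 for r in records if not missing_address_fields(r))
    return complete / len(records)
```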
The consistency score measures whether the data is coherent. These relationships also contribute to higher completeness when the rules are aligned with each other. An example of a rule is: "When a person's address is created, and the postal code is filled in, the city and country MUST also be filled in."
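The consistency rule in the example is a conditional check, which a minimal sketch (with assumed field names) could express as:

```python
def _filled(value) -> bool:
    return value is not None and str(value).strip() != ""

def address_is_consistent(record: dict) -> bool:
    """Consistency rule: when postal_code is filled in,
    city and country MUST also be filled in."""
    if _filled(record.get("postal_code")):
        return _filled(record.get("city")) and _filled(record.get("country"))
    return True  # no postal code, so the rule does not apply
```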
The integrity score measures the credibility and consistency of data throughout its lifecycle and aims to shed light on unintended changes, etc. An example of a rule is: "Person information must be the same in the HR system and payroll system at all times."
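A cross-system integrity check like the HR/payroll rule can be sketched as an attribute-by-attribute comparison; the attribute names are assumptions for illustration:

```python
def integrity_mismatches(hr_record: dict, payroll_record: dict,
                         attributes=("name", "address", "bank_account")) -> list:
    """Integrity rule: person information must be identical in the HR system
    and the payroll system. Returns the attributes that differ."""
    return [a for a in attributes if hr_record.get(a) != payroll_record.get(a)]
```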
The actuality score measures the availability of data. Availability is often determined based on the needs of decision-makers and helps ensure the right information at the right time. An example of a rule is: "A person's vacation home address must be created at the same time as their home address, and any changes must be updated within the same workday as the change is received."
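An actuality measurement can be sketched as the share of changes applied within the agreed window. Using the calendar day as a stand-in for "same workday" is a simplification:

```python
from datetime import date

def actuality_score(changes) -> float:
    """Share of changes applied on the same day they were received.
    `changes` is a list of (received, applied) date pairs; treating the
    calendar day as the workday is a simplifying assumption."""
    if not changes:
        return 1.0
    on_time = sum(1 for received, applied in changes if applied == received)
    return on_time / len(changes)
```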
In addition to data-focused quality measurements, it is also important to focus on operational measurements, such as measuring new creations, the number of corrections, the number of duplicates, and the data owner's ability to clean up and purge bad data. These KPIs also provide insight into the progress of improving data quality. Are we as an organization improving or deteriorating over time? Are there specific times of the year that are more challenging than others?
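Operational KPIs such as these can be tracked with a simple tally per period, which also makes seasonal patterns visible. The event-type names below are examples:

```python
from collections import Counter, defaultdict

def monthly_operational_kpis(events):
    """Tally operational events (e.g. 'creation', 'correction', 'duplicate')
    per month, to show whether the organization is improving over time.
    `events` is a list of (month, event_type) pairs, e.g. ('2024-03', 'correction')."""
    kpis = defaultdict(Counter)
    for month, event_type in events:
        kpis[month][event_type] += 1
    return dict(kpis)
```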
- Data decays at a rate of 70% per year.
- Employees waste 50% of their time searching for data, correcting errors, and chasing down sources of uncertainty in data. (Source: Harvard Business Review)
- $9.7 million is the average annual cost to companies of poor-quality data. (Source: Gartner)
What are the typical challenges?
Poor data can have significant business consequences for a company. Data of poor quality is often the source of operational problems, inaccurate analyses, and incomplete business strategies.
Examples of the financial damage that data quality problems can cause include extra expenses when products are shipped to incorrect customer addresses, lost sales opportunities due to incorrect or incomplete customer information, and fines for incorrect reporting of financial or regulatory information.
Here are a few examples from everyday life:
A company has an incorrect delivery address on the order. This means that the goods are sent to the wrong location, and the recipient either keeps the goods or rejects them - resulting in extra shipping costs, wasted time handling customer inquiries and processing return orders, etc.
A company has incorrect product descriptions. This means that the wrong products are ordered for the store, or products are missing from the warehouse, resulting in missed targeted sales in the store as well as delays, errors, and shortages in production.
A company has incorrect bank details for employees. This means that the payroll payment to employees is rejected or paid to the wrong recipient, resulting in extra time and money for recovery and re-payment.
A company has created the same customer multiple times with different or nearly identical names, addresses, and phone numbers for each customer number. This means that the customer is not contacted at all, resulting in lost sales potential, or that the same customer is contacted multiple times (by phone or mail), resulting in a poor customer experience; eventually the customer may lose patience, and the company loses them.
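Finding nearly identical customer records is a matching problem. As a rough sketch, the standard library's `difflib.SequenceMatcher` can stand in for a proper matching engine; the 0.85 threshold and the name/address fields are assumptions:

```python
from difflib import SequenceMatcher

def likely_duplicates(customers, threshold=0.85):
    """Flag customer pairs whose combined name + address strings are
    nearly identical. A real MDM tool would use dedicated matching
    logic; this is only an illustration of the idea."""
    pairs = []
    for i in range(len(customers)):
        for j in range(i + 1, len(customers)):
            a = f"{customers[i]['name']} {customers[i]['address']}".lower()
            b = f"{customers[j]['name']} {customers[j]['address']}".lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((customers[i]["id"], customers[j]["id"]))
    return pairs
```

The pairwise comparison is quadratic in the number of customers, so real implementations block candidates first (e.g. by postal code) before comparing.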
Data Quality Management Framework - Organization is the key to sustained success
To ensure continuous focus and create business value, it is important that the work of improving data quality does not stop at "the first MDM project." It requires sustained focus from both management and employees in everyday work. For this purpose, we recommend building a Data Quality Management Framework. It doesn't have to be a heavy, complicated process; it can be done quite simply. Here are some of our recommendations:
A simple 4-step process
Step 1 - Vision and objectives
Get an overview of where you want to go (vision), so you have something to steer towards and identify ownership.
- Which data quality metrics do we want to measure?
- What is the goal for each measurement point?
- Do some measurement points weigh more than others?
- Should specific business logic be taken into account?
- What are the rules for each measurement point?
- Who are the stakeholders for the measurement point in the organization?
- Who is the data owner?
- Does the data owner have a Data Steward who helps in everyday work?
- Which critical business processes are affected by the measurement point?
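The answers to the checklist above can be captured in a small structure per measurement point, which later steps can build on. All field names here are illustrative, not a standard:

```python
from dataclasses import dataclass, field

@dataclass
class MeasurementPoint:
    """Captures the Step 1 answers for one data quality metric.
    Field names are assumptions made for this sketch."""
    name: str                                          # e.g. "completeness"
    target: float                                      # the goal for this measurement point
    weight: float = 1.0                                # some points may weigh more than others
    rules: list = field(default_factory=list)          # rule descriptions
    data_owner: str = ""
    data_steward: str = ""
    stakeholders: list = field(default_factory=list)
    critical_processes: list = field(default_factory=list)
```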
Step 2 - Evaluation as-is
Get an understanding of where you are now, so you know how far you are from the target and can work on specific objectives. In this step, each data object you are working with is broken down into specific attributes so that you know how each one affects the overall goal. As-is is also your baseline for tracking over time and following progress.
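Breaking a data object down into attributes for the as-is baseline can be sketched as a per-attribute fill rate:

```python
def attribute_fill_rates(records, attributes):
    """As-is baseline: the fill rate of each attribute of a data object,
    showing how each attribute affects the overall goal."""
    n = len(records)
    if n == 0:
        return {}
    return {a: sum(1 for r in records if r.get(a) not in (None, "")) / n
            for a in attributes}
```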
Step 3 - Fit/Gap analysis
The Fit/Gap analysis gives you insight into where and how you can improve individual metrics. It is important to understand the needs of the business and how the data behind each metric affects business processes. Remember that not everyone needs the same level of accuracy. For example, some stakeholders can live with customer addresses that are not 100% accurate as long as suppliers can find the address, while others may need the address to be exactly correct, for instance to carry out a specific excavation or similar task at the site. In most cases the need is fairly consistent throughout the business, but the point is that it is important to understand any differences.
This applies to all metrics, and another example is the different needs for the completeness of data. Some stakeholders need values in all master data around customers (credit and finance departments), while others may only need the essential information such as name, address, phone, and email of the customers for their business process. We recommend that you classify your data based on criticality and start by focusing on the most critical first, as not everything can or necessarily should be solved at once.
Step 4 - Implementation
Once you have identified your gaps relative to the target picture, it's time to establish good rules for each quality measurement. The rules should partly specify exactly what needs to be measured (see some of the examples from the previous section) and partly form the basis for your actions around data. You will get the best results if your technical solution supports setting up the rules, so that you have as little manual work as possible. There are several publicly available standards that can be used and integrated into your internal systems; this data can be used for both normalization and enrichment. (See more about data enrichment in the blog post Golden Record.)
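The idea that rules both specify what is measured and drive actions can be sketched as named predicates evaluated automatically, keeping manual work to a minimum. The rule contents below are examples only:

```python
# Each rule is a named predicate; the same definitions drive both
# measurement (failure counts) and action (which records to fix).
RULES = {
    "name_not_blank": lambda r: bool(str(r.get("name", "")).strip()),
    "postal_code_is_4_digits": lambda r: str(r.get("postal_code", "")).isdigit()
                                         and len(str(r["postal_code"])) == 4,
}

def failed_rules(record: dict, rules=None) -> list:
    """Return the names of the rules the record violates."""
    rules = RULES if rules is None else rules
    return [name for name, check in rules.items() if not check(record)]
```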
In addition, it is important to set up relevant reporting for measurement and overview. This helps ensure that you can continuously monitor where improvement efforts should be focused, and it can be used to measure the effectiveness of those efforts.
The organizational implementation involves getting your data owners to take responsibility and delegate the practical day-to-day work to one or more Data Stewards. Your organization, roles, and responsibilities are essential to succeeding in sustained focus and quality assurance of data. This is a topic in itself, which we will delve more into in one of the next releases of this blog post series.
How can quality reporting help?
To support your data quality framework, you need reporting that provides insight into any quality issues.
The reporting can be set up to track individual data updates and recalculate related quality measurements (KPIs), as well as be presented with a trend over time so that you constantly have an updated picture of the current status and where you are heading.
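As a minimal sketch, a trend view can be built from dated KPI snapshots, showing both the current status and the direction of travel:

```python
def kpi_trend(snapshots):
    """Order dated KPI snapshots chronologically and report the change
    from the first to the latest measurement.
    `snapshots` maps an ISO date string to a score, e.g. {"2024-01": 0.80}."""
    ordered = sorted(snapshots.items())
    delta = ordered[-1][1] - ordered[0][1] if len(ordered) > 1 else 0.0
    return ordered, delta
```

In a real setup, these snapshots would be recalculated whenever data updates are tracked, and the trend rendered in a reporting tool such as Power BI.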
The reporting can serve several purposes, and the measurement of data quality can be viewed in different ways:
- Ensures data testing and validation throughout the ETL flow
- Ensures logging and error handling
- Validates data sets, for example comparing cube-DAX result sets with master data sets
- Contributes to quality measurements such as completeness, reliability, timeliness, and accessibility

- Measures data quality per data object
- Validates datasets and compares attributes across systems and datasets
- Ensures a focus on a golden record
- Supports quality measurements such as uniqueness, completeness, validity/integrity, accuracy, and consistency

Consumption and performance

- Ensures monitoring of response time and performance of queries, reports, etc.
- Ensures error handling of pipelines, ETL jobs, etc.
- Tracks usage patterns
- Contributes to quality measurements such as reliability, actuality, availability, and data stewardship/responsibility
Regardless of the purpose, it is important that your reporting can visualize the status at multiple levels (aggregated, summarized, detailed) and support you in answering business-specific questions about poor data and its effect on work processes and the business.
If you are interested in learning more about how to ensure that data meets expected quality standards - and visualize it in Power BI - you can participate in our course on data quality management.