Master Data Management: Golden Record

A Golden Record is the ultimate goal in the data world. It is a fundamental concept within Master Data Management, and it is defined as the single source of truth: one record that captures all the information we need about, for example, a customer, an employee or another data area in our data catalogue.

The goal of a Golden Record is:

That only one version of your master data should exist
That it should contain a complete description that covers all purposes of using data in the company
That it should contain the most current and accurate data values

6 disadvantages of not having a Golden Record

If you do not possess a Golden Record for your most valuable data, it can turn out to be very expensive with respect to turnover and customer satisfaction. Below, we will review 6 challenges that you may encounter if you do not have a Golden Record.

1) No big picture

One of the most prominent disadvantages of not having a Golden Record can be found in your customer data. For example, when there are duplicates of a customer – with several interactions registered for different entries – one of the challenges will be to identify unresolved actions.

Duplicates often occur because different departments hold different information about the same data entry. Departments like sales, finance, logistics and marketing can all create their own data set which shows specific characteristics of a customer. For instance, the finance department knows how and when the customer paid for a product, the sales department knows what a customer is interested in, logistics knows when a product was sent out and marketing knows that a customer used a discount coupon for a product following a specific email campaign. For the individual department, this information may seem adequate, but it often isn’t. The different data sets will result in departments working in different directions. The lack of a shared data set can, in the end, lead to business problems.

2) Inefficiency

Another challenge you may encounter involves inefficiency. In today’s interactive business world, we collect customer data from a long list of data sources, and when the touch points with a customer are not all gathered in one entry, it creates confusion and makes it difficult for users to work efficiently with the data. Among other things, duplicate data leads to doubt about which data is correct. This results in distrust of data and systems as well as irritation on the part of the data users.

Finding the right data is time consuming, and in the end, it lowers productivity at the company. Once poor quality data has entered the system, it takes a lot of work to neutralise its negative effects.

3) Reduced use of systems

Trust in data plays a major role in how your employees use the business systems. Your CRM and other data management systems may be the best, with user-friendly and intuitive functionality, but if a system is full of duplicate data, the users will quickly realise this, and it may lead to further frustration and inefficiency. Employees will then seek alternative solutions, such as storing data in Excel so that other departments cannot alter it. As a consequence, the other departments will not be able to take advantage of the insights in this data.

4) Negative impact on the company’s reputation

By neglecting the health of your data, you are neglecting the health of your company. Customers are essential to any business. By improving the quality of the data behind what they receive from you, you maximise the effectiveness of your communication and develop your reputation in the market.

If your data includes duplicate entries, perhaps with different data values, you risk that the various departments communicate with the same customer through different channels – possibly on the basis of different data. This can make your company seem unprofessional.

5) Lost sales opportunities

Being able to use data effectively with respect to sales is more important than ever. If marketing activities, licences and contracts are registered in different data entries, then the chance of seeing the big picture, tracking sales opportunities and building a good sales strategy is very small.

For example, it is far easier to identify the opportunities of cross-selling and up-selling on the basis of complete data, which also makes it easier to establish a lasting customer relationship.

6) Incorrect reports and uninformed decisions

If you plan to use your data to make informed decisions and predict what you should do to ensure future growth, make sure that your data is accurate, complete and without duplicate entries. Decisions based on poor quality data are not much better than those based on gut feeling.

If you discover that data in a report is deficient or incorrect, you will often look for quick shortcuts to “patch up” the data, which puts pressure on Data Stewards, who need to remedy the poor data quality. This way, responsibility is placed on a few people, rather than having a long-term solution that involves the whole company and its use of data.

47% of all newly-established data entries contain at least one critical error that affects work

Source: Harvard Business Review

How do you create a Golden Record?

Companies today are swamped with large quantities of poor quality data. You have to manage this so as to avoid the risk of negatively impacting your turnover and credibility. Establishing Golden Records for your data is not easy – otherwise every company would have one. To get a Golden Record, data must be matched, cleaned and consolidated. Without using a Master Data platform, this is an endless task because, once your data has finally been cleaned, it will already start becoming out of date. If you use a more technical approach to cleaning, one challenge is that the system often becomes too rigid and does not allow for flexibility with respect to how data entries should be combined.

A smarter way to perform these tasks is to use a Master Data Management platform like CluedIn. CluedIn is designed as a more dynamic approach, which somewhat turns the Golden Record concept on its head. Instead of determining which data and source is most correct and then constructing an algorithm, CluedIn uses a more statistical and automatic approach.

First and foremost, it is about matching data. As mentioned in our blog on data integration and modelling, you choose one or more unique references for your data. CluedIn uses these references to find other data with the same unique references so that the data entries can be combined. For example, if you chose the CVR (company registration) number as the unique reference for business customers and a certain CVR number appears 6 times in your data (typically across sources), then it is a match, and the entries can be consolidated into one customer rather than 6 individual customers. This is how the journey towards a Golden Record begins.
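The matching step can be illustrated with a minimal Python sketch. It is not CluedIn's implementation; the record layout, the field names (source, cvr, name) and the helper group_by_reference are assumptions made for illustration only. It simply groups entries that share the same unique reference and sets aside those that lack it.

```python
from collections import defaultdict

# Hypothetical entries from several source systems (illustrative data only).
records = [
    {"source": "crm", "cvr": "12345678", "name": "Initech A/S"},
    {"source": "erp", "cvr": "12345678", "name": "Initech"},
    {"source": "webshop", "cvr": "87654321", "name": "Globex ApS"},
    {"source": "marketing", "cvr": "12345678", "name": "INITECH A/S"},
    {"source": "erp", "cvr": None, "name": "Unknown Ltd"},
]


def group_by_reference(records, key):
    """Group records that share the same non-empty value for `key`.

    Records without a value for `key` cannot be matched this way and are
    returned separately so a later step (e.g. fuzzy matching) can handle them.
    """
    groups = defaultdict(list)
    unmatched = []
    for record in records:
        value = record.get(key)
        if value:
            groups[value].append(record)
        else:
            unmatched.append(record)
    return dict(groups), unmatched


groups, unmatched = group_by_reference(records, "cvr")
for cvr, members in groups.items():
    print(cvr, "->", [m["source"] for m in members])
```

Each group with more than one member is a candidate for consolidation into a single customer; the entries in unmatched are the ones that need the later steps described in this post.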

Automatically merge, choose yourself or do both

Duplicates often have different data attributes: either they were input incorrectly at some point or they have not been updated with newer data. When data entries are combined, you must therefore consider which attributes are the most correct, i.e. the “winning” attributes, which will be part of your Golden Record. As a starting point, CluedIn compares data across sources and considers 3 factors: the most recent date of creation or update, the “trust levels” for the individual sources or attributes and, finally, the data’s accuracy measurement. It rarely needs to go as far as the last factor, since one of the preceding factors usually settles the choice. If you are nonetheless not satisfied with the “winning attribute” chosen by CluedIn, you have the option to make corrections later.
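To make the idea of a “winning” attribute concrete, here is a small Python sketch. It assumes that the three factors are applied in the order listed (recency first, then trust level, then accuracy) and that each candidate value carries hypothetical updated, trust and accuracy fields. How CluedIn actually weighs these factors internally is not described here, so treat this purely as an illustration.

```python
from datetime import date

# Hypothetical candidate values for one attribute (a phone number),
# each coming from a different source system.
candidates = [
    {"source": "crm", "value": "+45 12 34 56 78",
     "updated": date(2023, 5, 1), "trust": 0.9, "accuracy": 0.8},
    {"source": "erp", "value": "12345678",
     "updated": date(2021, 1, 15), "trust": 0.7, "accuracy": 0.9},
    {"source": "webshop", "value": "+45 12 34 56 79",
     "updated": date(2023, 5, 1), "trust": 0.5, "accuracy": 0.6},
]


def pick_winner(candidates):
    """Return the candidate with the most recent update.

    Python compares tuples element by element, so trust is only consulted
    when the dates tie, and accuracy only when trust ties as well.
    This precedence is an assumption made for the sketch.
    """
    return max(candidates, key=lambda c: (c["updated"], c["trust"], c["accuracy"]))


winner = pick_winner(candidates)
print(winner["source"], winner["value"])
```

In this example the CRM and webshop values share the most recent date, so the higher trust level of the CRM source decides the outcome and the accuracy measurement is never needed.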

Fuzzy merging

However, not all data entries have the same reference keys, or perhaps they entirely lack the unique references you have chosen. The next step is therefore to use fuzzy merging to reduce the quantity of manual work. Fuzzy merging is the merging of data that is nearly identical, i.e. the values are very alike but not 100% identical. The differences could be alternative spellings, spelling errors or different formats. Using fuzzy merging on selected fields where a match is likely allows you to locate additional possible duplicate data entries. For example, casper.elkjaer@initech.com and cassper.elkjaer@initech.com are not 100% identical, but fuzzy matching will likely flag them as a near match.
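A minimal sketch of a fuzzy comparison, using SequenceMatcher from Python's standard difflib module as a stand-in similarity measure. The 0.9 threshold and the helper names are assumptions for illustration; CluedIn's own matching logic may use a different measure entirely.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_fuzzy_match(a, b, threshold=0.9):
    """Treat two values as a near match when their ratio reaches the threshold."""
    return similarity(a, b) >= threshold


email_a = "casper.elkjaer@initech.com"
email_b = "cassper.elkjaer@initech.com"
email_c = "tim.jensen@globex.com"

print(round(similarity(email_a, email_b), 3), is_fuzzy_match(email_a, email_b))
print(round(similarity(email_a, email_c), 3), is_fuzzy_match(email_a, email_c))
```

The two spellings of the same address differ by a single character and score well above the threshold, while the unrelated address does not. The threshold plays the role of the percentage match you choose for a field.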

Some of the data fields are free text fields which can contain variations of data, so it could make sense to use fuzzy merging logic here. Among other things, you could look for names that are not quite identical – for example, one data entry could include a middle name or the name could be spelled incorrectly in another entry. It’s also possible that the telephone number appears with the country code in one place and without it in another place. You can choose the specific percentage match. When CluedIn finds a match for the fuzzy references you have selected, it combines the duplicate entries, and you are one step closer to your Golden Record.

Fuzzy merging is a good supplement for tracking down duplicates, but it is important that the rules and the data content are continuously evaluated. Otherwise, you could easily start merging data that should not be merged. First names are an example of an attribute where fuzzy matching can become too aggressive. Two first names can be so similar that the algorithm regards them as nearly identical if they are not considered in relation to other data attributes in your Golden Record. For example, if Tim and Tom appear alone in a fuzzy match, it is likely that they will be merged and considered as one. A person would quickly conclude that they are not the same, but the algorithm is no smarter than we make it, so it is important to have as many attributes in play as possible in a fuzzy match.
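The Tim and Tom problem can be sketched in a few lines of Python. The example assumes a deliberately loose threshold of 0.6, a plain average across fields and invented sample records; all of these are illustrative choices, not CluedIn settings.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.6  # deliberately loose, to show the risk of aggressive rules


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(rec_a, rec_b, fields):
    """Average the similarity over several fields instead of relying on one."""
    return sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)


# Hypothetical entries for two different people.
tim = {"first_name": "Tim", "last_name": "Hansen", "email": "tim.hansen@initech.com"}
tom = {"first_name": "Tom", "last_name": "Berg", "email": "tom.berg@example.org"}

first_name_only = similarity(tim["first_name"], tom["first_name"])
all_fields = combined_score(tim, tom, ["first_name", "last_name", "email"])

print("first name only:", first_name_only >= THRESHOLD)
print("all fields:     ", all_fields >= THRESHOLD)
```

On the first name alone, the two entries pass the loose threshold and would be merged. Once last name and email are taken into account, the combined score drops below the threshold and the entries stay separate.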

Duplicate lists

Following a unique match on reference keys and a given fuzzy match, you may still have a set of duplicates that should be handled manually, possibly without your knowledge. Lists of possible duplicate entries are a good help in identifying these. The idea is to pick individual attributes, or combinations of attributes, whose values you think indicate that two entries are the same. The selected attributes and rules from the fuzzy logic can be a good starting point, particularly if you have chosen a less aggressive approach. You may, for example, have left out names in your fuzzy logic entirely; if the same customer name then appears in the system several times, it would be nice to have a list of the possible duplicates. This does not mean that all customers with the same name ARE duplicates, but it does provide an overview that you can work with further. As standard, CluedIn offers a duplicate list of names, but you can compose additional data queries to meet your needs, for example name + address or name + town + country code. Based on the list of duplicates, you can manually combine the entries that you deem to be a match. The goal is to end up with as little manual merging as possible, but if you are not sure which fuzzy matching rules best fit your data, you could make several different lists of duplicates and test the manual merging first.
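As a rough illustration of what such a duplicate list amounts to, the Python sketch below groups entries by a chosen combination of attributes and reports every group with more than one member. The field names, the sample data and the duplicate_list helper are hypothetical; in CluedIn you would compose the corresponding data query instead.

```python
from collections import defaultdict

# Hypothetical customer entries (illustrative data only).
customers = [
    {"id": 1, "name": "Anna Jensen", "town": "Aarhus", "country": "DK"},
    {"id": 2, "name": "anna jensen ", "town": "Aarhus", "country": "DK"},
    {"id": 3, "name": "Anna Jensen", "town": "Odense", "country": "DK"},
    {"id": 4, "name": "Peter Holm", "town": "Aarhus", "country": "DK"},
]


def duplicate_list(records, fields):
    """Return lists of ids whose values agree on all `fields`.

    Values are trimmed and case-folded so that trivial differences
    in spacing or capitalisation do not hide a possible duplicate.
    """
    buckets = defaultdict(list)
    for record in records:
        key = tuple(record[f].strip().casefold() for f in fields)
        buckets[key].append(record["id"])
    return [ids for ids in buckets.values() if len(ids) > 1]


by_name = duplicate_list(customers, ["name"])
by_name_town_country = duplicate_list(customers, ["name", "town", "country"])

print("name only:            ", by_name)
print("name + town + country:", by_name_town_country)
```

The name-only list is broader and includes the Anna Jensen in Odense, who may well be a different person; adding town and country code narrows the list to the two Aarhus entries. Comparing several such lists is one way to test which rules fit your data before automating them.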

Cleaning and enriching data

One way to increase the probability of being able to identify and merge duplicates with fuzzy matching is to clean the data for errors and deficiencies as well as to enrich it with additional attributes. Errors could be spelling mistakes and different formats that result in the data not matching. Deficiencies could be a missing middle name or a missing post code on an address. With the CluedIn Clean tool, you can easily identify and clean your data of errors and deficiencies so that the values become identical, are picked up by the fuzzy logic and are merged with the now more accurate attributes.
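The effect of cleaning on matching can be shown with a short Python sketch. The two helper functions, the default country code 45 and the assumption that a national number has 8 digits are illustrative choices for Danish-style data, not part of the CluedIn Clean tool.

```python
import re


def clean_name(name):
    """Collapse repeated whitespace and apply consistent capitalisation."""
    return " ".join(name.split()).title()


def clean_phone(phone, default_country_code="45"):
    """Normalise a phone number to the form +<country code><number>.

    Assumes 8-digit national numbers (as in Denmark); a number of that
    length is treated as lacking a country code and gets the default one.
    """
    digits = re.sub(r"\D", "", phone)  # keep digits only
    if digits.startswith("00"):        # international prefix written as 00
        digits = digits[2:]
    elif len(digits) == 8:             # national number without country code
        digits = default_country_code + digits
    return "+" + digits


raw_phones = ["+45 12 34 56 78", "12345678", "0045 12345678"]
cleaned_phones = [clean_phone(p) for p in raw_phones]

print(cleaned_phones)
print(clean_name("  anna   JENSEN "))
```

After cleaning, the three phone numbers that looked different are identical, so even an exact comparison will now treat them as the same value.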

Deficient data can also make it a challenge to match data, and this is particularly the case if it is the unique reference keys that are missing for the data entries. Missing data can either be input into CluedIn Clean or retrieved from external sources such as websites and public data banks. You can retrieve data either using a standard connector to, for example, the Central Business Register, Dawa or the Central National Register, or using one that is adapted to the third-party supplier you need.

CluedIn also offers good insight into the quality of data. In the next blog post on data quality measurements, we will return to how you can gain this overview.

The illustration below shows the merge steps that data goes through in CluedIn so as to arrive at a Golden Record.