Ideally, updates between systems are automated, with as little manual work as possible, to avoid typos and similar errors. However, this requires ongoing evaluation and maintenance of the IT integrations to ensure that they always match and fit the business processes.
Typical challenges with data integration
When it is unclear which data is most up to date and trustworthy, building an overview of your existing and potential customers and their interactions with your company can be very challenging.
Another challenge is that data is often heterogeneous and sometimes locked away in so-called “data silos”, where it is only accessible to certain departments or people. Anyone who needs access to the whole data set – e.g. for planning targeted sales initiatives and marketing campaigns – will quickly run aground trying to obtain, analyse and work with the data.
The opposite situation also arises: too much data. If data is collected indiscriminately from several channels, all the irrelevant information makes it a big challenge to uncover the hidden treasures in your data. When huge quantities of data are created on a daily basis, it becomes difficult to administer, analyse and extract value from it. In other words, data must be prepared before it is used.
With inconsistent, outdated and sometimes excessive data, your company will quickly lose its competitive edge, since it is difficult to work effectively and benefit from it all. And if you are also stuck with old systems, work processes and data silos, your teams will lack the effective tools they need to do their jobs optimally.
The traditional answer to these challenges is ETL (extract, transform, load), which copies data sets, harmonises them and loads them into a database or data warehouse. The disadvantage, however, is that this often still does not give you on-demand access to data, and you risk a complex, resource-heavy integration project trying to build your own connections between outdated databases. This can leave you with an inflexible system without a trace of automation or scalability.
These challenges are resolved with a solid data integration solution, which can put you one step ahead of your competitors, reduce the cost of data maintenance and help your company grow. Luckily, there is a tool that can help you with all of this: CluedIn.
How does CluedIn differentiate itself from traditional data integration and modelling?
CluedIn is a Master Data platform that integrates all your data across systems without complex custom design, and the platform covers all the data management principles necessary for fully integrated, cleaned and up-to-date data. The integration process itself starts with data modelling, where you determine which data sources to connect and how the data should be structured – i.e. which categories of master data matter most to your company. These could be customers, partners, contact persons, products, sales areas, etc. CluedIn offers several standardised master data categories, e.g. organisation and user, but you and your company can also define how your data should be modelled.
The difference between CluedIn and traditional data modelling is that you do not design a fixed data model before importing data into CluedIn. Instead, you decide on your master data, your sources and any unique identification keys shared between them. When data is uploaded to CluedIn and matches on identification keys or reference keys – consisting of the same or similar metadata – it is either merged automatically or a relation is established between the records. The data model more or less builds itself and stays flexible relative to your company’s needs. This kind of data modelling is called Eventual Connectivity, and it is the foundation of CluedIn.
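As a rough illustration of this matching-and-merging idea – a sketch in plain Python, not CluedIn’s actual API, with record shapes and key names assumed for the example:

```python
# Illustrative sketch of key-based matching: two records that share an
# identification key (here an email address) are merged into one
# "golden record"; non-matching records stay separate.

crm_record = {"id_keys": {"email": "anna@example.com"}, "name": "Anna Jensen"}
erp_record = {"id_keys": {"email": "anna@example.com"}, "vat": "DK12345678"}


def try_merge(a, b):
    """Merge two records if any identification key/value pair matches."""
    shared = set(a["id_keys"].items()) & set(b["id_keys"].items())
    if not shared:
        return None  # no match: keep the records separate (or relate them later)
    merged = {**a, **b}  # fields from both systems end up in one record
    merged["id_keys"] = {**a["id_keys"], **b["id_keys"]}
    return merged


golden = try_merge(crm_record, erp_record)
# golden now carries "name" from the CRM and "vat" from the ERP,
# without either system having been forced into a shared schema up front.
```

Note that no schema was defined before loading: the link between the two systems emerges from the matching key, which is the essence of the Eventual Connectivity approach described above.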
Solid data integration is essential to ensure that data is identical across systems. However, data only becomes relevant and valid once it is normalised and adapted to the company’s business needs. As mentioned previously, data often arrives in different standards and formats, depending on the system, department or user. One of the first steps towards harmonising data from all your sources is to establish a naming standard.
Take addresses, for example: in one system the first line could be called “road”, in another “address” and in a third “address line 1”. To be able to use the data, it is important to define a general naming standard that all users in your organisation know about. You can then normalise the data further and push it out to downstream consumers such as Power BI, Tableau and HubSpot – regardless of the system the data originates from.
In CluedIn, you therefore use a standardisation tool called “vocabularies”. We will delve deeper into vocabularies in a future blog post in this series, but in brief, you assign a standard name to each of your master data fields. The aforementioned address fields could be called [YourBusiness].Customer.AddressLine1. All data fields relating to the first line of the address are then merged under one vocabulary key to produce a single display. This not only gives you an easy overview of your master data; it also solves the challenge of too much data. You keep the focus on the data that matters most to your company, while getting rid of irrelevant data.
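A simplified sketch of what such a vocabulary mapping does – the field names and the `apply_vocabulary` helper are hypothetical illustrations, not CluedIn’s API:

```python
# Hypothetical vocabulary: source field names from different systems are
# all mapped to one standard key for the first address line.
VOCABULARY = {
    "road": "yourbusiness.customer.addressLine1",
    "address": "yourbusiness.customer.addressLine1",
    "address line 1": "yourbusiness.customer.addressLine1",
}


def apply_vocabulary(record):
    """Rename source fields to their vocabulary key; drop unmapped fields."""
    return {VOCABULARY[k]: v for k, v in record.items() if k in VOCABULARY}


crm = {"road": "Main Street 1", "internal_note": "call back"}
print(apply_vocabulary(crm))
# {'yourbusiness.customer.addressLine1': 'Main Street 1'}
```

Dropping the unmapped `internal_note` field mirrors the point above: the vocabulary both standardises names and filters out data that is irrelevant downstream.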
To get one combined display, you need to normalise the data further into one standard, so that it can be processed and used together in a consistent way. CluedIn offers several options for this: both for standardisation that is largely static and will probably never need to change – such as normalising words so that upper-case and lower-case letters appear correctly – and for more dynamic standards that can change over time and can even be adjusted directly by a Data Steward.
For example, a rule could map values onto a common standard (see the example in the figure below) when systems use different standards for a field. A rule could also remove irrelevant data – such as all N/A and zero values, or non-numerical values in a VAT field – so that you can be more confident that the field actually contains a correct VAT number.
Figure: Illustrated example – customer flag.
The advantage of normalisation rules like these is that they give your data more meaning. Data should be reorganised so that users can rely on it for further queries and analyses. By converting to a common standard and common values, you get data that can be compared, which eliminates the risk of misinterpretation and incorrect use. For example, if you want to analyse how many customers you have, it is easier if the customer flag has only two values specifying whether it is a customer – “yes” or “no” – instead of forcing you to interpret several different formats.
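The two rules discussed above – a common customer flag and a cleaned VAT field – could be sketched like this in plain Python. The accepted flag values and the assumed VAT shape (an optional two-letter country code plus eight digits) are illustrative assumptions, not CluedIn rule syntax:

```python
import re

# Rule 1: map the many source formats of a customer flag onto "yes"/"no".
CUSTOMER_FLAG = {
    "y": "yes", "yes": "yes", "1": "yes", "true": "yes",
    "n": "no", "no": "no", "0": "no", "false": "no",
}


def normalise_flag(value):
    """Return "yes"/"no" for a recognised flag value, else None."""
    return CUSTOMER_FLAG.get(str(value).strip().lower())


# Rule 2: blank out values that cannot be part of a valid VAT number.
def normalise_vat(value):
    v = str(value).strip().upper().replace(" ", "")
    if v in {"N/A", "NA", ""} or not re.fullmatch(r"[A-Z]{0,2}\d{8}", v):
        return None  # not a plausible VAT number: clear the field
    return v
```

With rules like these, a downstream count of customers reduces to counting `"yes"` values, rather than interpreting every format each source system happens to use.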
Data can always be updated and changed further in the CluedIn Clean tool.
Using the CluedIn Clean tool, you can find holes in your data, identify potential normalisation rules and bulk-clean data. As already mentioned, the VAT field can be very important to your company, so you need to ensure that it actually contains a correct VAT number in a specific format. Using this function, a Data Steward can easily run a cluster analysis on the VAT field and identify VAT numbers that do not match the correct standard – for example, numbers containing invalid characters, missing country codes or stray spaces. Some of the results will be candidates for normalisation rules, while others can be cleaned and updated directly in the Clean tool.
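The kind of cluster analysis described here can be approximated by grouping values by their character pattern, so that deviating formats stand out. This is a plain-Python illustration of the idea, with made-up sample values – not the Clean tool itself:

```python
import re
from collections import defaultdict


def pattern(value):
    """Reduce a value to its shape: letters -> A, digits -> 9, rest kept."""
    return re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", value))


vat_values = ["DK12345678", "DK 12345678", "12345678", "DK1234567x", "SE55667788"]

clusters = defaultdict(list)
for v in vat_values:
    clusters[pattern(v)].append(v)

for shape, members in clusters.items():
    print(shape, members)
# If "AA99999999" is the expected shape, every other cluster is a
# candidate for a normalisation rule or for manual cleaning.
```

Grouping by shape rather than by exact value is what makes the analysis useful at scale: thousands of distinct VAT numbers collapse into a handful of format clusters a Data Steward can review one by one.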
The result of these integration and normalisation steps is data that is harmonised into a single display and ready for downstream use – the key to becoming a data-driven organisation.