Article first published on the Swiss website Arbido, in the Metadata – Quality data report (in French).
« Information excellence leads your business to success ». This slogan, used by many vendors in the world of computing, conveniently hides the need for permanent control over data, their quality and the management rules that govern them.
Today, simple « good will » is no longer enough in the face of the complexity of information systems and the volume of data to manage: the challenges that data impose require rigorous and precise management. Experience shows that this goal can only be achieved through the « massive », automated use of metadata.
But first, a bit of explanation about metadata. Metadata are information that describes stored data. They are to information systems what maps are to traffic routes: a simple description. Depending on their purpose, maps describe roads, hiking trails, waterways, ground relief, tourist attractions, or any combination of these.
As with maps, the information provided by metadata can be very diverse: data location, data type (document, image, email, audio or video file), management rules, etc. Regardless of the medium and format, metadata make it possible to find and process data in a specific way at the right time.
Inseparable from the archiving profession, metadata are the heirs of the coding systems that were in place before the computing era. Libraries, the music industry, the publishing world and public administrations are heavy users of metadata. Since the arrival of the Internet, metadata have become widespread in web pages (meta tags) and online applications (social media, e-commerce, streaming), and their introduction into digitization processes allows users to keep control of their digital documents.
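To make this concrete, here is a minimal sketch of what a metadata record and a lookup over a small catalog might look like. The class name `MetadataRecord`, its fields, the sample entries and the `find_by_type` helper are all assumptions chosen for illustration; they do not come from any particular product or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical description of one stored data item."""
    data_id: str                            # identifier of the described data
    location: str                           # where the data is stored
    data_type: str                          # document, image, email, audio, video...
    management_rules: tuple[str, ...] = ()  # e.g. retention or access rules


# Illustrative catalog; every entry is invented for the example.
CATALOG = [
    MetadataRecord("contract-001", "dms://legal/contracts", "document",
                   ("retain 10 years",)),
    MetadataRecord("mail-042", "mailstore://sales/inbox", "email"),
    MetadataRecord("mail-043", "mailstore://support/inbox", "email",
                   ("restricted access",)),
]


def find_by_type(catalog, data_type):
    """Return the metadata records describing data of the given type."""
    return [record for record in catalog if record.data_type == data_type]


emails = find_by_type(CATALOG, "email")
print([record.location for record in emails])
```

The point of the sketch is that the search runs over the descriptions only: the data themselves are never opened, just as a map is consulted without travelling the road.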
Metadata for business
For organizations, some thirty years of widespread computerization have produced an extremely varied fleet of information systems and data storage methods. The nightmare of scattered, opaque data silos is a reality, and countless companies still use archaic systems that prevent the departments concerned from taking full advantage of their informational heritage, or that even jeopardize the company's operational efficiency.
The current digital flood only increases the imperative need for data control.
As already noted, this challenge can only be met through control of metadata, which yields better knowledge of the information systems and increases the efficiency of data use.
Some object that an action plan for setting up data management through metadata is both time-consuming and an unnecessary investment. Experience shows that the result is all benefit: faster retrieval and processing of data, elimination of redundancies, discovery of hidden links between data… This makes it possible to extract value from the available « big data » while drawing on the know-how of the company's own employees.
Data quality and « golden record »
One of the most immediate and best-known uses of metadata concerns data « quality ». Knowing where data are located makes it possible to extract the different versions of the same data stored in the organization's multiple databases and to compare them. These comparisons facilitate the creation of what is known as the « golden record » or « single version of the truth », against which the different versions can then be aligned. To take a concrete example, consider the case of Mrs. X. Her customer account shows that she has 2 children, but another system shows 3. This difference is easily detected through the use of metadata, which at least makes it possible to standardize the number of children of Mrs. X to 2 or 3. But is that really the correct number? Only further information, probably from Mrs. X herself, will ensure that the data match reality: she actually has 4 children.
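The case of Mrs. X can be sketched in a few lines of code. This is only an illustration under stated assumptions: the system names (`crm`, `billing`), the `FIELD_LOCATIONS` catalog and the two helper functions are hypothetical, and a real golden-record process would involve far more elaborate matching and survivorship rules.

```python
# Hypothetical metadata: which systems hold each customer attribute.
FIELD_LOCATIONS = {"children_count": ["crm", "billing"]}

# Hypothetical contents of the two systems.
SYSTEMS = {
    "crm": {"mrs_x": {"children_count": 2}},
    "billing": {"mrs_x": {"children_count": 3}},
}


def collect_versions(customer_id, field_name):
    """Use the metadata to fetch every stored version of one attribute."""
    versions = {}
    for system in FIELD_LOCATIONS[field_name]:
        record = SYSTEMS[system].get(customer_id, {})
        if field_name in record:
            versions[system] = record[field_name]
    return versions


def reconcile(versions):
    """Report whether the stored versions agree; never guess a winner."""
    distinct = sorted(set(versions.values()))
    if len(distinct) == 1:
        return {"status": "consistent", "value": distinct[0]}
    return {"status": "conflict", "candidates": distinct}


versions = collect_versions("mrs_x", "children_count")
print(versions)             # {'crm': 2, 'billing': 3}
print(reconcile(versions))  # {'status': 'conflict', 'candidates': [2, 3]}
```

The sketch deliberately stops at detecting the conflict. Choosing 2 or 3 would align the systems with each other, but only a check with Mrs. X herself can align them with reality.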
Metadata are strategic elements for the implementation of a real data « governance ».
This example highlights a major point in the concept of data « quality »: the « distance » between the reality described by data and the reality of the world. From a theoretical point of view, data describe the world with a delay and give only a partial view of it. Information systems are therefore inherently late and incomplete compared with the reality they are supposed to describe, a reality that is in perpetual evolution. This distance between the description given by data and reality represents a potential « risk » for an organization whose decisions are based on the described reality. In this context, and from a management standpoint, it is essential that data « quality » management be driven by the role and importance of each piece of data in relation to business objectives.
A career for the future
As we have seen, metadata are strategic elements for the establishment of real data « governance ». They facilitate the integration, sharing and daily management of the data that are vital to the company and that form the basis for the development of « business » strategies. No usable data without metadata.
Even if the principle is not new, few organizations chose to prioritize metadata in the past (for lack of knowledge, or for fear of the cost or duration of implementation). In recent years, however, the trend has clearly reversed, and « data engineers » have seen their role grow significantly to become the engines of agility and business performance in a context of generalized competition.