Photo by David Parkins in The Economist
Data is the new oil. Nothing new in that. Data is everywhere. Decisions based on data can give you an edge; again, nothing new in that. But you might still be making completely wrong decisions based on your data. Is this caused by a problem with your interpretation or analysis? No, the interpretation and analysis may be perfectly sound. The problem is with your data quality!
Data is the New Oil: Navigating Data Quality
Does your data stink?
Let’s do an exercise. Take any data point in your system. It could be your installed base. Maintaining and managing installed base details, and the customer details around them, is the bread and butter of any industrial organization. In how many systems do you find the equipment details duplicated? Do the data points match across those systems? Who owns the master record? Are all systems actively cleaned up to unify the data points? We come across these questions over and over again; there’s nothing new in any of these problems.
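To make the exercise concrete, here is a minimal sketch in Python of what "do the data points match" could look like in practice. Everything in it is an assumption for illustration: the system names, the serial numbers, the fields, and the idea that the serial number is the shared key across systems. Your own exports will look different.

```python
from collections import defaultdict

# Hypothetical exports of the same installed base from three systems.
# System names, serial numbers, and field values are made up for illustration.
SYSTEMS = {
    "crm": [
        {"serial": "PX-1001", "customer": "Acme Corp", "site": "Pune"},
        {"serial": "PX-1002", "customer": "Globex", "site": "Austin"},
    ],
    "erp": [
        {"serial": "PX-1001", "customer": "ACME Corporation", "site": "Pune"},
        {"serial": "PX-1002", "customer": "Globex", "site": "Austin"},
    ],
    "service_desk": [
        {"serial": "PX-1001", "customer": "Acme Corp", "site": "Mumbai"},
    ],
}


def find_mismatches(systems):
    """Return {serial: {field: {value: [systems]}}} for fields whose values disagree."""
    # seen[serial][field][value] -> list of systems reporting that value
    seen = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for system, records in systems.items():
        for record in records:
            for field, value in record.items():
                if field != "serial":
                    seen[record["serial"]][field][value].append(system)

    mismatches = {}
    for serial, fields in seen.items():
        # A field is in conflict when more than one distinct value was reported.
        conflicting = {f: dict(v) for f, v in fields.items() if len(v) > 1}
        if conflicting:
            mismatches[serial] = conflicting
    return mismatches


if __name__ == "__main__":
    for serial, fields in find_mismatches(SYSTEMS).items():
        print(serial, fields)
```

In this made-up data, one piece of equipment has two spellings of the customer name and two different sites. The code cannot tell you which value is right. That is exactly the "who owns the master record?" question.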
Bad data can severely impact the financial health of an organization
The world understands it, but how do we solve this problem? Bad data not only leads to wrong decisions but can also severely impact the financial health of an organization. Organizations do run one-off data cleaning projects, which clean the data at a given point in time. However, the data goes bad again quickly if cleanliness is not institutionalized. Imagine cleaning your house once a year! Would that suffice?
Data cleaning is so difficult!
Is regular cleaning of data even possible? I would say that is not even the question. Is there a choice? If organizations do not actively pursue a strategy of maintaining data quality, they will quickly lose track of many things. The analytics built on that data will be way off the mark, or even completely wrong.
On the face of it, this looks like a daunting exercise. Organizations have so many systems, with data constantly flowing in and out of them. Different functions make their own updates, which are never propagated across the organization. We have evolved from multiple spreadsheets on individual laptops to multiple data stores in every corner of the organization. And spreadsheets still have not gone away. On the contrary, they are going strong and still growing.
Is there a way out?
Building a culture of data quality as an integral part of the organizational culture is the way out
Yes, but it is less about technology and more about making data quality an important value in the culture of the organization. Once the aspiration is there, technology will fall into place. If data is the new oil, it should be utilized in the most appropriate manner.
The most important thing for any data quality initiative is to build the notion of a “single source of truth”. That means data from different systems is constantly fed into the system designated as the single source of truth. The incoming batches of data also need cleaning, deduplication, and enrichment. And the single source of truth should make the data available to all other systems through open APIs and integrations.
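As a rough illustration of the cleaning, deduplication, and enrichment steps, here is a small Python sketch of merging an incoming batch into a master store. It is a sketch under stated assumptions, not a recommended design: the record fields, the choice of serial number as the key, and the rule "the newest record wins, blanks are filled from the older one" are all hypothetical. Real systems usually need far more careful matching and survivorship rules.

```python
from datetime import date

# Hypothetical incoming batch; sources, fields, and values are made up.
SAMPLE_BATCH = [
    {"serial": " px-1001 ", "customer": "Acme  Corp", "site": "pune",
     "updated": date(2023, 1, 10), "source": "crm"},
    {"serial": "PX-1001", "customer": "Acme Corp", "site": "",
     "updated": date(2023, 3, 2), "source": "erp"},
    {"serial": "PX-1002", "customer": "Globex", "site": "Austin",
     "updated": date(2023, 2, 1), "source": "erp"},
]


def clean(record):
    """Cleaning: normalise whitespace and case so equivalent values compare equal."""
    return {
        "serial": record["serial"].strip().upper(),
        "customer": " ".join(record.get("customer", "").split()),
        "site": record.get("site", "").strip().title(),
        "updated": record["updated"],
        "source": record["source"],
    }


def merge_batch(master, batch):
    """Merge a batch into `master`, a dict keyed by serial number."""
    for raw in batch:
        rec = clean(raw)
        current = master.get(rec["serial"])
        if current is None:
            master[rec["serial"]] = rec
            continue
        # Deduplication: one record per serial, the most recently updated wins.
        if rec["updated"] >= current["updated"]:
            newer, older = rec, current
        else:
            newer, older = current, rec
        # Enrichment: fill blank fields in the newer record from the older one.
        master[rec["serial"]] = {k: newer[k] or older[k] for k in newer}
    return master


if __name__ == "__main__":
    for serial, record in merge_batch({}, SAMPLE_BATCH).items():
        print(serial, record)
```

Here the two entries for the same serial number collapse into one record. The later update wins, and its blank site is filled in from the earlier entry.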
I believe that in the longer run, every organization will have to adopt a “single source of truth” approach, in which that system is the primary data store and all other workflow systems feed on it to serve business workflows and processes.
Do you have a single source of truth that maintains the quality of data as a constant exercise?