These challenges concern data collection, curation, quality, conformance, lifecycle and governance. This is just part of the complex world of data management, a world in which completing a unit of work is understood to cost ten times as much with flawed data as with perfect data. Successful data management demands an accurate understanding of the inherent value of the information.
1) Data quality
What is perfect data? There is no perfect answer to that question, as the response will reflect your unique business needs, but it is apparent that some data is better than others. The most valuable data should drive future economic benefits for your company, and the organization must have the right to use that data while remaining compliant with data protection and sovereignty law. Some information also stays valuable for longer than other information. In many cases, the data your business needs may be gathered and acted on in near real time but have little value after that. There’s even a network effect: each time a smart system goes online, a new data stream appears. How do you identify the useful data within all this? What about inaccurate data? How many sources of data must your business use to ensure data quality?
One Gartner estimate claims poor data quality can cost an organization an additional $15m a year on average. “You can have all of the fancy tools, but if your data quality is not good, you’re nowhere,” said Veda Bawo, Director of Data Governance, Raymond James. One widely used concept is “data quality dimensions,” the attributes, such as accuracy, completeness, consistency and timeliness, against which data is assessed to determine how much of any given data set is faulty. On average, 47% of recently created data records have at least one critical error.
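As a rough illustration of how the share of faulty records might be measured, the sketch below checks a handful of records against two quality dimensions (completeness and validity) and reports the percentage with at least one error. The field names, sample records and rules are hypothetical and are not drawn from any of the studies cited above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerRecord:
    # Hypothetical fields, chosen only for illustration.
    name: Optional[str]
    email: Optional[str]
    age: Optional[int]


def record_errors(record: CustomerRecord) -> list[str]:
    """Return the quality rules this record breaks (empty list if clean)."""
    errors = []
    # Completeness: name and email must be present.
    if not record.name:
        errors.append("missing name")
    if not record.email:
        errors.append("missing email")
    # Validity: deliberately simple plausibility checks.
    elif "@" not in record.email:
        errors.append("invalid email")
    if record.age is not None and not 0 <= record.age <= 120:
        errors.append("implausible age")
    return errors


def error_rate(records: list[CustomerRecord]) -> float:
    """Percentage of records with at least one error."""
    if not records:
        return 0.0
    flawed = sum(1 for r in records if record_errors(r))
    return 100.0 * flawed / len(records)


records = [
    CustomerRecord("Ada", "ada@example.com", 36),
    CustomerRecord("", "bob@example.com", 41),
    CustomerRecord("Cy", "cy-at-example.com", 29),
    CustomerRecord("Di", "di@example.com", 230),
]
print(f"{error_rate(records):.0f}% of records have at least one error")
```

In practice the rules would come from your own business needs; the point is only that a quality figure is meaningful once the dimensions being checked are written down.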
2) Data management
Data management is a set of considerations that includes technology implementation, data acquisition, governance, ownership and more. If data collection is gathering data from various sources (such as customer data, survey information or key data purchased externally), then data management is the art of categorizing all that information and accurately assessing its value and longevity. Data lifecycle management is another dimension to this. Encompassing data creation, storage, usage, archiving and deletion, it looks to the usable life of the information you collect.
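One way to picture lifecycle management is as a retention policy that assigns each record a stage according to its age. The following is a minimal sketch under that assumption; the categories, retention periods and stage names are invented for illustration, and real policies would also reflect legal and contractual requirements.

```python
from datetime import date
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"      # stored and in regular use
    ARCHIVE = "archive"    # retained, rarely accessed
    DELETE = "delete"      # past its usable life


# Hypothetical retention periods per data category, in days:
# (days in active use, days until deletion).
RETENTION = {
    "sensor_reading": (7, 30),
    "customer_profile": (365, 1825),
}


def lifecycle_stage(category: str, created: date, today: date) -> Stage:
    """Classify a record by age against its category's retention policy."""
    active_days, archive_days = RETENTION[category]
    age = (today - created).days
    if age <= active_days:
        return Stage.ACTIVE
    if age <= archive_days:
        return Stage.ARCHIVE
    return Stage.DELETE


today = date(2024, 1, 31)
print(lifecycle_stage("sensor_reading", date(2024, 1, 28), today))   # 3 days old
print(lifecycle_stage("sensor_reading", date(2024, 1, 10), today))   # 21 days old
print(lifecycle_stage("customer_profile", date(2017, 1, 1), today))  # over 5 years old
```

The short retention window for the hypothetical sensor readings mirrors the earlier observation that some data is useful in near real time but has little value afterwards.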
3) Data insights
With so much data generated in real time, we also see increased use of artificial intelligence to handle some data management tasks. McKinsey claims AI adoption is accelerating, reaching 56% of businesses in 2021. Some tools automate data handling and management, exploiting transient data in real time while accurately tagging and storing the data likely to be of longer-term value.
Generative artificial intelligence technologies may help optimize the information you do collect, particularly around solving problems related to sampling bias or data scarcity. Gartner says by 2025, over 30% of new drugs will be discovered using generative AI. In Switzerland, the SAFER project explores how representative databases using synthetic faces may help train “ethical facial recognition” tools, hopefully helping to reduce instances in which sampling bias delivers inaccurate results.
4) Data ownership
There’s also the challenge of data ownership. GDPR and similar legislation emerging worldwide define data ownership and mandate permissions that must be given before some data, particularly personal data, can be exploited. Regulators are attempting to define a framework that respects the right to gather data while also building key protections around the use of that information. Fundamentally, possession of data is not always the same as ownership. Critically, some of the information a company collects may not be usable at all. Data conformance challenges mean companies must ensure that they own any data they choose to use, or have at least agreed the rights to use it. McKinsey argues that 60% of businesses that attribute at least 20% of their revenues to the use of AI also show well-defined data governance processes and have protocols in place for good data policy.
On a global basis, the regulatory environment is not yet fully developed. While geographies with defined data rights may arguably be at a disadvantage when it comes to using this information, nations outside the scope of data protection are most at risk. Businesses in such economies need clearer regulation, both to help mitigate future risk as local regulation is put in place and to empower the use of data in locations where such use is already regulated. Business & Decision says enterprises working within regulations such as GDPR are also more likely to convince customers to trust them to store and use their data.
5) Data governance
Unlike ownership, data governance is about defining accountability. That means deciding who within a company has authority and control over data, how that information can be used and how it is protected. Gartner calls data governance “an integrative discipline for structuring, describing and governing information assets across organizational and technical boundaries to improve efficiency, promote transparency and enable business insight.”
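The three governance questions above (who is accountable, how the data may be used, how it is protected) can be recorded explicitly for each data set. The sketch below shows one hypothetical way to do so; the class, field names and sample values are assumptions made for illustration, not a standard schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceRecord:
    """Hypothetical catalog entry answering the basic governance questions."""
    dataset: str
    accountable_owner: str      # who has authority and control
    permitted_uses: frozenset   # how the information can be used
    protection: str             # how it is protected


def is_permitted(record: GovernanceRecord, use: str) -> bool:
    """A use is allowed only if it is listed in the governance record."""
    return use in record.permitted_uses


crm = GovernanceRecord(
    dataset="customer_contacts",
    accountable_owner="Head of Sales Operations",
    permitted_uses=frozenset({"billing", "support"}),
    protection="encrypted at rest, role-based access",
)

for use in ("support", "marketing"):
    verdict = "allowed" if is_permitted(crm, use) else "not allowed"
    print(f"{use}: {verdict}")
```

Making the record immutable (`frozen=True`) is a design choice here: changes to accountability or permitted uses would then have to be made by issuing a new record rather than by quietly editing an existing one.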
Only when all such questions are resolved can a company readily realize the business benefits data should provide. The answer to the age-old AI adage of “garbage in, garbage out” is to take a highly systematic and deeply granular approach to all aspects of data management across the entire data lifecycle. Ultimately, if a company does not understand what data it has and how it is governed, then it cannot manage that data effectively. Poorly managed data tends to deteriorate, to the detriment of the business.
Jon Evans is a highly experienced technology journalist and editor. He has been writing for a living since 1994. These days you might read his daily Computerworld AppleHolic and opinion columns. Jon is also technology editor for men's interest magazine Calibre Quarterly, and news editor for MacFormat magazine, which is the biggest UK Mac title. He's really interested in the impact of technology on the creative spark at the heart of the human experience. In 2010 he won an American Society of Business Publication Editors (Azbee) Award for his work at Computerworld.