Legacy data, powering AI for the future


The average corporation manages 348 terabytes of data, according to IDG. Today, the world generates as much data in 10 minutes as it did in a whole year at the turn of the millennium.

The amount of data available to us — as individuals, companies and societies — is exploding. The truth is, we’re still trying to work out what to do with a lot of it. That’s a problem, because what we don’t see any value in, we don’t take care of.

Data that doesn’t seem valuable today is often ignored or, at worst, discarded entirely. I don’t necessarily mean customer data or other personally identifiable information. Sometimes that, too, but clear rules already govern how long you can retain that kind of data.

Think instead of the metadata and depersonalized data created on the factory floor, by a fleet of networked vehicles or in an e-commerce operation. We can mine some of this data for value today. But what do we do with it then? And what about the bits that aren’t valuable right now?

The tools for tomorrow’s world

The things we can do with data today seem almost like magic compared to what we could do only a few years ago. Data scientists can build a scaled look-alike audience from just a small amount of seed data.
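As a minimal sketch (the feature vectors, user IDs and audience size below are all invented for illustration), one simple way to build a look-alike audience is to rank candidates by their similarity to the average profile of the seed group:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical features per customer: [visits/week, avg basket value, pages/session]
seed_customers = [[5, 120, 9], [6, 110, 8]]   # known good customers (the seed)
candidates = {
    "u1": [5, 115, 9],    # behaves much like the seed group
    "u2": [9, 10, 1],     # many visits, tiny baskets: a different profile
    "u3": [7, 100, 7],
}

# Centroid (average profile) of the seed audience
centroid = [sum(col) / len(seed_customers) for col in zip(*seed_customers)]

# Rank candidates by similarity to the seed centroid and keep the top two
ranked = sorted(candidates, key=lambda u: cosine(candidates[u], centroid), reverse=True)
lookalike = ranked[:2]
print(lookalike)          # ['u1', 'u3']
```

Real look-alike modelling uses far richer features and a trained classifier, but the ranking idea is the same.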

Predictive-maintenance specialists can save companies huge sums of money by moving them from analytic techniques, such as the Weibull distribution, that deal in averages across large populations to maintenance regimes optimized for each individual Internet-connected component.
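A proper Weibull fit needs a statistics library, but the contrast drawn here (one interval derived from a population average versus a schedule per connected component) can be sketched with invented numbers:

```python
import statistics

# Hypothetical remaining-useful-life estimates (hours) from per-component sensors
rul = {"pump_a": 900, "pump_b": 150, "pump_c": 600}

# Population approach: one conservative interval for the whole fleet,
# set well below the mean to cover early failures (the 50% margin is an assumption)
fleet_interval = statistics.mean(rul.values()) * 0.5

# Condition-based approach: schedule each unit near its own predicted life
per_unit = {name: hours * 0.9 for name, hours in rul.items()}

print(fleet_interval)                   # 275.0: pump_a is replaced 625 hours early...
print(fleet_interval > rul["pump_b"])   # True: ...yet pump_b still fails first
print(per_unit["pump_a"])               # 810.0: the per-unit schedule wastes far less life
```

The fleet-wide average both wastes life on robust components and misses the weak one, which is exactly the money the per-component regime recovers.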

But what about the things we’re not doing today? Computing power and data analytics are advancing so fast, it’s very likely that, in just a few years, we’ll be able to extract value from data in ways we can’t even imagine today. But will we have the historical data we need to reap those as-yet-unimagined benefits?

To take an example we can already foresee, many of the AI applications used in enterprises at the moment are based on supervised machine learning. A data scientist takes training data, for which the inputs and desired outputs are already known, and feeds it into the model, which is tweaked and optimized until its outputs match the desired ones on held-out test data. The model is then ready to be let loose on real data, which it will process in exactly the way it was trained to.
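That train-then-deploy loop can be sketched in a few lines of Python. Everything below is invented for illustration: a real model has millions of parameters, not the single weight `w` used here.

```python
# Toy supervised learning: fit y = w * x to labelled training pairs
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # inputs with known desired outputs

w = 0.0            # the model parameter, tweaked until outputs match
lr = 0.05          # learning rate (chosen by hand for this sketch)
for _ in range(200):                    # repeated optimisation passes
    for x, y in train:
        pred = w * x                    # model output for this input
        w -= lr * (pred - y) * x        # gradient step on the squared error

# The trained model is then let loose on unseen input
print(round(w, 2))         # 2.0: the learned weight
print(round(w * 10.0, 1))  # 20.0: prediction for a new input x = 10
```

Notice that nothing here works without the labelled pairs at the top: the quality of the training data bounds the quality of the model.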

But there is another flavour of machine learning: deep learning. It can spot entirely new patterns in data, ones that often deliver surprising insights and unlock a lot of value. But typically, to achieve this, it needs a lot of data to work with, often many years’ worth.
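Real deep learning needs a framework and serious data volumes, but the unlabelled pattern discovery it performs can be shown in miniature. In this sketch, a two-cluster k-means pass (standing in for a neural network, over invented daily order counts) surfaces two demand regimes that nobody labelled in advance:

```python
# Hypothetical daily order counts: no labels, just accumulated history
data = [12, 14, 13, 52, 55, 50, 13, 51]

# Two-cluster 1-D k-means, initialised at the extremes of the data
centroids = [min(data), max(data)]
for _ in range(10):
    groups = [[], []]
    for x in data:
        # Assign each point to its nearest centroid
        i = 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
        groups[i].append(x)
    # Move each centroid to the mean of its assigned points
    centroids = [sum(g) / len(g) for g in groups]

print(sorted(round(c) for c in centroids))   # [13, 52]: two demand regimes emerge
```

With eight data points the split is obvious to the eye; the value of the deep-learning version is that it finds such structure in data far too large and high-dimensional for anyone to eyeball, which is why the historical archive matters.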

A company that discarded the data that was no good to its machine-learning algorithms today might find itself unable to benefit from deep learning in the future. Not only would that be a bad outcome in itself, it would also send a signal to the (much sought-after) top data talent that the most exciting careers lie elsewhere.

How to maximize the long-term value of your data

So how can you ensure that you extract the maximum value from your data, now and far in the future? First, collect as much data as you can (though, in the case of personal data, be sure your policies are GDPR compliant).

The future is full of unknowns, so it’s unclear which of today’s data will be valuable tomorrow. Don’t let your company be paralyzed by trying to develop the perfect technical solution for extracting maximum value from everything you collect. Conversely, don’t limit your collection to just the data you can extract value from today. Use the best data analytics available to you now, but also collect and store whatever you can, legally and ethically, for the future.

Next, make sure your data is stored in a portable form: clean, in its original format, in a read-only repository to which all business functions have access. Storing the data locally, in a data center you own, is often inefficient if you need access to it immediately or in the near term. In the modern connected economy, you may want to move some of that data at short notice to other markets to take advantage of new opportunities. That works much better if your data sits in a high-availability cloud location, ideally replicated on secure servers around the world.
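As a minimal sketch of that clean, original-format, read-only idea (the file name, directory and sensor records below are invented), line-delimited JSON gives you an open format any tool can read, and file permissions enforce the read-only repository:

```python
import json
import os
import stat
import tempfile

# Hypothetical factory-floor sensor records to archive
events = [
    {"ts": "2024-01-01T00:00:00Z", "machine": "press_7", "temp_c": 61.5},
    {"ts": "2024-01-01T00:05:00Z", "machine": "press_7", "temp_c": 62.1},
]

# Write one self-describing JSON record per line (an open, portable format)
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w") as f:
    for e in events:
        f.write(json.dumps(e) + "\n")

# Make the archive read-only so consumers can query but never mutate it
os.chmod(path, stat.S_IREAD)

# Any business function can restore the data, record for record
with open(path) as f:
    restored = [json.loads(line) for line in f]
print(restored == events)   # True: a clean round trip, no vendor lock-in
```

The same records could be copied to any cloud region or handed to any analytics stack without conversion, which is the portability the paragraph above is arguing for.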

If, however, you want to lock your data away for longer periods, your best bet may be private storage. Classifying your data to make sure it’s protected in the cloud carries a cost, and egress charges from the public cloud provider add to it if you later want to move the data at short notice.

Most importantly, it pays to work with external experts when designing a data-collection and retention strategy: one that complies with law and best practice, enables high availability and portability, and sets your company up for present and future success. They can help you reach the highest possible standards in the shortest possible time. Data talent is tight, and companies that move first and do it right will have a competitive advantage.

Value the data you have available to you, and take action now to ensure it works well for you in the future.

Peter Gee

Peter Gee has 30 years’ experience helping businesses achieve sustainable growth and deliver tangible value. His expertise includes leveraging innovative but proven IT technologies to help companies going through rapid change adapt and thrive. He has a solid track record of developing and delivering business initiatives that drive growth, particularly at enterprise and mid-market level, and extensive knowledge of leading transformational processes in fast-moving and challenger businesses.