The recent Equifax data breach was unusual for its scale -- nearly 150 million accounts were affected -- but the basic story is nothing new. Massive data breaches have been happening at a record pace. Just one security flaw in one piece of software can put an entire enterprise at risk. Everybody is vulnerable.
The European Union has responded to the problem with its upcoming General Data Protection Regulation (GDPR), which will help minimize the risk and impose stiff penalties -- up to 4% of global annual turnover -- for organizations that don't comply. Firms with EU exposure are scrambling to prepare for its 2018 starting date.
Most companies see GDPR as a burden, but Kenneth Sanford, an analytics expert at Dataiku, sees it as a blueprint for better data practices. While regulation won't stop the breaches, it will help prevent the damage that they cause. With data and analytics becoming so central to how we do business, it's time that we start taking them more seriously. Here's what you can do:
1. Kill Data On The Laptop
Historically, collecting, storing and analyzing data was expensive, so we didn't deal with it in large amounts. It made sense to keep a central database on a server somewhere and then download a portion to a laptop for analysis. That both allowed executives to do analysis on the fly and lessened the need for massive internal resources.
Yet today, storage and computing in the cloud are incredibly cheap, so there's no real need to download data onto a local device. What's more, all that data floating around in so many places becomes a massive security vulnerability. So it's becoming increasingly important to control where data goes.
"You need to centralize your data and your analytics, not only for security but for continuity of business processes," Sanford says. With today's cloud environments, this is a pretty simple fix and won't result in any loss of convenience or mobility. At the same time, it makes your organization far more secure.
2. Act With Purpose
Another source of data vulnerability is that there is so much of it. "Storage got so cheap that everybody started collecting everything and not knowing why," Sanford explains. It's like in the early days of web registrations, when to sign up for a service you had to fill out an impossibly long questionnaire. Today, we're doing much the same, except we collect data in pieces.
The problem is that once you own data, you become responsible for it. If, for example, you are holding someone's social security number even though you do not have a financial relationship with that person, then you are putting their identity at risk. Under GDPR, you will face serious financial liabilities for mishandling it, but it's simply bad practice anyway.
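One way to put this into practice is to require a documented purpose for every field before it is stored. The following is a minimal sketch, not a compliance tool: the `FIELD_PURPOSES` registry, the field names and the `split_by_purpose` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical registry: each field maps to the business purpose that
# justifies holding it. None means nobody could name a purpose.
FIELD_PURPOSES = {
    "email": "account login and service notices",
    "shipping_address": "order fulfillment",
    "social_security_number": None,
    "date_of_birth": None,
}


def split_by_purpose(record, purposes):
    """Return (keep, drop): fields with a documented purpose, and the rest."""
    keep = {}
    drop = []
    for field, value in record.items():
        # Fields missing from the registry are treated the same as fields
        # with no purpose: they are not stored.
        if purposes.get(field):
            keep[field] = value
        else:
            drop.append(field)
    return keep, sorted(drop)


# Illustrative record with made-up values
customer = {
    "email": "pat@example.com",
    "shipping_address": "1 Main St",
    "social_security_number": "000-00-0000",
    "date_of_birth": "1980-01-01",
}

keep, drop = split_by_purpose(customer, FIELD_PURPOSES)
print("store:", sorted(keep))
print("do not collect:", drop)
```

In this sketch the default is to drop anything without a stated purpose, so the burden falls on whoever wants to collect a field rather than on whoever later has to protect it.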
Another problem with having too much data is that it can lead to poor models. Often, having too much data leads to overfitting, which basically means that the more variables you use to create a model, the harder it gets for it to be generally valid. In some cases, excess data can result in data leakage, in which training data gets mixed with testing data.
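To make the leakage problem concrete, here is a small sketch of one common safeguard: assigning each customer, rather than each row, to either the training set or the test set, so the same person never appears in both. The `customer_id` field, the 20% test fraction and the helper names are assumptions made for this example.

```python
import hashlib


def assign_split(customer_id, test_fraction=0.2):
    """Deterministically assign a customer to 'train' or 'test' by hashing the ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_rows(rows):
    """Send every row for a given customer to the same side of the split."""
    train, test = [], []
    for row in rows:
        if assign_split(row["customer_id"]) == "test":
            test.append(row)
        else:
            train.append(row)
    return train, test


def leaked_ids(train, test):
    """Customer IDs that appear in both sets (ideally an empty set)."""
    return {r["customer_id"] for r in train} & {r["customer_id"] for r in test}


# Synthetic data: 50 customers with 10 rows each
rows = [{"customer_id": f"c{i % 50}", "amount": i} for i in range(500)]

# A naive cut by row position puts the same customers on both sides
naive_overlap = leaked_ids(rows[:400], rows[400:])
print("customers in both sets, naive split:", len(naive_overlap))

# Splitting by customer keeps the two sets disjoint
train, test = split_rows(rows)
print("customers in both sets, split by customer:", len(leaked_ids(train, test)))
```

Hashing the ID instead of drawing random numbers means a customer lands on the same side every time the split is rerun, which helps when new rows keep arriving.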
So don't assume that more data is always better. Make sure that you have a purpose for collecting the data you do.
3. Create An Audit Function
Most organizations develop and enforce best practices for different functions, like finance, marketing, logistics and so on. With data, however, it's still mostly a free-for-all. Different departments store and analyze data as they see fit in a way that wouldn't be tolerated in any other business function.
That's why Sanford stresses the importance of establishing an audit function for data practices. "Today all that data is as much a liability as an asset. We need to start treating data compliance more like we do tax compliance," he told me. As the frequency and severity of data breaches continue to increase, this is becoming a no-brainer.
He also notes that the best-performing companies tend to have separate analytics teams and best practice units that help to ensure that standards are both implemented and enforced. You can't expect to become a data-driven company just by virtue of collecting a lot of information and setting up some Hadoop clusters. You need to build competency.
Data As A Business Practice
One of the provisions of the GDPR that many find particularly onerous is the requirement that many organizations, particularly those that process personal data on a large scale, appoint a Data Protection Officer. Here again, although no one likes having a solution imposed on them, appointing someone in your organization to directly oversee data practices is probably a good idea.
Consider first the cost of data breaches, which averages over $7 million in the US and over $3 million globally. In a large breach like Equifax, the price tag can easily be ten times that. So appointing a competent person to help prevent data breaches can make an enormous amount of economic sense.
Additionally, Sanford points out that a focused practice on data and analytics can open up new business opportunities. For example, General Electric, long an industry leader in complex industrial machinery, is unlocking new value in Predix, a cloud service that helps to optimize industrial machines through advanced analytics.
The truth is that data security and operational excellence go hand-in-hand. You have to know what data you have, what you're going to use it for and how to get rid of it when it no longer serves a useful purpose.
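As a rough illustration of that last point, a retention check can be as simple as a table of how long each category of data may be kept and a routine that flags everything past its limit. This is a sketch under stated assumptions: the categories, the retention periods and the `is_expired` helper are hypothetical, and the periods are not taken from GDPR or any other regulation.

```python
from datetime import date, timedelta

# Hypothetical retention rules: how many days each category may be kept
RETENTION_DAYS = {
    "marketing_lead": 365,
    "support_ticket": 730,
}


def is_expired(record, today):
    """True if the record is past its retention period or has no known category."""
    limit = RETENTION_DAYS.get(record["category"])
    if limit is None:
        # No documented category means no documented purpose: flag it
        return True
    return today - record["collected_on"] > timedelta(days=limit)


# Illustrative inventory of stored records
records = [
    {"id": 1, "category": "marketing_lead", "collected_on": date(2015, 3, 1)},
    {"id": 2, "category": "support_ticket", "collected_on": date(2017, 6, 1)},
    {"id": 3, "category": "unknown_export", "collected_on": date(2017, 9, 1)},
]

today = date(2017, 10, 1)
to_delete = [r["id"] for r in records if is_expired(r, today)]
print("flag for deletion:", to_delete)
```

Here the old marketing lead and the uncategorized export are flagged, while the recent support ticket is kept.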