The promise of an ever-evolving big data landscape and the depths to which analytics can provide companies with new insights, or even new products, means this is an exciting time to be involved in data research.
At the same time, however, we're charting new territory that crosses multiple boundaries, including ethical and privacy concerns. This is especially true, of course, with data that contains personally identifiable information (PII) as this risks exposing individuals to everything from tracking through to fraud if the data should get into the wrong hands. Even here, the definition of compromised data is not always clear cut – hacking and breaches make the news, but intentional or accidental abuse by employees with access to data is also an issue. So on the one hand, data can be invaluable especially when it comes to gaining big picture insights in sectors including governance, health, retail, and finance – but on the other, this must be balanced with the need to protect individual privacy where PII is involved.
The logical solution: anonymising data
The logical approach is to separate PII from other, non-identifiable data by removing, suppressing, or encrypting it. But there are other options too: irreversible aggregation, k-anonymity and, importantly, the emerging field of differential privacy.
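To make the last of these concrete: differential privacy works by adding calibrated random noise to aggregate query results, so that no single individual's presence or absence materially changes the output. The following is a minimal sketch of the classic Laplace mechanism; the function names and parameters are illustrative, not drawn from any particular library.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5          # uniform on (-0.5, 0.5)
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_count(records: list, predicate, epsilon: float) -> float:
    """Differentially private count: the true count plus Laplace noise.

    A count query has sensitivity 1 (adding or removing one person
    changes the result by at most 1), so the noise scale is 1/epsilon.
    A smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Here epsilon acts as a "privacy budget": an analyst gets a noisy but still useful count, while an observer of the output cannot confidently tell whether any one individual is in the dataset.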
By its nature, manipulating data to obscure personally identifiable information comes at the cost of accuracy. Done right, however, it's possible to make it very hard or (and I use this phrase lightly) nigh impossible to reconstruct personal information while still returning data valuable enough to produce insights. It's important to remember that just because a dataset anonymises PII doesn't necessarily mean an individual's privacy is protected – two separate anonymised datasets can be combined to correlate information and still reveal PII. And considering the regularity with which data breaches happen, this must also be considered with respect to data that's already out there (e.g., breached census data paired with already available credit card details to build a personal profile for identity theft).
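This linkage risk is easy to demonstrate. In the toy sketch below (all names, fields, and values are invented for illustration), neither dataset pairs a name with a diagnosis on its own, yet joining them on shared quasi-identifiers re-identifies an individual.

```python
# Two datasets, each "anonymised" in isolation: the health records
# carry no names, and the public roll carries no health data.
health_records = [
    {"postcode": "2000", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1972, "sex": "M", "diagnosis": "diabetes"},
]
public_roll = [
    {"postcode": "2000", "birth_year": 1985, "sex": "F", "name": "J. Citizen"},
    {"postcode": "4000", "birth_year": 1990, "sex": "M", "name": "A. Person"},
]

# Fields that identify no one alone but narrow the field together.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

def link(records_a, records_b, keys=QUASI_IDENTIFIERS):
    """Join two datasets on quasi-identifiers, merging matched rows."""
    return [
        {**a, **b}
        for a in records_a
        for b in records_b
        if all(a[k] == b[k] for k in keys)
    ]
```

Linking these two datasets yields a single merged record pairing "J. Citizen" with "asthma". This is exactly the correlation attack that k-anonymity is designed to blunt: by generalising or suppressing quasi-identifiers until every combination matches at least k individuals, no join can single out one person.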
Concerns around the sharing of data
Inevitably, organisations and institutions can extract greater value from data by sharing it. But even if data is anonymised, responsibility for that data cannot be transferred: if an abuse happens in the hands of another organisation, that shouldn't absolve the organisation that owns the data of responsibility.
As a result, as we move to a future where we leverage big data more than ever before, it's vital that all stakeholders who interact with that data invest seriously in cybersecurity. This isn't just good business sense in terms of protecting assets; it's also vital for building trust with customers and the public. Otherwise, the value of that data may be compromised: if individuals don't trust an organisation to keep their details safe, they may abstain from providing data, or provide incorrect data.
This only scratches the surface of the considerations involved in the collection, storage, and management of data, but the potential it offers will see big data and analytics continue to grow rapidly as a sector – it just needs to be remembered that with great value comes great responsibility.
By Ashton Mills
Ashton Mills is the Outreach Manager – Technology & Innovation at Australian Computer Society – www.acs.org.au