The ICO guidance on big data, AI and machine learning - what humans can learn

Guidance on the interpretation of the GDPR’s provisions naturally shoots to the top of privacy professionals’ reading lists at the moment.  As such, you could be forgiven for not having absorbed the ICO’s 100+ page guide to big data, artificial intelligence (AI), machine learning and data protection, revised in March 2017.

This is a really interesting read, so here’s an all-too-brief précis to show you why you’re missing out on an important piece of thought leadership.

It’s thorough

The guide covers all the big issues: fairness, conditions for processing, accuracy, rights of individuals and security.  Whilst anonymisation is encouraged from the outset, the guide assumes that data remains ‘personal data’ for the purposes of data protection law.

Helpfully, the guide sorts the types of data used in big data analytics into four broad categories.  This helps to classify a vast spectrum of data types, a task that has itself been a matter of debate.  The four categories are:

  • Provided data - e.g. data from an online form.
  • Observed data - e.g. website session data (cookies) or CCTV.
  • Derived data - e.g. customer loyalty from purchase histories.
  • Inferred data - e.g. correlative patterns such as credit scores.

This is just one example of where strides are made to sensibly narrow and refine the issues faced.
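
For teams that keep a data inventory, the four categories above could be recorded against each field.  What follows is only a minimal sketch: the field names and the `fields_by_origin` helper are hypothetical illustrations, not something the guide prescribes.

```python
from enum import Enum


class DataOrigin(Enum):
    """The four broad categories of data described in the guide."""
    PROVIDED = "provided"   # e.g. data from an online form
    OBSERVED = "observed"   # e.g. website session data (cookies) or CCTV
    DERIVED = "derived"     # e.g. customer loyalty from purchase histories
    INFERRED = "inferred"   # e.g. correlative patterns such as credit scores


# Hypothetical inventory: these field names are assumptions for illustration.
FIELD_ORIGINS = {
    "signup_form_email": DataOrigin.PROVIDED,
    "session_cookie_id": DataOrigin.OBSERVED,
    "loyalty_tier": DataOrigin.DERIVED,
    "credit_score": DataOrigin.INFERRED,
}


def fields_by_origin(origin):
    """Return the inventory field names tagged with the given category, sorted."""
    return sorted(name for name, tag in FIELD_ORIGINS.items() if tag is origin)
```

Tagging fields in this way is one possible starting point for deciding which notices, retention rules or accuracy checks apply to each kind of data.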

Chapter 3 is packed full of compliance tools which can help organisations make privacy a painless companion to big data analytics, not a hindrance.  The notes on Privacy by Design and Privacy Impact Assessments fully appreciate forthcoming GDPR requirements.

The guide concludes by drawing together the emergent themes, such as the continued importance of transparency in the face of a perceived emphasis on accountability.  The themes help us get inside the mind of the regulator when it comes to big data analytics and understand what it considers to be most important.

It’s well researched

The guide gives practical real-world advice on what’s being done to progress data protection in big data analytics.  The ICO has gathered data from over 250 other articles and sources, distilling these articles and research pieces into digestible chunks of useful information.  These are memorable, and often very interesting.

The prevalence of anecdotal evidence throughout the guide will help organisations engage with the issues raised and should inspire new advances on the topic.

It’s an oracle

The guide acts as a fortune teller for what may be on the horizon, both from a technological point of view and in relation to developments in data protection regulation and best practice.

Trust and understanding appear regularly in the commentary around what ‘good’ looks like.  The guide argues that transparency is not trumped by accountability, as some have predicted.  There’s importance in both.  Organisations should focus on innovative ways, such as graphics, videos or storyboards, to educate data subjects on how big data-style processing of their personal data will benefit them or is fair.  This builds a trust that sees users engage with processing more, in turn providing a home-grown ‘debugging’ framework.

The ICO argues that gaining GDPR-grade consent is not impossible in big data analytics, as some have foretold.  The guide lists examples of where organisations have used novel and innovative approaches to help gain, manage and withdraw consent.  Graduated consent is listed as one possible solution to the common issue in big data analytics of experimenting on, and thereby repurposing, data.
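
One way to picture graduated consent is as a per-purpose record that can be granted or withdrawn at any point in the relationship, rather than a single yes/no at sign-up.  The sketch below rests on that reading; the `ConsentLedger` class and the purpose names are hypothetical and are not taken from the guide.

```python
from datetime import datetime, timezone


class ConsentLedger:
    """Append-only log of consent events, kept per subject and per purpose."""

    def __init__(self):
        # Each event is (subject_id, purpose, granted, timestamp).
        self._events = []

    def record(self, subject_id, purpose, granted):
        """Log that a subject granted (True) or withdrew (False) consent."""
        stamp = datetime.now(timezone.utc)
        self._events.append((subject_id, purpose, granted, stamp))

    def has_consent(self, subject_id, purpose):
        """The most recent event for this subject and purpose wins.

        With no event on record, consent is treated as absent.
        """
        for subject, logged_purpose, granted, _ in reversed(self._events):
            if subject == subject_id and logged_purpose == purpose:
                return granted
        return False
```

Under this design, a new analytical use of existing data would be checked as a new purpose, with no consent assumed until the data subject has been asked.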

The guide addresses the perceived wiggle-room in the GDPR’s Article 22 data subject right.  This is the right not to be subject to a decision, taken solely by automated means, which significantly affects the individual.  The guide mentions several areas where this right will indeed be engaged fully.  These are arguably the areas with the greatest potential for harm: credit applications, recruitment and insurance.  Notably, judicial decisions are left out, but this issue is currently more prevalent in the US.

The key message is that if the organisation exploiting big data analytics cannot explain the decisions which arise, then how can it expect to explain this adequately to data subjects?

Algorithmic transparency is discussed in detail.  The guide suggests several approaches to safeguard the fairness and transparency of decisions resulting from deep learning (the process of machines creating their own, inhuman, ways of taking decisions).  These include auditing techniques, which are most effective when the ability to audit is ‘baked in’ to algorithms in the design phase.  The ability to translate decisions taken by machines into ‘storyboard’ explanations is promoted as a key differentiator for organisations relying on automated decision making.
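
To illustrate what ‘baked in’ auditability might look like, here is a minimal sketch in which every automated decision is stored together with plain-language reasons.  It deliberately uses a toy rule-based credit check rather than deep learning, and the thresholds, field names and `decide_credit` function are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    subject_id: str
    outcome: str
    reasons: list          # list of (factor, plain-language explanation)
    model_version: str
    decided_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def storyboard(self):
        """Render the decision as numbered, human-readable steps."""
        steps = [
            f"{number}. {text}"
            for number, (_, text) in enumerate(self.reasons, start=1)
        ]
        return "\n".join([f"Outcome: {self.outcome}"] + steps)


AUDIT_LOG = []  # in practice this would be durable, access-controlled storage


def decide_credit(subject_id, income, missed_payments):
    """Toy decision rule that always logs why it reached its outcome."""
    reasons = []
    if income < 20000:  # hypothetical threshold
        reasons.append(("income", "Declared income is below the 20,000 threshold."))
    if missed_payments > 2:  # hypothetical threshold
        reasons.append(("missed_payments", "More than two missed payments were recorded."))
    approved = not reasons
    if approved:
        reasons.append(("all_checks", "All checks were passed."))
    record = DecisionRecord(
        subject_id, "approved" if approved else "declined", reasons, "toy-rules-0.1"
    )
    AUDIT_LOG.append(record)
    return record
```

Because the reasons are captured at decision time, the same record can serve an internal audit and a ‘storyboard’ explanation for the data subject.  A real machine learning model would need a separate explanation technique to produce those reasons.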


This guide is most helpful in narrowing the definitions of the issues faced.  Doing so can help more focused, solution-based commentary to emerge.  This will be a welcome change from the rather gloomy reporting on big data analytics and privacy that we currently see.  Recognising the benefit to society as a whole, the ICO is welcoming big data analytics, not discouraging it.

Using real-world examples, the guide suggests several practical approaches to explore in relation to lawful and fair use of data, as well as for managing the lifecycle of the data including repurposing. 

However, some issues remain open for further academic discussion, such as the much-maligned discriminatory bias that regularly emerges from machine learning in, for example, the judicial or consumer credit spheres.  Here the guide talks about the use of ‘Ethics Boards’ in big organisations as a palatable solution.  For organisations of all sizes, a code of ethics can help build an overarching trust when seeking to exploit new uses of big data analytics.

Importantly, this guide is a timely reminder that technological change must track organisational change when it comes to data protection transformation.  PwC has recently published a white paper on technology’s role in data protection.  It is available here.