Blockchain & GDPR: from practice to theory and back

We are (almost) certain that blockchain technology will bring world peace, solve climate change and make democracy great again. But it is a sign of the times that at some point in any discussion someone will bring up the dreaded GDPR. How does it relate to blockchain technology?

1. INTRODUCTION

Let’s start from where we left off in our previous blog. We introduced “the blockchain” and “the GDPR” to you and explained how the two get entangled. We concluded that, as far as the public key is concerned, it is safe to say that if the user of a public key is a natural person, that key is generally considered personal data. Regarding transactional data, however, the answer is less straightforward. It revolves around the question of whether a hash should be considered personal data.

In this second blog we will guide you through the grey area and elaborate on where tech-savvies and lawyers collide.

2. MEET THE HASH

Depending on the type and purpose of the blockchain, transactional data is the information or value that is intended to be shared across the network. The public nature of a blockchain created the need both to obfuscate data and to verify it against the original, which is done by hashing. Hashing can be used to give integrity to the data shared on a blockchain and so create a single source of truth.

A hashing algorithm can be used to reduce any amount and type of data into data of a (small) fixed size. The outcome, the so-called hash value of the data, can be thought of as a (practically) unique digital fingerprint of the original data. Even a minor change, as small as removing a full stop from the original text, would render this digital fingerprint completely different.
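Both properties can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function (the text above does not name one) and uses made-up sample sentences.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash value of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Made-up sample inputs, for illustration only.
short_text = b"Alice pays Bob 10 euro."
long_text = short_text * 10_000          # the same sentence, ten thousand times
almost_same = b"Alice pays Bob 10 euro"  # identical, but without the full stop

# Whatever the size of the input, the fingerprint has the same length.
print(len(fingerprint(short_text)), len(fingerprint(long_text)))

# Removing a single character yields a completely different fingerprint.
print(fingerprint(short_text))
print(fingerprint(almost_same))
```

Running this prints two equal lengths, followed by two fingerprints that have nothing visibly in common.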

Another important aspect of a secure hash function is that it is (practically) impossible to re-create the original data from the digital fingerprint alone. Hashing, in contrast to encryption, which is reversible, only works one way. So: in one hand you have a hash value, in the other the hash function. There is but one way to find out which original input resulted in that exact hash value: push candidate inputs through the function, one after another, until you get that same hash. In other words: if you know that the original input consisted of about 8 letters, it is still a daunting task, but you will get there. But imagine you do not know anything about the shape and form of the original (personal) data that once went through the function: was it a name, a bank account number, the names of your grandparents, and in what order? Getting back to that set of data would require quite a bit of processing power. And determination, endurance, commitment and the like.
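The "push everything through the function again" approach can be sketched as follows. This is an illustration under stated assumptions, not a real attack tool: it assumes SHA-256, assumes the attacker knows the input consists of exactly four lowercase letters, and uses a hypothetical input ("anna").

```python
import hashlib
from itertools import product
from string import ascii_lowercase
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash value of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, length: int) -> Optional[str]:
    """Hash every lowercase string of the given length until one matches."""
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # the input was not a lowercase string of this length


# Hypothetical input: the attacker only ever sees the hash value.
target = sha256_hex("anna")
print(brute_force(target, 4))
```

With four lowercase letters there are only 26^4 = 456,976 candidates, so the search finishes almost instantly. Every extra letter multiplies the work by 26, and without any knowledge of the shape of the input there is no bounded list of candidates to walk through in the first place.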

3. GREY AREA OF ANONYMIZATION FROM A LEGAL POINT OF VIEW…

So, we could argue that, depending on the input, it would be so hard to retrieve the original input that maybe… we could regard the hash value as anonymized data. That would be a breakthrough, because the GDPR does not apply to anonymized data. Sounds easy, right?

However. Once upon a time the Article 29 Working Party (“WP29”, which was replaced by the European Data Protection Board (“EDPB”) as of May 25, 2018) published an Opinion on anonymization. The WP29 took the stance that the threshold for data to be considered anonymized is very, very high. During its first plenary meeting the EDPB endorsed the GDPR-related WP29 Guidelines. It did not endorse the WP29 opinion on anonymization. That does not mean, however, that this opinion no longer carries authority: it is still referred to in recent studies such as the EU Blockchain Observatory and Forum’s report on blockchain and the GDPR. According to the WP29 opinion on anonymization techniques, data is not anonymous as long as:

  1. It is still possible to single out an individual, meaning that the individual can still be isolated in the dataset by a unique attribute;
  2. It is still possible to link records relating to an individual, meaning that records about the same individual can still be connected, within one dataset or across different datasets, through a shared attribute; or
  3. Information concerning an individual can still be inferred, meaning that the value of an attribute can still be deduced from other attributes in the dataset.

For hashed data this matters in two ways. Datasets or databases that use the same pseudonymized attribute can still be linked to one another, and the hash may be reversed by a brute-force attack. The more advanced the algorithm, the harder it is for a brute-force attack to succeed in reversing the hash. The risk of reversibility also depends on the type of original data. If the original data is of a small, known size (for example a name, or a social security number), it can be easy to derive the correct value for that particular hash.
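The linking risk can be made concrete with a small sketch. The two datasets, the customer numbers and the use of plain (unsalted) SHA-256 are all assumptions made up for this illustration.

```python
import hashlib


def pseudonymize(identifier: str) -> str:
    """Replace an identifier with its plain (unsalted) SHA-256 hash value."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Two hypothetical datasets, kept by different parties, that both replace
# the same made-up customer number with the same unsalted hash.
payments = {
    pseudonymize("NL-000123"): "paid 40 euro to a pharmacy",
}
deliveries = {
    pseudonymize("NL-000123"): "parcel delivered to Main Street 1",
    pseudonymize("NL-000456"): "parcel delivered to Canal Road 7",
}

# Nobody reverses a hash here: identical hash values are simply matched.
linked = {
    h: (payments[h], deliveries[h])
    for h in payments.keys() & deliveries.keys()
}
for records in linked.values():
    print(records)
```

The person behind "NL-000123" is never named, yet the payment and the delivery address end up side by side. That is why the same hash of the same attribute, reused across datasets, behaves like a pseudonym and not like anonymized data.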

We would like to stress here that anonymization does not equal encryption. Encryption is a form of pseudonymization and should be considered merely a security measure that reduces the linkability of a dataset with the identifiable person. Encryption aims to ensure that only a “members only club” can still link the data back to an individual. Anonymization, on the other hand, aims at making it impossible for anyone to identify an individual. Following the WP29 opinion on anonymization, hashing is an established pseudonymization technique, but could in some cases be equated to an anonymization technique.

4. AND THEN THE GDPR HAPPENED

Moving from the WP29 opinion on anonymization to the GDPR’s considerations (its recitals), published on May 4th 2016. According to those, to answer the question of whether an individual is identifiable, “all the means reasonably likely to be used, such as singling out” should be taken into account. This must be based on “all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments”. So, there you go: in order to conclude that hashed data is anonymized, a close look should be taken at the available technology, the current costs and the amount of time that would need to be invested to retrieve, in the way described above, the personal data that was put through the data grinder, aka the hash function.
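To give a feeling for the "amount of time" factor, here is a back-of-the-envelope sketch. The rate of one billion hashes per second is an assumption chosen purely for illustration, not a benchmark of any real hardware, and the three input types are hypothetical examples.

```python
# Assumed rate, for illustration only; real hardware may be faster or slower.
ASSUMED_HASHES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600


def worst_case_seconds(search_space: int,
                       rate: int = ASSUMED_HASHES_PER_SECOND) -> float:
    """Time needed to hash every candidate in the search space once."""
    return search_space / rate


# Hypothetical kinds of original data and the number of possible values.
scenarios = {
    "9-digit number": 10 ** 9,
    "8 lowercase letters": 26 ** 8,
    "random 128-bit value": 2 ** 128,
}

for name, space in scenarios.items():
    seconds = worst_case_seconds(space)
    years = seconds / SECONDS_PER_YEAR
    print(f"{name}: {space:.2e} candidates, {seconds:.2e} s, {years:.2e} years")
```

Under the assumed rate, the 9-digit number falls within a second and the 8 letters within minutes, while the random 128-bit value stays out of reach for far longer than anyone could reasonably wait. The same hash function therefore leads to very different answers to the "reasonably likely" question, depending entirely on what went in.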

Guided by the WP29 opinion on anonymization and the GDPR’s considerations, the argument can be made that each hash, depending on its context (the blockchain solution itself), should be assessed on a case-by-case basis. This, however, is still hotly debated, and up until October 2019 very little guidance had been published on such an assessment. Recently the Spanish DPA together with the European Data Protection Supervisor (“EDPS”) published an essay on the hash function as a personal data pseudonymization technique. It walks through the theory of how hash techniques operate and introduces a re-identification risk analysis. According to this essay too, the hash function can anonymize data, depending on its implementation. It concludes, however, by stating that the natural consequence of anonymization is that the data controller loses the capacity to validate the hash. And that is absolutely not the goal of the use of hashes in blockchain technology.

5. BUT WHAT ABOUT A MORE PRACTICAL APPROACH?

We keep noticing that this legal and (“in most cases”, as a lawyer would add) extremely theoretical perception of a hash as personal data is difficult to understand from a technical perspective. Several blogs have been written about the misconception of hashes being treated as personal data, and additional mitigation measures such as ‘salting’ or ‘peppering’ hashes are suggested. Tech-savvies explain why and how no amount of computing power in the world would suffice to find the needle in the haystack. There is no way to evaluate which of all the imaginable inputs was the original data, certainly with only a vague idea of what type of data the input was. But all of the above seems to be irrelevant if in some way the original data can still be used to validate the hash value.
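What ‘salting’ and ‘peppering’ could look like is sketched below. The construction is an assumption made for illustration (a random per-record salt prepended to the data, and HMAC-SHA-256 with a secret key as the pepper); the blogs referred to above may use other variants. The record value and function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def salted_hash(data: bytes, salt: bytes) -> str:
    """Hash the data together with a random salt stored per record."""
    return hashlib.sha256(salt + data).hexdigest()


def peppered_hash(data: bytes, pepper: bytes) -> str:
    """Keyed hash (HMAC-SHA-256); the pepper is a secret kept off-chain."""
    return hmac.new(pepper, data, hashlib.sha256).hexdigest()


def validate(data: bytes, salt: bytes, expected: str) -> bool:
    """Check that data and salt together reproduce the stored hash value."""
    return hmac.compare_digest(salted_hash(data, salt), expected)


record = b"NL-000123"            # made-up personal identifier
salt = secrets.token_bytes(16)   # kept off-chain by the data controller
on_chain = salted_hash(record, salt)

# Whoever holds both the original data and the salt can validate the hash.
print(validate(record, salt, on_chain))
# With a different salt the same data no longer matches.
print(validate(record, secrets.token_bytes(16), on_chain))
```

The sketch also shows where the legal argument bites. Guessing the input now requires guessing the salt or pepper as well, which makes brute force far harder. But the party that keeps the salt and the original data can still validate the hash, and that is exactly the capacity the paragraph above says keeps the discussion open.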

In our view this cannot be where the discussion ends. The GDPR comes with many compliance requirements for appointed parties and could greatly affect the development of blockchain solutions. It cannot be that, via the route described, the GDPR becomes the showstopper for innovation. It would therefore be beneficial to the blockchain industry if a hash of personal data were not systematically interpreted as personal data, or if, for example, certain implementation choices could result in a lighter GDPR regime because of the robust security blockchain technology can offer. Regulators and supervisors should also keep looking for the right balance: they should move neither too quickly nor too slowly, and provide us with rules that are neither too restrictive nor too liberal. And we are keeping our fingers crossed for those who, despite uncertainty and grey areas, keep their curiosity, lean towards the less safe side and even flirt a bit with “danger”: adopt a privacy-by-design approach, work creatively to by-pass some of the restraints, and emphasize the mutual goals of the GDPR and blockchain technology to draw attention away from the less compatible characteristics.

6. MORE INFORMATION?

Do you want to know more about blockchain and the GDPR in practice? Please contact Charlotte (+31 (0) 6 20123968), Marloes (+31 (0) 6 20057902) or Diderik (+31 (0) 6 83639361).
