Data anonymization and GDPR compliance: the case of Taxa 4×35
The Danish taxi service Taxa 4×35 faces a 1.2 million kroner fine (roughly €160,000) for not deleting or anonymizing its users’ data. Studying this example sheds light on how data protection agencies are enforcing GDPR requirements for data anonymization.
Taxa 4×35 is a Danish service that allows its users to hail cabs in Copenhagen with an app, similar to Uber. When a user hails a taxi, the Taxa system collects an assortment of data, including the customer’s name and telephone number, the date of the trip, its start and end times, the number of kilometers driven, the payment, and the GPS coordinates of the beginning and end of the trip, as well as written addresses and other coordinates. Taxa 4×35 then links this data to the user’s tax information to ensure that the proper amount of tax is collected.
In October of 2018, the Danish data protection agency, Datatilsynet, found that Taxa had kept the data from nearly 9 million taxi rides for five years, long after the data was needed. This hoarding of records goes against Article 5 of the EU’s General Data Protection Regulation, which states that personal data shall be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed,” and “kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed.”
Taxa 4×35’s management thought the company was exempt from these two sections of Article 5, which represent the principles of data minimization and storage limitation, because it was anonymizing the data by deleting the names associated with the trip records from its database after two years. (The remaining data was then deleted after five years.) Datatilsynet found this attempt at data anonymization to be inadequate, pointing out that even without the user’s name, Taxa 4×35 still had enough personal information to identify an individual. The agency concluded that “Information about the customer’s taxi rides (including collection and delivery addresses) can therefore still be attributed to a natural person via the telephone number, which is only deleted after five years.”
You can read the full Datatilsynet statement on Taxa 4×35 here. (In Danish)
GDPR requirements for data anonymization
The GDPR draws critical distinctions between personal data, pseudonymized data, and anonymized data. Taxa 4×35’s reasoning that anonymized data can be used much longer than personal data was correct. According to Recital 26, “The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”
However, Taxa 4×35 failed to meet the high standard that the GDPR sets for data anonymization. Earlier, Recital 26 states that an organization must consider not only whether it can identify an individual using the data in its own database, but also:
all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.
Since it is relatively easy to look up a phone number and match it to an individual, the Taxa dataset is not anonymous. Because the records are not anonymous, they are still subject to the full protections listed in the GDPR, which means that Taxa 4×35 should have deleted the data after two years and had documentation to prove it.
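To make the point concrete, here is a minimal sketch of why deleting names alone does not prevent identification. The records, phone numbers, field names, and the `directory` lookup are all hypothetical and invented for illustration; they do not reflect Taxa 4×35’s actual schema or any real lookup service.

```python
from collections import defaultdict

# Hypothetical trip records after the names were removed; the phone number
# remains. All values below are made up for illustration.
records = [
    {"phone": "+45 00 00 00 01", "date": "2016-05-02",
     "pickup": "Address A", "dropoff": "Address B"},
    {"phone": "+45 00 00 00 02", "date": "2016-05-02",
     "pickup": "Address C", "dropoff": "Address D"},
    {"phone": "+45 00 00 00 01", "date": "2016-05-09",
     "pickup": "Address B", "dropoff": "Address A"},
]

# Hypothetical stand-in for any public or commercial phone lookup.
directory = {"+45 00 00 00 01": "Person One"}


def link_trips(records, directory):
    """Group trips by phone number and attach a name where a lookup succeeds."""
    history = defaultdict(list)
    for r in records:
        # Fall back to the phone number itself when no name is found;
        # the trips are still linked to a single customer either way.
        label = directory.get(r["phone"], r["phone"])
        history[label].append((r["date"], r["pickup"], r["dropoff"]))
    return dict(history)


histories = link_trips(records, directory)
print(histories["Person One"])
```

Even where the lookup finds no name, every trip tied to the same phone number is still grouped together, which is the kind of “singling out” that Recital 26 mentions.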
True data anonymization
Effective data anonymization has two defining properties:
- It is irreversible.
- It is done in such a way that it is impossible (or extremely impractical) to identify the data subject.
In WP 216, the Article 29 Working Party examined several different methods of data anonymization and clarified what measures data processors and controllers have to take. They specifically say that “removing directly identifying elements in itself is not enough to ensure that identification of the data subject is no longer possible. It will often be necessary to take additional measures to prevent identification, once again depending on the context and purposes of the processing for which the anonymised data are intended.”
Taxa 4×35’s justification for keeping its database for five years was business development. For that purpose, the company could have built accurate models of when and where it needed drivers and anonymized its data by deleting everything except the date of the trip, the start and end time of the trip, the number of kilometers driven, and the GPS coordinates of the beginning and end of the trip. Then, it could have grouped this data by day or location rather than by account. This would have allowed Taxa to identify geographic hot spots and rush hours for its drivers, but would not have allowed it to identify individual data subjects.
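The grouping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a method taken from the Datatilsynet decision: the field names, the coordinate rounding in `grid_cell`, and the `min_count` threshold that suppresses small groups are all hypothetical choices. Whether a given output actually counts as anonymous under the GDPR still depends on the context of the processing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records; all field names and values are invented.
trips = [
    {"name": "Person One", "phone": "+45 00 00 00 01",
     "start": "2018-03-05T08:10", "start_gps": (55.6761, 12.5683)},
    {"name": "Person Two", "phone": "+45 00 00 00 02",
     "start": "2018-03-05T08:40", "start_gps": (55.6763, 12.5690)},
    {"name": "Person One", "phone": "+45 00 00 00 01",
     "start": "2018-03-05T17:40", "start_gps": (55.6180, 12.6508)},
]


def grid_cell(gps, precision=2):
    """Round coordinates to a coarse grid (0.01 degrees of latitude is about 1 km)."""
    lat, lon = gps
    return (round(lat, precision), round(lon, precision))


def aggregate_demand(records, min_count=2):
    """Count pickups per (date, hour, grid cell).

    Names and phone numbers are never copied into the output. Groups with
    fewer than min_count trips are dropped (an assumed extra safeguard) so
    that a single unusual trip does not stand out.
    """
    counts = Counter()
    for r in records:
        start = datetime.fromisoformat(r["start"])
        key = (start.date().isoformat(), start.hour, grid_cell(r["start_gps"]))
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


demand = aggregate_demand(trips)
print(demand)
```

The result keeps only counts per day, hour, and coarse area, which is enough to find rush hours and hot spots. For the output to be irreversible, the original trip-level records would also have to be deleted once the aggregates are computed.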
The GDPR aims to give individuals control over their personal data, not to prevent companies and organizations from reaping the benefits that analyzing big data can offer. By fully understanding the GDPR requirements regarding the anonymization of data, organizations can continue to process data and reduce their exposure to GDPR fines. Taxa 4×35 made a half-hearted attempt to anonymize its data, and it was caught.
The GDPR has many requirements for how personal data should be handled. It can be daunting, but we made this website to help businesses with the basics of GDPR compliance. See our GDPR checklist and overview of the law to get started.