Forget me please? Event sourcing and the GDPR

In May 2018, a new piece of EU legislation called the General Data Protection Regulation (GDPR) will come into effect. The GDPR attempts to regulate data protection for individuals within the EU and has very interesting and specific implications for applications that use event sourcing. In this article, I’ll share my thoughts on this subject and offer a few pointers for dealing with those implications.

The right to erasure

The GDPR of course has many implications for any software or organization that processes (consumer) data. For event sourced applications however, the most interesting and difficult part of the regulation is without a doubt Article 17, which concerns the ‘right to erasure’:

“… the data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…”

Imagine an existing application, built on the principles of CQRS and Event Sourcing (a common combination of patterns). Amongst other data, this application stores (some) customer information and as such is covered by the GDPR.

Let’s assume one of the customers submits an Article 17 request. To grant that request we first have to identify all the events that contain personally identifiable information of/for the requestor, and then either (sufficiently) anonymize those events or remove them altogether.

Immutability

Event sourcing theory states that each individual event in an event stream or store is immutable and the event store as a whole is append-only. After all, events are a record of history, records of something that happened. In fact, some ES implementations are backed by immutable storage (such as Kafka).

Immutable event stores have their advantages (such as infinite cache TTLs and a stable audit log), but also a few cons. Any mistakes, errors or missing information in events are typically dealt with using corrective events or upcasters. When erasing or anonymizing personal information, those strategies are no longer an option, as they’ll leave the original data intact.

Thus, compliance can only be achieved by directly deleting or modifying events. Alternatively, we can create a copy of the event stream that leaves out the affected events or includes anonymized versions of them. When that process is complete, the previous stream should of course be discarded.

When removing events, care must be taken that you do not produce any inconsistent aggregates.
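
To make the copy-and-anonymize idea a bit more concrete, here is a minimal sketch in Python. It is not tied to any particular event store: the Event shape, the PERSONAL_FIELDS list and the copy_without_subject function are all hypothetical names invented for this illustration. It anonymizes events instead of dropping them, so every aggregate keeps its full sequence of events.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional


# Hypothetical event shape; a real event store will look different.
@dataclass(frozen=True)
class Event:
    aggregate_id: str
    sequence: int
    type: str
    subject_id: Optional[str]  # the person this event is about, if any
    payload: dict


# Assumption: we know up front which payload fields hold personal data.
PERSONAL_FIELDS = {"name", "email", "address"}


def anonymize(event: Event) -> Event:
    """Return a copy of the event with personal fields blanked out."""
    scrubbed = {
        key: ("[erased]" if key in PERSONAL_FIELDS else value)
        for key, value in event.payload.items()
    }
    return replace(event, subject_id=None, payload=scrubbed)


def copy_without_subject(stream: Iterable[Event], subject_id: str) -> Iterator[Event]:
    """Yield a new stream in which the subject's events are anonymized.

    Events are replaced rather than removed, so aggregate_id and
    sequence stay intact and no aggregate ends up with gaps.
    """
    for event in stream:
        yield anonymize(event) if event.subject_id == subject_id else event
```

Once the new stream has been written and verified, the old one (including any backups and snapshots that still hold the original data) has to be discarded for the erasure to be effective.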

Encryption

If immutability is a very strict requirement, or a limitation that can’t be overcome, there are other potential solutions. One such solution would be to encrypt the events for a particular person with a unique key, with a sufficiently strong, modern encryption algorithm.

When a person submits an Article 17 request, you simply have to “forget” the appropriate key and all associated data will become unreadable. Whether that remains the case in the future largely depends on the chosen encryption algorithm. Managing the individual encryption keys can also become somewhat of a pain.
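
A rough sketch of the “forget the key” idea in Python follows. Everything in it is an assumption made for illustration: the KeyStore class, the encrypt_payload and decrypt_payload functions, and especially the cipher. To keep the example free of dependencies it uses a toy keystream derived from SHAKE-256, which is only a stand-in. A real system should use a vetted authenticated cipher (AES-GCM, for example) from a proper cryptography library, and a durable, access-controlled key store.

```python
import hashlib
import json
import secrets
from typing import Dict, Optional


class KeyStore:
    """Hypothetical in-memory store holding one secret key per person."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def key_for(self, subject_id: str) -> bytes:
        # Create the key the first time we see this person.
        if subject_id not in self._keys:
            self._keys[subject_id] = secrets.token_bytes(32)
        return self._keys[subject_id]

    def get(self, subject_id: str) -> Optional[bytes]:
        return self._keys.get(subject_id)

    def forget(self, subject_id: str) -> None:
        # Handling an Article 17 request: throw the key away.
        self._keys.pop(subject_id, None)


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher for illustration only, NOT for production use.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_payload(keys: KeyStore, subject_id: str, payload: dict) -> bytes:
    """Encrypt an event payload with the person's own key."""
    nonce = secrets.token_bytes(16)
    plaintext = json.dumps(payload).encode("utf-8")
    return nonce + _xor_keystream(keys.key_for(subject_id), nonce, plaintext)


def decrypt_payload(keys: KeyStore, subject_id: str, blob: bytes) -> Optional[dict]:
    """Return the payload, or None when the key has been forgotten."""
    key = keys.get(subject_id)
    if key is None:
        return None
    nonce, ciphertext = blob[:16], blob[16:]
    return json.loads(_xor_keystream(key, nonce, ciphertext).decode("utf-8"))
```

The encrypted bytes in the event store are never touched, so the store can stay append-only. The flip side is that the key store becomes the mutable, critical piece: lose a key by accident and that person’s events are gone as well.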

This approach has also been discussed in some detail on the Gemalto blog. It seems that the AxonIQ folks (the company that has been launched to further development of the Axon framework) are working on something along these lines as well.

Projections

None of the above methods ensure consistent projections, or automatically trigger updates to them. Any projections that contain (parts of) personal data should be (partially) rebuilt if the data has been encrypted, anonymized or removed from the event store.
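
As a small illustration of why projections need attention, consider the following Python sketch. The event dictionaries and the project_customer_names function are made up for this example and do not reflect any specific framework. The projection is simply a fold over the events, so the only way to get the personal data out of it is to replay the (now anonymized) events, or to update the read model by hand.

```python
from typing import Dict, Iterable


def project_customer_names(events: Iterable[dict]) -> Dict[str, str]:
    """Fold events into a simple customer_id -> name read model."""
    names: Dict[str, str] = {}
    for event in events:
        if event["type"] in ("CustomerRegistered", "CustomerRenamed"):
            names[event["customer_id"]] = event["name"]
    return names


# Read model built before the erasure request was handled.
original_events = [
    {"type": "CustomerRegistered", "customer_id": "c1", "name": "Alice"},
    {"type": "CustomerRegistered", "customer_id": "c2", "name": "Bob"},
]
stale_projection = project_customer_names(original_events)

# The event store after anonymizing the events for customer c1.
anonymized_events = [
    {"type": "CustomerRegistered", "customer_id": "c1", "name": "[erased]"},
    {"type": "CustomerRegistered", "customer_id": "c2", "name": "Bob"},
]

# The stale projection still exposes the name until it is rebuilt.
rebuilt_projection = project_customer_names(anonymized_events)
```

Here stale_projection still contains “Alice” even though the event store no longer does, which is exactly the situation that has to be avoided.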

Final thoughts

The GDPR is an interesting bit of legislation, and it’s going to come into effect before we know it. How the regulation will integrate and play with other (local) data or privacy protection laws remains to be seen. For example, business entities in The Netherlands have to retain their administration for seven consecutive years, which may not be completely compatible with the GDPR in all cases. Thus I anticipate seeing a number of court cases to figure that out.

For event sourced applications, the implications of Article 17 specifically can pose a challenge. I’d encourage everyone that has an event sourced app in production (or is in the process of moving there) to seriously think about if and how the GDPR is going to impact their application. Apart from the challenges however, event sourcing can also assist with GDPR-compliance. Rich domain events form an audit trail that can demonstrate consent and form a legal basis for processing (personal data). Food for thought!

Update 28/11: I’ve published a follow-up post to this one.

Michiel Rook

Michiel Rook is an experienced, passionate & pragmatic freelance coach, developer & speaker from the Netherlands. He loves helping teams and companies to develop better software and significantly improve their delivery process. When he’s not thinking about continuous deployment, DevOps or event sourcing he enjoys music, cars, sports and movies.
