Event Sourcing & The GDPR - Erasing your data footprint

Recently, the EU General Data Protection Regulation (GDPR) came into effect. You’ve probably heard all about it (or at least seen the absurd number of ‘update privacy policy’ emails in your inbox). In any case, the GDPR regulates the protection of personal data of people in the EU, and it applies to any organization that processes that data.

The GDPR has many implications for any software or organization that processes data. However, if you are considering implementing event sourcing in your application (or have already done so), there are a few provisions in the regulation that have specific implications for event sourced applications.

(This article was published earlier on the JAX London Blog.)

Consent

One of the requirements of the GDPR is that an organization should be able to prove it has consent to process someone’s (personal) data. The consent must be very specific, and it must be possible to withdraw it at any time.

For example, if you use the same personal data to send a newsletter, perform data analysis and do retargeting, you must have consent for those actions individually and support individual withdrawals of consent.

Demonstrating consent was given is easy when that consent was recorded as an event.
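To make that concrete, here is a minimal sketch of consent recorded as events, one per purpose. The event names (ConsentGiven, ConsentWithdrawn), the purpose strings and the helper functions are my own assumptions for illustration, not part of any particular framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentGiven:
    subject_id: str
    purpose: str  # hypothetical purposes: "newsletter", "analytics", "retargeting"
    occurred_at: datetime


@dataclass(frozen=True)
class ConsentWithdrawn:
    subject_id: str
    purpose: str
    occurred_at: datetime


def active_consents(events, subject_id):
    """Replay consent events in order; return the purposes still consented to."""
    purposes = set()
    for event in events:
        if event.subject_id != subject_id:
            continue
        if isinstance(event, ConsentGiven):
            purposes.add(event.purpose)
        elif isinstance(event, ConsentWithdrawn):
            purposes.discard(event.purpose)
    return purposes


def consent_history(events, subject_id, purpose):
    """Return every consent event for one subject and one purpose (the audit trail)."""
    return [e for e in events if e.subject_id == subject_id and e.purpose == purpose]


day_one = datetime(2018, 6, 1, tzinfo=timezone.utc)
day_two = datetime(2018, 6, 2, tzinfo=timezone.utc)
stream = [
    ConsentGiven("user-1", "newsletter", day_one),
    ConsentGiven("user-1", "analytics", day_one),
    ConsentWithdrawn("user-1", "analytics", day_two),
]

print(active_consents(stream, "user-1"))
print(len(consent_history(stream, "user-1", "analytics")))
```

Because each purpose has its own events, withdrawing consent for analytics leaves the newsletter consent untouched, and the history for any single purpose can be produced on request.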

Right to Erasure

Without a doubt, the most interesting article in the regulation (at least when it concerns event sourced applications) is Article 17, “Right to Erasure”.

“… the data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…”

Whenever an Article 17 request is received, we first have to identify all the events that contain personally identifiable information about the requester, and then either (sufficiently) anonymize those events or remove them altogether.

Challenge: immutability

Individual events are generally considered immutable. After all, events are a reflection of history, records of something that happened. Multiple events form event streams, persisted in event stores that are append-only. In fact, some implementations are even backed by immutable storage (such as Kafka, or a WORM drive).

Append-only event stores with immutable events have their own special advantages. Events can be cached ad infinitum and form the basis of a stable audit log. Any mistakes, errors or missing information in previously persisted events are typically dealt with by applying corrective events (similar to an accountant’s ledger) or using upcasters.

However, when you need to erase or anonymize personal information, those strategies are no longer an option, as they’ll both leave the original data intact (and you non-compliant!).
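A small sketch shows why a corrective event is not enough. The event types and field names here are hypothetical, and a plain Python list stands in for an append-only store.

```python
# A plain list stands in for an append-only event store: we only ever append.
stream = []

stream.append({"type": "CustomerRegistered", "customer_id": "c-1", "email": "jane@example.com"})

# Attempted "erasure" by appending a corrective event (hypothetical event type).
stream.append({"type": "CustomerEmailCorrected", "customer_id": "c-1", "email": None})


def current_email(events, customer_id):
    """Fold the stream into the current state for one customer."""
    email = None
    for event in events:
        if event["customer_id"] == customer_id and "email" in event:
            email = event["email"]
    return email


def emails_still_stored(events, customer_id):
    """List every email address that is still physically present in the store."""
    return [e["email"] for e in events if e["customer_id"] == customer_id and e.get("email")]


print(current_email(stream, "c-1"))
print(emails_still_stored(stream, "c-1"))
```

The projected state no longer shows the address, but the first event still carries it, so anyone replaying the stream can read it.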

One option is to create a copy of the original event stream, filtering out the affected events, or including anonymized versions of those events. When that process is complete, the original stream should of course be discarded.
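As a rough sketch of that copy-and-filter approach: the Event shape, the set of personal fields and the "[erased]" placeholder below are all assumptions made for the example. A real system would also need to swap the streams atomically and purge backups and caches of the original.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    type: str
    subject_id: str
    payload: dict


# Hypothetical list of payload fields that count as personal data.
PII_FIELDS = {"name", "email", "address"}


def anonymize(event):
    """Return a copy of the event with personal fields replaced by a placeholder."""
    scrubbed = {k: ("[erased]" if k in PII_FIELDS else v) for k, v in event.payload.items()}
    return replace(event, payload=scrubbed)


def rewrite_stream(original, subject_id, drop_types=frozenset()):
    """Build a new stream: drop or anonymize the subject's events, copy the rest as-is."""
    rewritten = []
    for event in original:
        if event.subject_id != subject_id:
            rewritten.append(event)             # unaffected event, copied unchanged
        elif event.type in drop_types:
            continue                            # affected event, filtered out entirely
        else:
            rewritten.append(anonymize(event))  # affected event, anonymized version
    return rewritten


original = [
    Event("CustomerRegistered", "c-1", {"name": "Jane", "email": "jane@example.com"}),
    Event("OrderPlaced", "c-1", {"order_id": "o-9", "total": 42}),
    Event("NewsletterSubscribed", "c-1", {"email": "jane@example.com"}),
    Event("CustomerRegistered", "c-2", {"name": "Bob", "email": "bob@example.com"}),
]

new_stream = rewrite_stream(original, "c-1", drop_types={"NewsletterSubscribed"})
# Only after new_stream is safely persisted should the original be discarded.
```

Note that the original stream is left untouched by the function; discarding it is a separate, deliberate step.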

Don’t store personal information in events

Another way of dealing with this is using a mix of event sourcing and (regular) database tables. The idea is that personal information is no longer stored inside events themselves, but in another database or storage solution.

Whenever an event is read by the system, the associated personal information is then retrieved from the secondary database and merged with the event.

Dealing with an Article 17 request is then reduced to finding the right entry in the secondary database and removing it. Any subsequent read of the event finds no personal information to merge, which leaves that event essentially anonymized.
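Sketched in code, with an in-memory dictionary standing in for the secondary database. The class name PersonalDataStore, the hydrate function and the event fields are hypothetical.

```python
class PersonalDataStore:
    """Stand-in for a regular database table keyed by subject id."""

    def __init__(self):
        self._rows = {}

    def put(self, subject_id, data):
        self._rows[subject_id] = dict(data)

    def get(self, subject_id):
        return self._rows.get(subject_id)

    def erase(self, subject_id):
        # Handling an Article 17 request: remove the row, leave the events alone.
        self._rows.pop(subject_id, None)


def hydrate(event, store):
    """Merge personal data into a copy of the event at read time."""
    return {**event, "personal": store.get(event["subject_id"])}


store = PersonalDataStore()
store.put("c-1", {"name": "Jane", "email": "jane@example.com"})

# The persisted event only carries a reference to the subject, no personal data.
event = {"type": "OrderPlaced", "subject_id": "c-1", "order_id": "o-9"}

before = hydrate(event, store)
store.erase("c-1")
after = hydrate(event, store)
```

The event itself never changes, so the event store can stay append-only. Domain code does have to cope with the personal part being absent after an erasure.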

Crypto-trashing/shredding

The last technique I want to discuss keeps the personal information inside events, but encrypts that data using a unique key (associated with either the event or an aggregate). The encryption key is stored in and retrieved from a (centralized) key management system. Events are decrypted automatically before they are handled by domain code.

Whenever an Article 17 request is received, the appropriate key is looked up and promptly removed (forgotten). This renders the personal information unreadable and thus effectively removed.
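The flow can be sketched as follows, under some loud assumptions: KeyStore is an in-memory stand-in for a key management system, keys are per subject, and the cipher is a toy (XOR with a SHAKE-256 keystream) chosen only to keep the example dependency-free. A real implementation should use authenticated encryption such as AES-GCM from a vetted library.

```python
import hashlib
import json
import secrets


class KeyStore:
    """In-memory stand-in for a centralized key management system."""

    def __init__(self):
        self._keys = {}

    def get_or_create(self, subject_id):
        if subject_id not in self._keys:
            self._keys[subject_id] = secrets.token_bytes(32)
        return self._keys[subject_id]

    def find(self, subject_id):
        return self._keys.get(subject_id)

    def forget(self, subject_id):
        # Handling an Article 17 request: throw the key away.
        self._keys.pop(subject_id, None)


def _xor_keystream(key, nonce, data):
    # TOY cipher for illustration only, NOT for production use.
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def encrypt_personal(event, personal, keys):
    """Return a copy of the event with the personal data stored encrypted."""
    key = keys.get_or_create(event["subject_id"])
    nonce = secrets.token_bytes(16)
    ciphertext = _xor_keystream(key, nonce, json.dumps(personal).encode("utf-8"))
    return {**event, "nonce": nonce.hex(), "personal_enc": ciphertext.hex()}


def decrypt_personal(stored, keys):
    """Return the personal data, or None when the key has been forgotten."""
    key = keys.find(stored["subject_id"])
    if key is None:
        return None
    data = _xor_keystream(key, bytes.fromhex(stored["nonce"]), bytes.fromhex(stored["personal_enc"]))
    return json.loads(data.decode("utf-8"))


keys = KeyStore()
stored = encrypt_personal(
    {"type": "CustomerRegistered", "subject_id": "c-1"},
    {"email": "jane@example.com"},
    keys,
)
readable = decrypt_personal(stored, keys)   # key present: data can be read
keys.forget("c-1")
shredded = decrypt_personal(stored, keys)   # key gone: data is unreadable
```

The stored event is never modified. Only the key disappears, which is why this approach works even on truly immutable storage.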

Interested in learning more? Join me for my talk “Forget me, please? Event sourcing and the GDPR” at JAX this fall!

Michiel Rook's blog

Michiel Rook
