Event sourcing and the GDPR: a follow-up

My article about the implications of the GDPR for event-sourced applications that I published last week generated a sizable number of responses, suggestions and comments (most of them on Twitter). All of which are appreciated of course! In this post I’ll list the most interesting comments and try to respond to them.

1. Removing data from projections might be good enough

There were a few tweets that suggested it would not be a violation of Article 17 (of the GDPR) to only remove customer information from (downstream) projections / read models, and leave the event store intact.

I guess the validity of this technique all depends on how you interpret Article 17, and specifically Article 17 (3) lit d:

for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing;

In this case a request to erase would mean applying an event named CustomerInvokedArticle17 (or something to that effect). The appropriate projections should be able to handle that event and remove personal information as required. The upside is that these projections don’t have to be (manually) rebuilt.
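As a rough illustration, a projection that reacts to such an event could look like the sketch below. Everything except the event name CustomerInvokedArticle17 is made up for the example (the CustomerRegistered event, the CustomerListProjection class and its fields), and the read model is just an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRegistered:  # hypothetical event carrying personal data
    customer_id: str
    name: str
    email: str


@dataclass(frozen=True)
class CustomerInvokedArticle17:  # the erasure request, recorded as an event
    customer_id: str


class CustomerListProjection:
    """Hypothetical read model; the event store itself is left untouched."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, str]] = {}

    def handle(self, event: object) -> None:
        if isinstance(event, CustomerRegistered):
            self.rows[event.customer_id] = {"name": event.name, "email": event.email}
        elif isinstance(event, CustomerInvokedArticle17):
            # Remove the personal data this read model holds for the customer.
            self.rows.pop(event.customer_id, None)


projection = CustomerListProjection()
projection.handle(CustomerRegistered("c-1", "Alice", "alice@example.com"))
projection.handle(CustomerInvokedArticle17("c-1"))
```

Because the erasure is itself an event, replaying the full stream into a fresh projection ends in the same state, without the personal data.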

2. Don’t store PII in events, but somewhere else

One of the more interesting responses came in the form of a Twitter thread arguing that the combination of the GDPR and event sourcing is a non-issue, and that Article 17 requests can be easily solved by storing Personally Identifiable Information (PII) not in the events themselves, but in another database or storage solution (possibly encrypted, see below).

Whenever an event is read, the associated PII is then retrieved and joined to the event by the system, before handing the event to a listener.

While I don’t agree with the assertion that this is a non-issue, the idea on its own seems to have some merit. I do wonder about the practical implementation though. Joining PII to events as they are loaded could have performance implications. Plus, depending on the domain, individual events may contain significant amounts of PII. I’ve worked on projects where moving PII out of the events would have resulted in very sparse, even skeletal, events.
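To make the join-on-read idea a bit more concrete, here is a minimal sketch. All names (StoredEvent, PiiStore, load_events) are hypothetical, PII is keyed per customer for simplicity, and both stores are plain in-memory structures rather than real databases.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class StoredEvent:
    event_type: str
    customer_id: str
    payload: dict = field(default_factory=dict)  # non-personal data only


class PiiStore:
    """Hypothetical mutable side store holding personal data per customer."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def put(self, customer_id: str, pii: dict) -> None:
        self._records[customer_id] = dict(pii)

    def get(self, customer_id: str) -> Optional[dict]:
        return self._records.get(customer_id)

    def erase(self, customer_id: str) -> None:
        # Article 17 request: only the side store changes, events stay as-is.
        self._records.pop(customer_id, None)


def load_events(
    events: Iterable[StoredEvent], pii_store: PiiStore
) -> Iterator[Tuple[StoredEvent, Optional[dict]]]:
    """Join PII onto each event before handing it to a listener."""
    for event in events:
        yield event, pii_store.get(event.customer_id)
```

Listeners then have to cope with the PII being absent (None here) for customers who have been erased, and every load costs one extra lookup per event, which is where the performance concern comes from.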

3. Event stream per customer

Some event sourcing frameworks (such as Prooph) support creating an event stream per aggregate (root). When an Article 17 request is received, getting rid of the events associated with the requestor is as simple as identifying the involved aggregate identifiers and removing the associated streams. Conceptually, this is no different from just removing all the events for those aggregates though.
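A minimal in-memory sketch of the stream-per-aggregate idea follows. It does not reflect Prooph's actual API; the class and function names are invented for the illustration.

```python
from collections import defaultdict
from typing import Iterable


class StreamPerAggregateStore:
    """Hypothetical event store that keeps one stream per aggregate id."""

    def __init__(self) -> None:
        self._streams: defaultdict[str, list] = defaultdict(list)

    def append(self, aggregate_id: str, event: dict) -> None:
        self._streams[aggregate_id].append(event)

    def read(self, aggregate_id: str) -> list:
        # .get() avoids creating an empty stream as a side effect of reading.
        return list(self._streams.get(aggregate_id, []))

    def delete_stream(self, aggregate_id: str) -> None:
        self._streams.pop(aggregate_id, None)


def handle_article_17(store: StreamPerAggregateStore, aggregate_ids: Iterable[str]) -> None:
    """Remove every stream that belongs to the requestor."""
    for aggregate_id in aggregate_ids:
        store.delete_stream(aggregate_id)
```

The hard part in practice is the step this sketch takes for granted: finding all aggregate identifiers that belong to the requestor.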

4. Using the event store as evidence

Not directly linked to Article 17 specifically, or the GDPR in general, but an interesting suggestion nonetheless. A short search returned no (immediate) results, but it got me thinking.

In general, an event store can be used as (the basis of) an audit log, but mostly for internal bookkeeping. I’m no legal expert, but for it to be usable in court, I suspect we’d need ways of proving that events were not tampered with. This probably means enforcing immutability in some way (using a WORM drive, perhaps?), signing each event or something similar.
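One way to approach the "signing each event" idea is to chain events together, so that each seal covers both the event and the previous seal. The sketch below uses an HMAC from the Python standard library. The function names and the log layout are assumptions made for the example, and whether this would satisfy a court is a separate question entirely.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous seal" for the first event


def seal(event: dict, previous_seal: str, secret: bytes) -> str:
    """HMAC over the previous seal plus the canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    message = previous_seal.encode("ascii") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def append(log: list, event: dict, secret: bytes) -> None:
    previous = log[-1]["seal"] if log else GENESIS
    log.append({"event": event, "seal": seal(event, previous, secret)})


def verify(log: list, secret: bytes) -> bool:
    """Recompute every seal; any edited, removed or reordered entry breaks the chain."""
    previous = GENESIS
    for entry in log:
        expected = seal(entry["event"], previous, secret)
        if not hmac.compare_digest(expected, entry["seal"]):
            return False
        previous = entry["seal"]
    return True
```

An HMAC only convinces parties who hold the shared secret. Proving integrity to a third party would call for asymmetric signatures or an external timestamping service instead.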

5. Crypto trashing / crypto shedding / crypto locking

Three names for the same technique: encrypting PII using a unique key and forgetting that key whenever an Article 17 request is received, thereby rendering the PII unreadable (and effectively removed). This approach is a useful alternative if immutability is a concern (or an architectural choice, such as when using Kafka or a similar system).

It looks like AxonIQ is starting to offer a module around this very concept. I haven’t been able to play around with it yet, though.

An important aspect of this idea is key management: storing, finding and retrieving the right encryption key associated with a particular piece of information. This becomes especially interesting at scale, with a large number of consumer-like aggregates.
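The mechanics can be sketched in a few lines. Two loud caveats: the KeyStore class and the helper functions are hypothetical, and the cipher below (an HMAC-SHA256 keystream XORed with the data) is only a standard-library stand-in so the example runs without dependencies. A real system would use a vetted authenticated cipher such as AES-GCM and a proper key management service.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def _xor_with_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher for illustration only, NOT a vetted construction.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyStore:
    """Hypothetical store holding one encryption key per customer."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def get_or_create(self, customer_id: str) -> bytes:
        return self._keys.setdefault(customer_id, secrets.token_bytes(32))

    def get(self, customer_id: str) -> Optional[bytes]:
        return self._keys.get(customer_id)

    def forget(self, customer_id: str) -> None:
        # Article 17 request: drop the key, leave the (immutable) events alone.
        self._keys.pop(customer_id, None)


def encrypt_pii(keys: KeyStore, customer_id: str, plaintext: str) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per encrypted value
    data = plaintext.encode("utf-8")
    return nonce + _xor_with_keystream(keys.get_or_create(customer_id), nonce, data)


def decrypt_pii(keys: KeyStore, customer_id: str, blob: bytes) -> Optional[str]:
    key = keys.get(customer_id)
    if key is None:
        return None  # key was forgotten, so the PII is unreadable
    nonce, ciphertext = blob[:16], blob[16:]
    return _xor_with_keystream(key, nonce, ciphertext).decode("utf-8")
```

The encrypted blob is what ends up inside the event. Note that the scheme is only as good as the key deletion: backups or replicas of the key store have to forget the key as well.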

Your suggestion here?

I’m sure there are some additional ideas and solid arguments. The combination of the GDPR and event sourcing makes for an interesting problem space. Let me know!

10 thoughts on “Event sourcing and the GDPR: a follow-up”

Pingback: Forget me please? Event sourcing and the GDPR - Michiel Rook's blog
Lenny G
December 29, 2017 at 5:23 pm

The article is great and offers real-life techniques for dealing with the problem. But the statement “thereby rendering the PII unreadable (and effectively removed)” is very controversial. Each key must expire within a specific period of time (1-3 years) based on encryption type and key length. The reason we must adhere to such a policy is that keys become vulnerable over time. Therefore, ‘forgetting’ the key effectively removes our ability to properly read customer data, but leaves data even more vulnerable over time, preventing us from rotating the keys. The main goal of the GDPR is to provide mechanisms preventing data from being potentially exposed to hacker attacks. Forgetting the key, but leaving data in data stores without proper key management, increases exposure instead.

IMHO, the last option is not the best approach to solve the problem.
Michiel Rook (post author)
January 2, 2018 at 9:37 am

Hi Lenny,

Interesting arguments there. I must admit, I’m no encryption expert. I’m sure the 1-3 year period has some validity (though I haven’t heard about it before). I wasn’t in any way suggesting this is the best approach, it’s just one of many.
Lenny G
January 3, 2018 at 6:45 pm

First of all, thank you again for a great article. I think the most important part is that it provides real-life options for a problem that many companies suddenly have to deal with, often without great recipes. I agree with you 100%, having one imperfect option isn’t a big deal.

As for the security-related part, this is a great source for this info: http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-57p1r3.pdf. Page 56 has “Suggested cryptoperiods for key types”. It is published by NIST.
Michiel Rook (post author)
January 4, 2018 at 10:44 am

Thanks for the kind words, and your useful comments & information!
Pingback: Kafka, GDPR and Match Sourcing | A1A
Paweł Pacana
October 17, 2018 at 10:52 am

> Therefore, ‘forgetting’ the key effectively removes our ability to properly read customer data, but leaves data even more vulnerable overtime preventing us from rotating the keys.

If we already “forgot” the data and we’re not interested in reading it ever again, rotating keys every cryptoperiod (that is, re-encrypting the data whose key was lost with a new key) would prevent this vulnerability, right?
Pingback: Forget me, please? Event sourcing and the GDPR (JAX London 2018) - Michiel Rook's blog
David
November 20, 2018 at 1:50 am

It occurred to me that you could go ahead and actually rewrite the previous history, removing any trace of personal information, by defaulting all the fields in all the related events to a value that represents article 17. Then you could publish an additional event with an “undelete key” as a value and present that value one time to the user. Tell the user to save it and if they ever wish to return to the site they can present the key and all required information, and a re-register event can be triggered. This would allow the user to reclaim any activity on the site if they decide to return.
Sia
November 29, 2018 at 7:05 pm

Hi Michiel,

I am working on a similar solution for this issue and I’m going to present my solution to this at a meetup next week.
I’d like to invite you and everyone who can make it to this event.

I’d love to hear other opinions on my solution. So if you can make it to Hamburg, Germany, I’d appreciate it.

Here is the link to the meetup:
http://meetu.ps/e/G3hzh/tm7BK/d

Cheers,
Sia