My article about the implications of the GDPR for event-sourced applications that I published last week generated a sizable number of responses, suggestions and comments (most of them on Twitter). All of which are appreciated of course! In this post I’ll list the most interesting comments and try to respond to them.
1. Removing data from projections might be good enough
There were a few tweets that suggested it would not be a violation of Article 17 (of the GDPR) to only remove customer information from (downstream) projections / read models, and leave the event store intact.
I guess the validity of this technique all depends on how you interpret Article 17, and specifically Article 17 (3) lit d:
for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing;
In this case a request to erase would mean applying an event named CustomerInvokedArticle17 (or something to that effect). The appropriate projections should be able to handle that event and remove personal information as required. The upside is that these projections don’t have to be (manually) rebuilt.
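As a rough illustration, a projection that reacts to such an event could look like the Python sketch below. Only the event name CustomerInvokedArticle17 comes from the text above; the Event class, the CustomerRegistered event and the CustomerDirectoryProjection read model are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event shape, not tied to any framework.
    name: str
    customer_id: str
    payload: dict = field(default_factory=dict)


class CustomerDirectoryProjection:
    """Hypothetical read model mapping customer ids to contact details."""

    def __init__(self):
        self.rows = {}

    def apply(self, event):
        if event.name == "CustomerRegistered":
            self.rows[event.customer_id] = dict(event.payload)
        elif event.name == "CustomerInvokedArticle17":
            # Drop the personal data from the read model only;
            # the event store itself is left untouched.
            self.rows.pop(event.customer_id, None)


event_store = [
    Event("CustomerRegistered", "c-1", {"name": "Alice", "email": "alice@example.com"}),
    Event("CustomerRegistered", "c-2", {"name": "Bob", "email": "bob@example.com"}),
    Event("CustomerInvokedArticle17", "c-1"),
]

projection = CustomerDirectoryProjection()
for event in event_store:
    projection.apply(event)
```

After the replay, the projection no longer holds a row for c-1, while all three events are still in the store. Whether that is enough to satisfy Article 17 is exactly the interpretation question raised above.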
2. Don’t store PII in events, but somewhere else
One of the more interesting responses came in the form of a Twitter thread arguing that the combination of the GDPR and event sourcing is a non-issue: Article 17 requests can easily be handled by storing Personally Identifiable Information (PII) not in the events themselves, but in another database or storage solution (possibly encrypted, see below).
Whenever an event is read, the associated PII is then retrieved and joined to the event by the system, before handing the event to a listener.
While I don’t agree with the assertion that this is a non-issue, the idea on its own seems to have some merit. I do wonder about the practical implementation though. Joining PII to events as they are loaded could have performance implications, since every read needs an extra lookup. Plus, depending on the domain, individual events may contain significant amounts of PII. I’ve worked on projects where moving PII out of the events would have resulted in very sparse, even skeletal, events.
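To make the idea concrete, here is a minimal Python sketch of joining PII to events at read time. Everything in it is an assumption made for illustration: the PiiStore class, the pii_ref field and the hydrate function are hypothetical, and a real system would use an actual database rather than an in-memory dictionary.

```python
class PiiStore:
    """Hypothetical side store holding PII keyed by an opaque reference."""

    def __init__(self):
        self._records = {}

    def put(self, ref, data):
        self._records[ref] = dict(data)

    def get(self, ref):
        return self._records.get(ref)

    def erase(self, ref):
        self._records.pop(ref, None)


def hydrate(event, pii_store):
    """Return a copy of the event with its PII joined in, if still present."""
    hydrated = dict(event)
    # One lookup per event: this is where the read-time cost shows up.
    hydrated["pii"] = pii_store.get(event["pii_ref"])  # None once erased
    return hydrated


pii_store = PiiStore()
pii_store.put("pii-42", {"name": "Alice", "email": "alice@example.com"})

# The stored events only carry a reference, never the PII itself.
events = [
    {"name": "CustomerRegistered", "pii_ref": "pii-42"},
    {"name": "OrderPlaced", "pii_ref": "pii-42", "order_id": "o-1"},
]

before = [hydrate(e, pii_store) for e in events]
pii_store.erase("pii-42")  # Article 17 request
after = [hydrate(e, pii_store) for e in events]
```

Listeners have to cope with the PII being absent after an erasure. The sketch also shows the sparseness concern: once the PII is moved out, the CustomerRegistered event is little more than a name and a reference.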
3. Event stream per customer
Some event sourcing frameworks (such as Prooph) support creating an event stream per aggregate (root). When an Article 17 request is received, getting rid of the events associated with the requestor is as simple as identifying the involved aggregate identifiers and removing the associated streams. Conceptually, this is no different from just removing all the events for those aggregates though.
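The sketch below shows the general shape of this in Python with an in-memory store. It is not Prooph's API (Prooph is a PHP library); the StreamPerAggregateStore class and its method names are hypothetical and only illustrate the one-stream-per-aggregate layout.

```python
from collections import defaultdict


class StreamPerAggregateStore:
    """Hypothetical in-memory event store with one stream per aggregate id."""

    def __init__(self):
        self._streams = defaultdict(list)

    def append(self, aggregate_id, event):
        self._streams[aggregate_id].append(event)

    def load(self, aggregate_id):
        # .get() avoids creating an empty stream as a side effect.
        return list(self._streams.get(aggregate_id, []))

    def delete_stream(self, aggregate_id):
        """Remove the whole stream; returns True if there was one."""
        return self._streams.pop(aggregate_id, None) is not None


store = StreamPerAggregateStore()
store.append("customer-1", {"name": "CustomerRegistered"})
store.append("customer-1", {"name": "AddressChanged"})
store.append("customer-2", {"name": "CustomerRegistered"})

# Assumed outcome of the lookup step: the aggregates owned by the requestor.
for aggregate_id in ["customer-1"]:
    store.delete_stream(aggregate_id)
```

The hard part in practice is the lookup step, which is reduced to a hard-coded list here: PII about one person may also sit in streams of other aggregates.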
4. Using the event store as evidence
Not directly linked to Article 17 specifically, or the GDPR in general, but an interesting suggestion nonetheless. A short search returned no (immediate) results, but it got me thinking.
In general, an event store can be used as (the basis of) an audit log, but mostly for internal bookkeeping. I’m no legal expert, but for it to be usable in court, I suspect we’d need ways of proving that events were not tampered with. This probably means enforcing immutability in some way (using a WORM drive, perhaps?), signing each event, or something similar.
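One way to sketch the "signing each event" option is an HMAC chain, where every signature also covers the previous one. The Python example below uses only the standard library. The function names, the log layout and the hard-coded key are assumptions for illustration, and nothing here says anything about what a court would actually accept.

```python
import hashlib
import hmac
import json

# Assumption: in a real system this key lives outside the event store.
SECRET_KEY = b"demo-key-not-for-production"


def sign(event, previous_signature):
    """HMAC over the previous signature plus the event body (a simple chain)."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    message = previous_signature.encode("ascii") + body
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def append(log, event):
    previous = log[-1]["signature"] if log else ""
    log.append({"event": event, "signature": sign(event, previous)})


def verify(log):
    """Recompute every signature in order; False on the first mismatch."""
    previous = ""
    for entry in log:
        expected = sign(entry["event"], previous)
        if not hmac.compare_digest(expected, entry["signature"]):
            return False
        previous = entry["signature"]
    return True


log = []
append(log, {"name": "CustomerRegistered", "customer_id": "c-1"})
append(log, {"name": "OrderPlaced", "order_id": "o-1"})

intact = verify(log)
log[0]["event"]["customer_id"] = "c-999"  # simulate tampering
tampering_detected = not verify(log)
```

Note the tension with erasure: a chain like this is designed to make changes to stored events detectable, which is the opposite of what the approaches that modify or delete events need.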
5. Crypto trashing / crypto shedding / crypto locking
Three names for the same technique: encrypting PII using a unique key and forgetting that key whenever an Article 17 request is received, thereby rendering the PII unreadable (and effectively removed). This approach is a useful alternative if immutability is a concern (or an architectural choice, such as when using Kafka or a similar system).
It looks like AxonIQ is starting to offer a module around this very concept. I haven’t been able to play around with it yet, though.
An important aspect of this idea is key management: storing, finding and retrieving the right encryption key associated with a particular piece of information. This becomes especially interesting at scale, with a large number of consumer-like aggregates.
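A minimal Python sketch of the technique, including a very naive key store, could look as follows. All names (KeyStore, encrypt_pii, decrypt_pii) are hypothetical. The cipher is a toy keystream built on SHAKE-256 so that the example runs on the standard library alone; it is an assumption of this sketch and not something to use in production, where a vetted authenticated cipher such as AES-GCM would take its place.

```python
import hashlib
import secrets


def _keystream_xor(key, nonce, data):
    # Toy stream cipher for illustration only: XOR with a SHAKE-256 keystream.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyStore:
    """Hypothetical key store: one random key per data subject."""

    def __init__(self):
        self._keys = {}

    def get_or_create(self, subject_id):
        if subject_id not in self._keys:
            self._keys[subject_id] = secrets.token_bytes(32)
        return self._keys[subject_id]

    def get(self, subject_id):
        return self._keys.get(subject_id)

    def forget(self, subject_id):
        self._keys.pop(subject_id, None)


def encrypt_pii(key_store, subject_id, plaintext):
    key = key_store.get_or_create(subject_id)
    nonce = secrets.token_bytes(16)
    return nonce + _keystream_xor(key, nonce, plaintext.encode("utf-8"))


def decrypt_pii(key_store, subject_id, blob):
    key = key_store.get(subject_id)
    if key is None:
        return None  # key was forgotten, so the ciphertext stays unreadable
    nonce, ciphertext = blob[:16], blob[16:]
    return _keystream_xor(key, nonce, ciphertext).decode("utf-8")


keys = KeyStore()
event = {
    "name": "CustomerRegistered",
    "customer_id": "c-1",
    "email": encrypt_pii(keys, "c-1", "alice@example.com"),
}

readable = decrypt_pii(keys, "c-1", event["email"])
keys.forget("c-1")  # Article 17 request: the stored event is never modified
after_erasure = decrypt_pii(keys, "c-1", event["email"])
```

The event stays byte-for-byte the same before and after the request, which is what makes this attractive for immutable logs. The in-memory dictionary hides the key management problem: the key store has to be durable and fast to query, and forgetting a key only counts if no copies of it survive elsewhere, for example in backups.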
Your suggestion here?
I’m sure there are some additional ideas and solid arguments. The combination of the GDPR and event sourcing makes for an interesting problem space. Let me know!