Event versioning (or why I sometimes modify the event store)

Requirements change and applications evolve, which leads to refactoring. For example, user registration now requires a first and last name, or what was once an error is no longer considered one. In an event-sourced application, such changes pose a few problems, because events written under the old requirements remain in the store. In this post I’ll discuss a few strategies and my views on event versioning.

On immutability and legacy events

Consider legacy events that, based on new requirements, are:

  • No longer relevant;
  • Misnamed;
  • Missing information;
  • Too coarse-grained (high level) or too fine-grained (low level).

Some people argue that event stores in general, and events in particular, are immutable: existing events should never be modified, and any corrections must be made by appending new events. After all, events are a record of history, and history does not change.

This has implications for the kinds of change listed above. Even when new versions of events are introduced, old events and their handlers need to be maintained indefinitely to prevent errors and inconsistent state. Potentially, this also holds for event listeners that depend on the old events.

Needless to say, depending on the rate of change of the application and its events, this can dramatically increase technical debt and complexity.

Archiving events

Consider a domain where aggregates have a limited “lifespan”. For example, a site with classifieds that are automatically removed after a certain period of time. Once these aggregates have become inactive (i.e., after applying a Deleted event), they will not process new commands or apply events.

Following that train of thought, all existing events for those aggregates can be safely moved to a secondary (archival) event store. This satisfies immutability (nothing is removed or modified), and it also ensures that those events no longer take up memory, index space, etc. in the primary store, reducing the working set.

An alternative approach would be to move those events to inexpensive, slow (but replicated) storage. That way, the events will remain available, but with significantly increased latency.
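
As a rough illustration of the archiving idea, the sketch below moves every event of an aggregate that has seen a Deleted event from a primary store to an archival one. Both stores are plain in-memory lists here, and the names (Event, archive_inactive, the "Deleted" type string) are assumptions made for the example, not part of any framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    aggregate_id: str
    type: str
    payload: dict


def archive_inactive(primary: list, archive: list) -> None:
    """Move all events of deleted aggregates from primary to archive."""
    # An aggregate is inactive once a Deleted event was applied to it.
    inactive = {e.aggregate_id for e in primary if e.type == "Deleted"}
    # Events are copied unchanged, so nothing is modified or lost.
    archive.extend(e for e in primary if e.aggregate_id in inactive)
    # Only events of still-active aggregates remain in the working set.
    primary[:] = [e for e in primary if e.aggregate_id not in inactive]


primary = [
    Event("ad-1", "Created", {"title": "Bike"}),
    Event("ad-2", "Created", {"title": "Car"}),
    Event("ad-1", "Deleted", {}),
]
archive = []
archive_inactive(primary, archive)
```

In a real system the two steps (copy, then remove from the primary store) would have to be made safe against failure in between, for instance by only removing events after the archive write has been confirmed.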

Event upcasting

Upcasting satisfies the requirement for an immutable event store. An upcaster is called when reading (old) events from a stream or store, before these events are pushed to event listeners, projections, etc. The original events are not touched, but the upcaster can map our new understanding of the world on those events.

In general, I consider an upcaster to be a function f that takes an event of revision x and turns it into zero or more events of revision x + 1. In the simplest case, Event_V2 = f(Event_V1). This is not something that can be fully automated; we will have to write a little code.
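
To make that definition a bit more tangible, here is a minimal, framework-agnostic sketch. It assumes events are dictionaries with type, revision and payload keys, and that upcasters are registered per (type, revision) pair; all names are hypothetical. Each upcaster returns a list, so it can drop an event (empty list) or split it into several.

```python
def upcast(event, upcasters):
    """Apply upcasters until the event(s) are at their latest revision."""
    f = upcasters.get((event["type"], event["revision"]))
    if f is None:
        return [event]  # already at the latest known revision
    result = []
    for newer in f(event):  # zero or more events of the next revision
        result.extend(upcast(newer, upcasters))
    return result


def read_stream(stored_events, upcasters):
    """Yield upcasted events; the stored events are never modified."""
    for event in stored_events:
        yield from upcast(event, upcasters)


# Hypothetical upcasters: one renames a field, one drops an obsolete event.
def item_added_v1_to_v2(event):
    payload = {"sku": event["payload"]["product_code"]}
    return [{"type": "ItemAdded", "revision": 2, "payload": payload}]


def drop_event(event):
    return []


upcasters = {
    ("ItemAdded", 1): item_added_v1_to_v2,
    ("CartViewed", 1): drop_event,
}

stored = [
    {"type": "ItemAdded", "revision": 1, "payload": {"product_code": "A1"}},
    {"type": "CartViewed", "revision": 1, "payload": {}},
    {"type": "ItemAdded", "revision": 2, "payload": {"sku": "B2"}},
]
events = list(read_stream(stored, upcasters))
```

The sketch relies on every upcaster producing events of a higher revision than its input; an upcaster that returned the same revision would make the recursion loop forever.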

Let’s assume that, during user registration, an event was applied containing the date and time of registration. After some iterations, it’s decided that the time component is no longer of interest, only the date component.
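
A minimal sketch of what such an upcaster could look like is shown here. The class and field names (UserRegisteredV1, UserRegisteredV2, registered_at, registered_on) are assumptions made for illustration and stand in for the UserRegistered_V1 and UserRegistered_V2 revisions of the event.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class UserRegisteredV1:
    user_id: str
    registered_at: datetime  # date and time of registration


@dataclass(frozen=True)
class UserRegisteredV2:
    user_id: str
    registered_on: date  # only the date is kept


def upcast_user_registered(event: UserRegisteredV1) -> UserRegisteredV2:
    """Event_V2 = f(Event_V1): drop the time component."""
    return UserRegisteredV2(event.user_id, event.registered_at.date())


old = UserRegisteredV1("user-42", datetime(2017, 3, 14, 9, 26, 53))
new = upcast_user_registered(old)
print(new)
```

The stored V1 event keeps its full timestamp; only the object handed to listeners has the time stripped.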

In this case, event listeners interested in UserRegistered events only have to handle the latest version of the event (UserRegistered_V2). They will never see UserRegistered_V1. This reduces the amount of code that has to be supported.

Of course, upcasters do have a performance impact, which depends on how many there are and on the type and complexity of the work they do. For example, to construct a new version of an event, an upcaster might need to pull in data from other events or data sources.

Additionally, be aware that existing projections or subscribers are not automatically updated: they only see the upcasted events during a replay.

Framework support

Here’s an overview of how upcasting can be implemented in some of the popular event sourcing frameworks:

  • Axon Framework: has native support for Event Upcasters, which can also (de-)multiplex events (changing the granularity, i.e. splitting or merging). Events that are no longer relevant can be converted to an empty list.
  • Akka Persistence: by writing a custom Event Adapter, upcasting can be implemented in the fromJournal method; (de-)multiplexing is possible.
  • Broadway: no (native) support for upcasting; there is, however, a pull request to add that to the library.
  • Prooph: upcasting can be achieved by using the UpcastingPlugin. (De-)multiplexing is supported. (Updated Nov 2017)
  • Event Store (not really an event sourcing framework, but rather a journal): other than copying all events from a stream to a new stream, updating each event as it is copied, there seems to be no built-in support for event versioning or upcasting. Potentially, you could do some transformation when reading events (in a query). It seems that code would have to be repeated for every query that deals with those events.

Let me know if there are frameworks missing from this list!

Rewriting events

This is where it might become a little controversial 😉

I’m a practical person. My personal view is that there are cases where directly modifying or deleting events is a perfectly valid and acceptable scenario. Such modifications can be done by directly accessing the database (for trivial cases), or programmatically:

  1. Load the events that need to be refactored.
  2. De-serialize event payload.
  3. Modify event payload.
  4. Serialize event payload.
  5. Write modified event back to event store.
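
The steps above can be sketched as follows. This assumes, purely for illustration, an event store that is a single SQLite table named events with JSON payloads; the table layout, the rewrite_events helper and the date_only modification are all hypothetical.

```python
import json
import sqlite3


def rewrite_events(conn, event_type, modify):
    """Rewrite the payload of every stored event of the given type."""
    with conn:  # one transaction: commit on success, roll back on error
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE type = ?", (event_type,)
        ).fetchall()                                 # 1. load
        for event_id, raw in rows:
            payload = json.loads(raw)                # 2. de-serialize
            payload = modify(payload)                # 3. modify
            raw = json.dumps(payload)                # 4. serialize
            conn.execute(
                "UPDATE events SET payload = ? WHERE id = ?", (raw, event_id)
            )                                        # 5. write back
    return len(rows)


def date_only(payload):
    # Keep only the date part of an ISO 8601 timestamp.
    return {**payload, "registered": payload["registered"][:10]}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT, payload TEXT)")
conn.execute(
    "INSERT INTO events (type, payload) VALUES (?, ?)",
    ("UserRegistered", json.dumps({"user": "u1", "registered": "2017-03-14T09:26:53"})),
)
conn.commit()
changed = rewrite_events(conn, "UserRegistered", date_only)
```

Running the whole rewrite in one transaction means a failure halfway through leaves the store as it was, which limits the damage of a broken modify function.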

As with all powerful tools, there are some inherent issues and dangers:

  • Running code that depends on the old structure of the events that are modified. This can be mitigated by performing the refactoring on application startup (but that means downtime), or saving the events as a new collection (and then switching to that collection at some point).
  • Removing or renaming the wrong events.
  • Breaking serialization.
  • As with upcasting, existing projections or read models are not automatically updated.

So use with caution, but do consider it a part of your toolkit.

I’m keen to hear your thoughts!

Michiel Rook

Michiel Rook is a Java/PHP/Scala consultant from the Netherlands. He loves coaching teams to develop better software and implement continuous deployment. He is a co-founder of Make.io and a member of the Dutch Web Alliance. When he’s not thinking about continuous deployment, devops or event sourcing he enjoys music, cars, sports and movies.

6 thoughts on “Event versioning (or why I sometimes modify the event store)”

  • Pingback: Forget me please? Event sourcing and the GDPR - Michiel Rook's blog

  • November 11, 2017 at 11:21 am

    Hi Michiel, thanks for this analysis, this also worried me, what to do when I change the event structure.

    Maybe I still don’t get it, does the Upcaster regenerate all events in event store to new version?

    Imagine simplified event store table (one event chain execution):
    id, eventname, version
    1, ProcessCreated, 1
    2, ProcessFinished, 1

    Now I am going to change ProcessCreated event to version 2. So I will end up with:
    id, eventname, version, date
    1, ProcessCreated, 1,
    2, ProcessFinished, 1
    3, ProcessCreated, 2
    4, ProcessFinished, 1

    Now, I don’t want to clutter the source code with any kind of versioning of classes/methods, something like ProcessFinishedHandler_V2. Instead, I would prefer to recreate the obsolete versions of events:
    1, ProcessCreated, 1,
    2, ProcessFinished, 1
    3, ProcessCreated, 2
    4, ProcessFinished, 1
    5, ProcessCreated, 2
    6, ProcessFinished, 1

    Question: how do I get into this state? Is there some automagical tool? If I understand correctly, once I write something like an upcaster, it is able to “upgrade” the previous event structure to the new one. I would be happy to see a fully automated mechanism that gets me into this state once I deploy the code, but I am not sure how to achieve this in a clean way, so any hints are welcome.

    Ultimately I would like to take out of the store the original stream with obsolete event version into archive:
    1, ProcessCreated, 1,
    2, ProcessFinished, 1

    By leaving the store in upgraded history version:
    3, ProcessCreated, 2
    4, ProcessFinished, 1
    5, ProcessCreated, 2
    6, ProcessFinished, 1

    You mentioned this approach, but it is not clear if and how I can do regeneration of events in fully automated way.

    Regarding the frameworks you mentioned there are 2 I am looking into as well:
    Eventuate
    https://github.com/RBMHTechnology/eventuate

    Light Eventuate 4j
    https://github.com/networknt/light-eventuate-4j

    Thanks for this blog again, useful information.

  • November 11, 2017 at 11:24 am

    Another, more personal question: we distribute application logic and processing with microservices, but when introducing an event log, we again create a single point of truth and failure in the system.

    How do you solve this in your world? Clustered RDBMS backends, clustered Kafka? It’s easy now to scale microservices, but how do you easily scale the Event Store? :-)))

  • Michiel Rook
    November 14, 2017 at 2:17 pm

    Hi Ladislav,

    I wouldn’t share event stores between services. Each service should be isolated and not share (data) storage with another service. To communicate between services you can use APIs (but that creates hard dependencies) or a messaging bus, like Kafka. You’d probably then put fat (comprehensive) events on the bus, rather than the thin, lifecycle events you maintain within your service.

  • Michiel Rook
    November 14, 2017 at 2:24 pm

    When you use upcasters, you will see different versions of events in your event store (after all, events aren’t modified). Your code (event handlers and such) however will only see the latest version, because those events pass through one or more upcasters first.

    To upgrade the entire store, you can simply loop over all the events, apply the upcaster(s) to those events, and write the result to a new event store/stream. I’m planning to write an article about the pros/cons of both approaches.

  • Pingback: Upcasters or a versioned event store: pros and cons - Michiel Rook's blog
