We are currently evaluating CQRS and Event Sourcing architectures. I am trying to understand what the maintenance implications of using this kind of design are. There are two questions I am struggling to find answers to:
1)
What happens if, after an application has been up and running for a while, there is a new requirement to add a field to a ViewModel in the ReadModel database? Say the Customer Zip Code is now required on the CustomerList ViewModel, where it was not previously. The extra column can easily be added to the ViewModel database, but how does it get populated? As far as I can see, the only way is to clear the read database and replay all the events from scratch to rebuild the ReadModel database. But what if the application has been up and running for months or years (as we hope it will)? That could be millions of events to replay just to fill in a zip code column.
I have the same concern if, for whatever technical reason, the ReadModel database gets out of sync, or if we want to add a new ReadModel database. It seems that the older the application is, and the more it is used, the harder and more expensive it becomes to get an up-to-date ReadModel back. Or am I missing a trick somewhere? Something like ReadModel snapshots?
2)
What happens if, after all those millions of events have been replayed to rebuild the read database, some of the data doesn't line up with what was expected (i.e. it looks wrong)? Perhaps a bug somewhere in the event storing or denormalizing routines caused this (and if there is one thing you can rely on in coding, it is bugs). How would you go about debugging this? It seems like an impossible task. Or maybe, again, I am missing a trick.
I would be interested to hear from anyone who has been running a system like this for a while how the maintenance and upgrade paths have worked out for you.
Thanks for any time and input.
The beauty of using event sourcing with CQRS is the ability to destroy the read model and rebuild it from scratch, as has been mentioned. For some reason, people have the idea that this is going to take a long time once you get above some arbitrary number of events. If you are using a relational database for your read models (and you most likely are), it's easy to open a transaction, run all of the events through the handlers, and then commit the transaction. It's only when the transaction commits that we actually touch the disk. Everything else is performed in memory, so it can be lightning fast. In fact, I wouldn't be surprised to see your system crank through a few million events in just a few minutes, if that.
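A minimal sketch of the rebuild-in-one-transaction idea, using Python's built-in `sqlite3` module as a stand-in for "a relational database". The `Event` type, the `customer_list` table and the `CustomerCreated` / `CustomerAddressChanged` event names are hypothetical, chosen to match the zip code example from the question. The connection is assumed to be opened with `isolation_level=None` so that the explicit `BEGIN` / `COMMIT` below are the only transaction control. How much work the engine actually defers until commit depends on the database and its cache settings, so treat the speed claim above as the answerer's experience rather than something this sketch demonstrates.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: a type name plus a payload.
    type: str
    data: dict


def handle(conn: sqlite3.Connection, event: Event) -> None:
    """Denormalize one event into the hypothetical customer_list table."""
    if event.type == "CustomerCreated":
        conn.execute(
            "INSERT INTO customer_list (id, name, zip_code) VALUES (?, ?, ?)",
            (event.data["id"], event.data["name"], event.data.get("zip_code")),
        )
    elif event.type == "CustomerAddressChanged":
        conn.execute(
            "UPDATE customer_list SET zip_code = ? WHERE id = ?",
            (event.data["zip_code"], event.data["id"]),
        )


def rebuild(conn: sqlite3.Connection, events: Iterable[Event]) -> None:
    """Drop and rebuild customer_list inside a single explicit transaction.

    Expects a connection opened with isolation_level=None, so the sqlite3
    module does not issue its own BEGIN/COMMIT statements.
    """
    conn.execute("BEGIN")
    try:
        # SQLite DDL is transactional, so the DROP is undone on rollback.
        conn.execute("DROP TABLE IF EXISTS customer_list")
        # Adding a column (e.g. zip_code) is just a schema change here
        # plus a handler change; the replay below populates it.
        conn.execute(
            "CREATE TABLE customer_list "
            "(id TEXT PRIMARY KEY, name TEXT, zip_code TEXT)"
        )
        for event in events:
            handle(conn, event)
    except BaseException:
        conn.execute("ROLLBACK")  # the previous read model is left intact
        raise
    conn.execute("COMMIT")
```

Because the drop, the schema change and the replay share one transaction, a failure halfway through leaves the old read model in place rather than a half-built one.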
Rebuilding your read models from scratch should produce exactly the same result as your everyday method of denormalizing the events into the read models. If it doesn't, you've got a bug in your read-model denormalization code. The great thing here is that, from your message handler's perspective, there's no difference between an event being received and denormalized into the read model during regular production operation and during a read-model rebuild.
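A small sketch of "one handler, two delivery paths", under these assumptions: `CustomerListProjection`, `InProcessBus` and the dict-shaped events are all hypothetical stand-ins (a real system would use its own bus and a database-backed read model). The live projection receives events one at a time through the bus, while `rebuild` feeds a fresh projection from the stored events; both go through the same `apply` method, so comparing their results is a cheap consistency check.

```python
from typing import Callable

Handler = Callable[[dict], None]


class CustomerListProjection:
    """Hypothetical read model held in a dict; apply() is its only handler."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def apply(self, event: dict) -> None:
        if event["type"] == "CustomerCreated":
            self.rows[event["id"]] = {"name": event["name"], "zip_code": None}
        elif event["type"] == "CustomerAddressChanged":
            self.rows[event["id"]]["zip_code"] = event["zip_code"]


class InProcessBus:
    """Hypothetical stand-in for whatever delivers events in production."""

    def __init__(self) -> None:
        self.store: list[dict] = []  # append-only event log
        self.subscribers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self.subscribers.append(handler)

    def publish(self, event: dict) -> None:
        self.store.append(event)  # persist first, then dispatch
        for handler in self.subscribers:
            handler(event)


def rebuild(store: list[dict]) -> CustomerListProjection:
    """Rebuild path: a fresh projection fed by the very same apply()."""
    projection = CustomerListProjection()
    for event in store:
        projection.apply(event)
    return projection
```

Usage: subscribe `live.apply` to the bus, publish events as usual, and at any point `rebuild(bus.store).rows == live.rows` should hold. If it does not, the difference points at a bug in the denormalization code (or in how events reach it), which is exactly the situation described above.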
If you do encounter bugs, you can easily debug them by streaming or copying the production events to your local workstation, setting breakpoints in your handlers, and then running those events through your read-model handling code.
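One way the local replay could look, as a sketch only: it assumes the production events have been exported to a JSON Lines file (one JSON event per line), and `read_events`, `replay` and the `stop_when` predicate are hypothetical helpers, not part of any particular framework. `replay` pushes each event through whatever handler you pass in and, when the predicate matches, calls Python's built-in `breakpoint()` just before that event is handled, which by default drops you into `pdb`.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional


def read_events(path: Path) -> Iterator[dict]:
    """Yield events from a (hypothetical) JSON Lines export, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def replay(
    events: Iterable[dict],
    handle: Callable[[dict], None],
    stop_when: Optional[Callable[[int, dict], bool]] = None,
) -> int:
    """Feed events through the read-model handler; return how many were handled.

    If stop_when(position, event) is true, break into the debugger right
    before that event is handled, so you can step into the handler.
    """
    count = 0
    for position, event in enumerate(events):
        if stop_when is not None and stop_when(position, event):
            breakpoint()  # default hook is pdb; PYTHONBREAKPOINT=0 disables it
        handle(event)
        count += 1
    return count
```

For example, `replay(read_events(Path("events.jsonl")), projection.apply, stop_when=lambda i, e: e.get("id") == "c42")` would stop on every event for a suspect customer; the file name and id are placeholders. Narrowing the predicate this way avoids stepping through millions of irrelevant events by hand.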