I am working on a web application, which historically was built on a PHP/MySQL stack.
One of the key operations of the application had to do some heavy calculations, which required iterating over every row of an entire DB table. Needless to say, this was a serious bottleneck, so a decision was made to rewrite the whole process in Java.
This gave us two benefits. One was that Java ran these calculations much faster than PHP did. The second was that we could keep the entire data set in the Java application server's memory. So now we can do the calculation-heavy operations in memory, and everything happens much faster.
This worked for a while, until we realized we needed to scale, which means we now need more web servers.
The problem is that, by the current design, they all must maintain the exact same state. They all query the DB, process the data, and keep it in memory. But what happens when you need to change this data? How do all the servers stay consistent?
This architecture seems flawed to me. The performance benefit from holding all the data in memory is obvious, but this seriously hampers scalability.
What are the options from here? Switch to an in-memory key-value data store? Should we give up holding state inside the web servers entirely?
Now switch to Erlang 🙂
Yeah, that’s a joke, but there’s a grain of truth. The issue is this: you originally had your state in an external, shared repository: the DB. Now you have it (partially) precalculated in an internal, non-shared repository: Java objects in RAM. The obvious fix is to keep it precalculated, but in an external, shared repository; the faster the better.
One easy answer is memcached.
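To make the shared-repository idea concrete, here is a minimal sketch of web servers that keep no precalculated state of their own and instead read and write one shared cache. Everything named here is an assumption for illustration: `SharedCache` is a hypothetical interface, not the API of any real memcached client, `InMemoryCache` is an in-process stand-in so the example runs without a memcached server, and a summed `total` stands in for the heavy calculation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedCacheSketch {

    /** Hypothetical minimal cache API; a real memcached client would sit behind it. */
    interface SharedCache {
        Long get(String key);
        void set(String key, long value);
        void delete(String key);
    }

    /** In-process stand-in so the sketch runs without a memcached server. */
    static class InMemoryCache implements SharedCache {
        private final Map<String, Long> store = new ConcurrentHashMap<>();
        public Long get(String key) { return store.get(key); }
        public void set(String key, long value) { store.put(key, value); }
        public void delete(String key) { store.remove(key); }
    }

    /** One web server; it holds no precalculated state of its own. */
    static class WebServer {
        private final SharedCache cache;
        private final Map<Integer, Long> table; // stands in for the DB table

        WebServer(SharedCache cache, Map<Integer, Long> table) {
            this.cache = cache;
            this.table = table;
        }

        /** Returns the cached result, or recomputes and stores it on a miss. */
        long total() {
            Long cached = cache.get("total");
            if (cached != null) {
                return cached;
            }
            long sum = 0;
            for (long value : table.values()) { // the "heavy" full-table pass
                sum += value;
            }
            cache.set("total", sum);
            return sum;
        }

        /** Changes a row and invalidates the shared result for every server. */
        void updateRow(int id, long value) {
            table.put(id, value);
            cache.delete("total");
        }
    }

    public static void main(String[] args) {
        SharedCache cache = new InMemoryCache();
        Map<Integer, Long> table = new ConcurrentHashMap<>();
        table.put(1, 10L);
        table.put(2, 20L);

        WebServer a = new WebServer(cache, table);
        WebServer b = new WebServer(cache, table);

        System.out.println(a.total()); // 30, computed by a and cached
        System.out.println(b.total()); // 30, read from the shared cache
        b.updateRow(2, 50L);           // b changes data and invalidates
        System.out.println(a.total()); // 60, a sees the change
    }
}
```

The point of the sketch is only that an update made through one server is visible to the others, because the precalculated value lives in one place. A real setup would also have to think about two servers recomputing at the same moment.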
Another is to build your own ‘calc server’, which centralizes both the calculation task and the (partial) results. The web frontend processes just access this server. In Erlang it would be the natural way to do it. In other languages, you still can do it; it just takes more work. Check ZeroMQ for inspiration, even if you don’t use it in the end (but it’s a damn good implementation).
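A rough idea of what such a calc server could look like in Java, as a sketch and not a finished design: a single worker thread owns the data and the running result, a bit like an Erlang process owns its state, and callers only send it requests. The names `CalcServer`, `putRow` and `total` are made up for this example, the sum again stands in for the real calculation, and the network layer is left out entirely; in a real deployment the frontends would reach this over a socket (ZeroMQ or anything else) instead of calling it in-process.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CalcServerSketch {

    /** Owns the data and the partial result; all access goes through one thread. */
    static class CalcServer implements AutoCloseable {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final Map<Integer, Long> rows = new HashMap<>();
        private long runningTotal = 0; // partial result, updated incrementally

        /** Request: insert or change a row and adjust the running total. */
        CompletableFuture<Void> putRow(int id, long value) {
            return CompletableFuture.runAsync(() -> {
                Long old = rows.put(id, value);
                runningTotal += value - (old == null ? 0L : old);
            }, worker);
        }

        /** Request: read the current result without rescanning every row. */
        CompletableFuture<Long> total() {
            return CompletableFuture.supplyAsync(() -> runningTotal, worker);
        }

        @Override
        public void close() {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        try (CalcServer server = new CalcServer()) {
            // Each call below plays the role of a request from a web frontend.
            server.putRow(1, 10L).join();
            server.putRow(2, 20L).join();
            System.out.println(server.total().join()); // 30
            server.putRow(2, 50L).join();
            System.out.println(server.total().join()); // 60
        }
    }
}
```

Because only the worker thread ever touches `rows` and `runningTotal`, there is no locking and no second copy of the state to keep consistent. The trade-off is that this one server becomes the thing you have to keep fast and available.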