My question is about aggregated data that must be quickly accessible across several servers on Amazon EC2. In an ASP.NET application, I would normally store that data in an Application["somevar"] variable so that it can be read quickly (from memory) by all users.
The problem starts when I want that aggregated data to hold the same value on all servers. If I deploy two servers, a user's requests may reach a different server each time (the servers sit behind a load balancer or Elastic Beanstalk). If, for example, I count the number of times the user requested a page, each server's Application variable will hold a different value.
For example:
Server 1:
Application["counter1"] = 120
Server 2:
Application["counter1"] = 130
What I want is a variable that is the same on all servers. The reason I want the data in an Application-like variable is that I want it in memory for fast access; I might then write that data to the database.
What I want to know is how I can achieve this. I thought about using Amazon ElastiCache, so that even with 10 servers under the load balancer I can access the ElastiCache variable via the API. It would not matter which server accesses the memcached variable, because every server would get/set the same variable, and so I would have a cross-server global variable.
I would like to know whether this is good practice and whether there is a better way to implement such a feature.
I am developing my application in ASP.NET C# with MySQL. Also take into consideration that some of the aggregated data should be written to the database. I batch it to avoid a lot of writes at the same time: the data is written to the database only after it has accumulated, for example, 20 writes.
Just to clear up a few things. First, let's make sure we understand how to use ElastiCache. The ElastiCache API doesn't give us any CRUD operations on the cache cluster; the API from Amazon is strictly for managing the servers and configuration. You will need to use a memcached library for .NET to connect to the cluster. Using a cache like memcached is a good solution for your first problem. It will easily and quickly store simple application variables in a distributed environment. Using a cache is generally a good practice even with smaller applications.
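To illustrate the shared-counter idea, here is a minimal sketch. It deliberately does not use any real memcached client: ISharedCache is a hypothetical interface you would implement on top of whichever .NET memcached library you choose, and InMemorySharedCache is only a stand-in so the example runs without a cluster. WebServer and the key "counter1" are likewise illustrative names.

```csharp
using System.Collections.Concurrent;

// Hypothetical abstraction over a memcached client library (an assumption,
// not the API of any specific library).
public interface ISharedCache
{
    // Atomically adds delta to the counter at key and returns the new value.
    long Increment(string key, long delta);

    // Returns the current value, or 0 if the key has never been set.
    long Get(string key);
}

// In-memory stand-in for the cache cluster, used only to make the sketch runnable.
public sealed class InMemorySharedCache : ISharedCache
{
    private readonly ConcurrentDictionary<string, long> _items =
        new ConcurrentDictionary<string, long>();

    public long Increment(string key, long delta)
    {
        // AddOrUpdate applies the change atomically for the given key.
        return _items.AddOrUpdate(key, delta, (_, current) => current + delta);
    }

    public long Get(string key)
    {
        long value;
        return _items.TryGetValue(key, out value) ? value : 0;
    }
}

// Represents one web server behind the load balancer.
public sealed class WebServer
{
    private readonly ISharedCache _cache;

    public WebServer(ISharedCache cache)
    {
        _cache = cache;
    }

    // Counts a page request in the shared cache instead of Application["counter1"].
    public long HandlePageRequest()
    {
        return _cache.Increment("counter1", 1);
    }
}
```

If two WebServer instances are given the same cache, a request handled by either one moves the same counter, which is the behaviour the per-server Application variable cannot give you. With a real cluster you would rely on the cache's own atomic increment rather than a get followed by a set, since the latter can lose updates under concurrent requests.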
I'm not sure how many users you have or how many you expect to have, but one thing I've learned in my years of programming is that over-optimization is usually a bad idea. Over-optimization is when you start to optimize your code before it's really necessary. Take your proposed optimization for example. We know that making 1 write to the database is quicker than making 20 writes, generally speaking of course. However, unless your database is the bottleneck in your application, implementing such a feature introduces a significant amount of complexity for no immediate benefit. If a memcached cluster server crashes, which it will, then the data waiting to be written to the database is lost. If you really do have a lot of users, then you have to start thinking about concurrency and locks on the memcached items.
Without knowing more about your application I can't make any real recommendations, except to say: make sure your optimizations are required before you spend time increasing the complexity of your application for nothing.
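To make that complexity concrete, here is a rough sketch of the "write after 20 updates" idea. It is an illustration under assumptions, not a recommended implementation: BatchedCounter is a hypothetical class, the writeToDatabase callback stands in for your MySQL write, and the pending count is kept in process memory rather than in memcached so the example stays self-contained.

```csharp
using System;

// Hypothetical helper that accumulates increments and flushes them in batches.
public sealed class BatchedCounter
{
    private readonly object _gate = new object();
    private readonly int _threshold;
    private readonly Action<long> _writeToDatabase; // stand-in for the real database write
    private long _pending;

    public BatchedCounter(int threshold, Action<long> writeToDatabase)
    {
        if (threshold < 1) throw new ArgumentOutOfRangeException("threshold");
        if (writeToDatabase == null) throw new ArgumentNullException("writeToDatabase");
        _threshold = threshold;
        _writeToDatabase = writeToDatabase;
    }

    // Increments not yet written; these are lost if the process or cache node dies.
    public long Pending
    {
        get { lock (_gate) { return _pending; } }
    }

    public void Increment()
    {
        long toFlush = 0;

        // The lock keeps concurrent requests from double-counting or double-flushing.
        lock (_gate)
        {
            _pending++;
            if (_pending >= _threshold)
            {
                toFlush = _pending;
                _pending = 0;
            }
        }

        // Write outside the lock so a slow database call does not block other requests.
        if (toFlush > 0)
        {
            _writeToDatabase(toFlush);
        }
    }
}
```

Even this small version needs a lock, a decision about where the write happens, and an answer to what becomes of the pending increments on a crash or a failed write. Moving the pending count into memcached adds network failures and cross-server coordination on top of that.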