I’m writing a web application that constantly retrieves XML “components” from a database and then transforms them into XHTML using XSLT. Some of these transformations happen frequently (e.g. a “sidebar navigation” component goes out for the same XML and performs the same XSL transformation on every page that features that sidebar), so I have started implementing some caching to speed things up.
In my current solution, before each component attempts to perform a transformation, the component checks with a static CacheManager object to see if a cached version of the transformed XML exists. If so, the component outputs this. If not, the component performs the transformation and then stores the transformed XML with the CacheManager object.
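The check-then-store flow described above is the classic cache-aside pattern. Here's a minimal sketch of it in Java (the original is presumably .NET, and the `CacheManager`/`getOrTransform` names here are illustrative, not the actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the cache-aside pattern: check the cache first,
// and only run the (expensive) transformation on a miss.
class CacheManager {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();

    static String getOrTransform(String key, Supplier<String> transform) {
        // computeIfAbsent runs the transformation only when the key is absent,
        // and stores the result for subsequent lookups
        return cache.computeIfAbsent(key, k -> transform.get());
    }
}
```

Using `computeIfAbsent` on a `ConcurrentHashMap` also avoids the check-then-act race you'd get with a plain `Dictionary`/`HashMap` when two requests miss the cache at the same time.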
The CacheManager object keeps an in-memory store of the cached transformed XML (in a Dictionary, to be exact). In my local, development environment this is working beautifully, but I’m assuming this may not be a very scalable solution.
What are the potential pitfalls of storing this data in memory? Do I need to put a cap on the amount of data I can store in an in-memory data structure like this? Should I be using a different data store for this type of caching altogether?
The obvious disadvantage will be, as you suspect, potentially high memory usage by your cache. You may want to implement a system where rarely-used items “expire” out of the cache when memory pressure goes up. Microsoft’s Caching Application Block implements pretty much everything you need, right out of the box.
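If you'd rather not pull in a full caching library, a simple bound on cache size gets you most of the way. Here's a minimal sketch of a size-capped, least-recently-used cache in Java, assuming a fixed entry cap rather than the memory-pressure-driven expiry the Caching Application Block provides (the cap value is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-capped LRU cache: once the cap is exceeded, the entry that was
// accessed least recently is evicted automatically.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true evicts the least-recently-used entry
        return size() > maxEntries;
    }
}
```

Note this caps the *number* of entries, not their total size in bytes; if your transformed XHTML fragments vary a lot in size, you'd want to weight entries accordingly.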
Another potential (though unlikely) problem you could run into is the cost of digging through the cache to find what you need. At some point it could be faster to just go ahead and regenerate the data instead of looking it up in the cache. We've run into this in at least one specific scenario involving a very large cache and a very cheap operation. It's unlikely, but it can happen.