We have a very expensive calculation that we’d like to cache. So we do something similar to:
    my $result = $cache->get( $key );
    unless ($result) {
        $result = calculate( $key );
        $cache->set( $key, $result, '10 minutes' );
    }
    return $result;
Now, during calculate($key), before we store the result in the cache, several other requests come in that also start running calculate($key), and system performance suffers because many processes are all calculating the same thing.
Idea: Let's put a flag in the cache saying that a value is being calculated, so the other requests just wait for that one calculation to finish and then all use its result. Something like:
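    my $result = $cache->get( $key );
    if ($result) {
        # Another process has claimed this key: poll until the real value arrives.
        while (defined $result && $result =~ /Wait, \d+ is running calculate\(\)/) {
            # Built-in sleep() truncates 0.5 to 0, so use four-argument select() instead.
            select( undef, undef, undef, 0.5 );
            $result = $cache->get( $key );
        }
    } else {
        $cache->set( $key, "Wait, $$ is running calculate()", '10 minutes' );
        $result = calculate( $key );
        $cache->set( $key, $result, '10 minutes' );
    }
    return $result;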
Now that opens up a whole new can of worms. What if $$ dies before it sets the cache? What if, what if… All of them are solvable, but since there is nothing in CPAN that does this (and there is something in CPAN for everything), I start wondering:
Is there a better approach? Is there a particular reason e.g. Perl’s Cache and Cache::Cache classes don’t provide some mechanism like this? Is there a tried and true pattern I could use instead?
Ideal would be a CPAN module with a Debian package already in squeeze, or a eureka moment where I see the error of my ways… 🙂
EDIT: I have since learned that this is called a cache stampede and have updated the question’s title.
flock() it.

Since your worker processes are all on the same system, you can probably use good, old-fashioned file locking to serialize the expensive calculate()ions. As a bonus, this technique appears in several of the core docs.

Benefit: worker death will instantly release the $lock.

Risk: LOCK_EX can block forever, and that is a long time. Avoid SIGSTOPs, perhaps get comfortable with alarm().

Extension: if you don’t want to serialize all calculate() calls, but merely all calls for the same $key or some set of keys, your workers can flock() /some/lockfile.$key_or_a_hash_of_the_key.
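
A minimal sketch of how the flock() approach, the per-key extension and the alarm() safeguard might fit together. It is an illustration under stated assumptions, not the answerer's own code: cached_calculate, the $lock_dir and $timeout parameters, the 30-second default and the Local::DemoCache stand-in are all hypothetical names introduced here, and the cache is only assumed to offer get() and set() as in the question. The one step the text above does not spell out is checking the cache a second time after the lock is acquired, so that workers who waited reuse the finished result instead of recalculating it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: $cache offers get/set as in the question,
# $calculate is a code ref standing in for calculate().
sub cached_calculate {
    my ($cache, $key, $calculate, $lock_dir, $timeout) = @_;
    $timeout //= 30;    # assumed upper bound, in seconds, on waiting for the lock

    my $result = $cache->get($key);
    return $result if defined $result;

    # One lock file per key, named by a hash of the key so any key is a safe file name.
    my $bytes = $key;
    utf8::encode($bytes);    # md5_hex needs bytes, not wide characters
    my $lock_path = File::Spec->catfile($lock_dir, 'lockfile.' . md5_hex($bytes));
    open(my $lock, '>>', $lock_path) or die "Cannot open $lock_path: $!";

    # LOCK_EX can block forever, so bound the wait with alarm().
    my $got_lock = eval {
        local $SIG{ALRM} = sub { die "lock timeout\n" };
        alarm($timeout);
        my $ok = flock($lock, LOCK_EX);
        alarm(0);
        $ok;
    };
    alarm(0);    # make sure no alarm is left pending
    die "Could not lock $lock_path: " . ($@ || $!) unless $got_lock;

    # Another worker may have finished while we waited: check the cache again.
    $result = $cache->get($key);
    unless (defined $result) {
        $result = $calculate->($key);
        $cache->set($key, $result, '10 minutes');
    }

    close($lock);    # releases the lock; so does the death of the process
    return $result;
}

# Stand-in cache for demonstration only (in-memory, ignores the expiry argument).
{
    package Local::DemoCache;
    sub new { return bless {}, shift }
    sub get { my ($self, $key) = @_; return $self->{$key} }
    sub set { my ($self, $key, $value) = @_; $self->{$key} = $value; return }
}

my $cache    = Local::DemoCache->new;
my $lock_dir = tempdir(CLEANUP => 1);
my $calls    = 0;
my $slow     = sub { $calls++; return "value for $_[0]" };

my $first  = cached_calculate($cache, 'answer', $slow, $lock_dir);
my $second = cached_calculate($cache, 'answer', $slow, $lock_dir);
print "$first (calculated $calls time(s))\n";
```

In real use the stand-in would be replaced by a cache shared between the worker processes, and $lock_dir by a directory on a local filesystem that every worker can write to. If calculate() dies, $lock goes out of scope and the lock is released with it, which is the same property the answer relies on for worker death.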