I’m using PHP to make a simple caching system, but I’m going to be caching up to 10,000 files in one run of the script. At the moment I’m using a simple loop with
$file = "../cache/".$id.".htm";
$handle = fopen($file, 'w');
fwrite($handle, $temp);
fclose($handle);
($id being a random string which is assigned to a row in a database)
but it seems a little slow. Is there a better way of doing this? Also, I read somewhere that on some operating systems you can’t store thousands and thousands of files in a single directory; does this apply to CentOS or Debian? Bear in mind this folder may well end up holding over a million small files.
Simple questions, I suppose, but I don’t want to scale this code up and then find out I’m doing it wrong. I’m only testing with caching 10–30 pages at a time at the moment.
Remember that in UNIX, everything is a file.
When you put that many files into a directory, something has to keep track of those files. If you do an `ls -la` in that directory, you’ll probably notice that the ‘.’ entry has grown to some size. That is where all the info on your 10,000 files is stored.
Every seek and every write into that directory will involve scanning that large directory entry.
You should implement some kind of directory hashing scheme. That means creating subdirectories under your target dir and spreading the files across them, e.g.
/somedir/a/b/c/yourfile.txt
/somedir/d/e/f/yourfile.txt
This’ll keep the size of each directory entry quite small, and speed up IO operations.
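One way to sketch that in PHP is to derive the subdirectory names from a hash of the id, so files spread evenly even if the ids themselves cluster. This is only an illustration: the function names (`cachePath`, `cacheWrite`), the two-level layout, and the `../cache` base dir are assumptions, not a fixed API.

```php
<?php
// Build a hashed cache path: two levels of two hex chars each caps
// any one directory at 256 subdirectories per level.
function cachePath(string $id, string $baseDir = '../cache'): string
{
    // md5 of the id gives an even spread regardless of the id format.
    $hash   = md5($id);
    $level1 = substr($hash, 0, 2);
    $level2 = substr($hash, 2, 2);
    return "$baseDir/$level1/$level2/$id.htm";
}

function cacheWrite(string $id, string $content): void
{
    $file = cachePath($id);
    $dir  = dirname($file);
    // Create the subdirectories on first use; the third argument
    // makes mkdir recursive.
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    // One call replaces the fopen/fwrite/fclose triple.
    file_put_contents($file, $content);
}
```

With this layout a million files works out to roughly 15 per leaf directory on average, so each directory entry stays small.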