My Mac app keeps a collection of objects (with Core Data), each of which has a cover image, and to which I assign a UUID upon creation. I had originally been storing the cover images as a field in my Core Data store, but recently started storing them on disk in the file system, instead.
Initially, I’m storing the covers in a flat directory, using the UUID to name the file, as below. This gives me O(1) fetching, as I know exactly where to look.
...
/.../Covers/3B723A52-C228-4C5F-A71C-3169EBA33677.jpg
/.../Covers/6BEC2FC4-B9DA-4E28-8A58-387BC6FF8E06.jpg
...
I’ve looked at the way other applications handle this task, though, and noticed a multi-level scheme, as below (for instance). This could still be implemented in O(1) time.
...
/.../Covers/A/B/3B723A52-C228-4C5F-A71C-3169EBA33677.jpg
/.../Covers/C/D/6BEC2FC4-B9DA-4E28-8A58-387BC6FF8E06.jpg
...
What might be the reason to do it this way? Does OS X limit the number of files in a directory? Is it in some way faster to retrieve them from disk? It would make the code used to calculate the file’s name more complicated, so I want to find out if there is a good reason to do it that way.
On certain file systems (and I beleive HFS+ too), having too many files in the same directory will cause performance issues.
I used to work in an ISP where they would break up the home directories (they had 90k+ of them) Using a multi-directory scheme. You can partition your directories by using, say, the first two characters of the UUID, then the second two, eg:
That way you don’t need to calculate any extra characters or codes, just use the ones you have already to break it up. Since your UUIDs will be different every time, this should suffice.