I noticed a number of cases where an application or database stored collections of


Asked: May 10, 2026


I noticed a number of cases where an application or database stored collections of files/blobs using a hash to determine the path and filename. I believe the intended outcome is that the path never gets too deep and no folder ever gets too full, since too many files (or folders) in a single folder makes access slower.
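A minimal sketch of the layout being described, assuming SHA-256 as the hash and two directory levels of two hex characters each. The function name `hashed_path`, the `store` root and the level sizes are hypothetical choices for illustration, not what Zotero or any particular repository actually uses.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(key: str, root: str = "store", depth: int = 2, width: int = 2) -> PurePosixPath:
    """Map a key to root/<aa>/<bb>/<digest> (hypothetical layout).

    SHA-256 is an arbitrary choice; any stable hash function works.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Use `depth` consecutive chunks of `width` hex characters as directory levels.
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    # The full digest is used as the file name, so names never collide across levels.
    return PurePosixPath(root, *parts, digest)


print(hashed_path("report.pdf"))
```

Because the path is computed from the key alone, nothing has to be looked up to find the file: the same key always yields the same path, and the depth is fixed at `depth` levels no matter how many files are stored.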

EDIT: Examples are often digital libraries or repositories, though the simplest example I can think of (one that can be installed in about 30 seconds) is the Zotero document/citation database.

Why do this?

EDIT: Thanks, Mat, for the answer. Does this technique of using a hash to create a file path have a name? Is it a pattern? I’d like to read more, but have failed to find anything in the ACM Digital Library.


1 Answer

score 0 · Answer 1 · 2026-05-10T21:15:01+00:00

Hash/B-tree

A hash has the advantage of faster lookups when you’re only going to use the ‘=’ operator in searches.

If you’re going to use operators like ‘<’ or ‘>’, or anything other than ‘=’, you’ll want a B-tree, because it keeps keys in sorted order and can therefore answer that kind of search.
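To make the equality-versus-range distinction concrete, here is a small sketch. Python’s `dict` is a hash table; the standard library has no B-tree, so a sorted key list searched with `bisect` stands in for one here. It shares the property that matters, keys kept in order, but is not a real B-tree. The sample data and `range_query` are hypothetical.

```python
import bisect

# Hash index: Python's dict is a hash table, good for '=' lookups.
rows = {17: "alice", 42: "bob", 8: "carol", 23: "dave"}
print(rows[42])  # equality lookup, average O(1)

# Ordered index: a sorted key list stands in for a B-tree (an assumption for
# illustration). The hash table alone cannot answer range queries efficiently.
sorted_keys = sorted(rows)


def range_query(lo, hi):
    """Return (key, value) pairs with lo <= key < hi, in key order."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return [(k, rows[k]) for k in sorted_keys[start:end]]


print(range_query(10, 30))  # [(17, 'alice'), (23, 'dave')]
```

With only the `dict`, the range query would have to scan every key; the ordered structure jumps straight to the first match and walks forward.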

Directory structure

If you have hundreds of thousands of files to store on a filesystem and you put them all in a single directory, you’ll reach a point where the directory’s inode grows so large that adding or removing a file takes minutes. You might even reach the point where the inode won’t fit in memory, and you won’t be able to add, remove, or even touch files in that directory.

You can be assured that for a hashing method foo, foo(‘something’) will always return the same thing, say, ‘grbezi’. Now you use part of that hash to store the file, say, in gr/be/something. The next time you need that file, you just compute the hash and the path is immediately known. Plus, a good hash function spreads its outputs fairly uniformly over the hash space, so with a large number of files they will be evenly distributed across the hierarchy, splitting the load.
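The evenness claim can be checked empirically. This sketch assumes SHA-256 (standing in for “foo”) and uses the first two hex characters of the digest as the top-level directory, which gives 256 buckets. The file names are hypothetical and deliberately sequential, the kind of input that would cluster badly if you split on the names themselves.

```python
import hashlib
from collections import Counter


def top_dir(key: str) -> str:
    # First two hex characters of the digest name the top-level directory.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:2]


# Hypothetical, highly regular file names.
keys = [f"scan_{i:06d}.tif" for i in range(100_000)]
counts = Counter(top_dir(k) for k in keys)

print(len(counts))  # number of top-level directories actually used
print(min(counts.values()), max(counts.values()))  # smallest and largest bucket
```

With 100,000 keys over 256 top-level directories, the average is about 390 files each, and the printed minimum and maximum show how tightly the buckets cluster around that. A second level of two hex characters would split each of those again into 256 subdirectories.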
