I am interested in learning why so many services like Twitter and Facebook name their CDN files the way they do. Looking at http://25.media.tumblr.com/tumblr_m6m6g57NgY1qdhfhho2_1280.jpg I have some observational questions:
- Do they use multiple subdomains (25.media, 26.media, etc.) to offload DNS queries from a single domain? It would seem like storage.tumblr.com would be good enough for all their images, since S3 just has the concept of one big bucket.
- Are they inserting a hashed string into the file name to prevent a sequential walk by a web harvesting tool? That seems like a good idea: take the file name, append some junk to it, hash it, and insert that hash into the tumblr_XXXXXXXXXXXXXXXXXX_1280.jpg file name.
Browsers limit how many parallel requests they will make to a single hostname, so spreading files across multiple subdomains allows more downloads to run in parallel. See: http://yuiblog.com/blog/2007/04/11/performance-research-part-4/
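As a rough illustration of the subdomain idea, here is a minimal sketch of how a site could pick a subdomain for each file. I don't know how Tumblr actually assigns theirs; the function name `shard_url`, the host `media.example.com`, and the shard count are all made up for the example. The one design point worth noting is that the choice is derived from the file name, so the same file always maps to the same hostname and stays cacheable.

```python
import zlib


def shard_url(filename: str, shards: int = 4, base: str = "media.example.com") -> str:
    """Build a URL on one of `shards` numbered subdomains (hypothetical scheme)."""
    # crc32 is stable across runs and machines, unlike Python's built-in hash(),
    # so a given file name always lands on the same subdomain.
    index = zlib.crc32(filename.encode("utf-8")) % shards
    return f"http://{index}.{base}/{filename}"


if __name__ == "__main__":
    for name in ("tumblr_aaa_1280.jpg", "tumblr_bbb_1280.jpg", "tumblr_ccc_500.jpg"):
        print(shard_url(name))
```

Picking the subdomain at random on each page view would also spread the requests, but the browser would then see the same image under several URLs and download it again.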
They might be using the seemingly random file names for the reason you describe. More likely, though, they use them to guarantee that file names are unique and to invalidate caches when a file changes, which ensures that all users see the latest version.
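To make the cache invalidation point concrete, here is a small sketch of one common way to do it: derive part of the file name from a hash of the file's contents. This is an assumption about the general technique, not a description of what Tumblr does; `versioned_name`, the choice of SHA-1, and the 16-character truncation are all hypothetical.

```python
import hashlib


def versioned_name(prefix: str, data: bytes, width: int, ext: str = "jpg") -> str:
    """Return a name like prefix_<digest>_<width>.<ext> based on the file contents."""
    # Any change to the bytes produces a different digest, and so a different URL.
    digest = hashlib.sha1(data).hexdigest()[:16]
    return f"{prefix}_{digest}_{width}.{ext}"


if __name__ == "__main__":
    original = b"first version of the image bytes"
    edited = b"second version of the image bytes"
    print(versioned_name("img", original, 1280))
    print(versioned_name("img", edited, 1280))
```

Because a changed file gets a new name, the old URL can be served with a very long cache lifetime and never needs to be purged; pages simply start pointing at the new name.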