http://code.flickr.com/blog/page/4/
This blog post is from the devs at Flickr, and outlines their simplified approach to generating GUIDs for photos in a sharded database environment using MySQL.
I am working on an app that uses MongoDB as its data store and has a similar requirement for items stored in embedded documents. Basically, each document in the collection represents a list of items, and each individual item inside that document needs some kind of identifier of its own for lookup purposes. I’d rather not put the items in a different collection, since the list keys that aren’t items are really just metadata and don’t need their own collection. Ideally it should all be one document.
I was thinking the kind of approach detailed in the blog post could be implemented to solve this problem – one endpoint that generates GUIDs for these entries and saves the last used value. The problem is that I am not certain whether this approach introduces problems when sharding the data store in Mongo. I don’t have any experience distributing Mongo over several machines. I assume I could have the application layer call this endpoint when the data is saved and set the _id key appropriately, but I don’t know how this would affect queries against the data set.
Would setting up this kind of GUID system be a flawed idea? I realize it runs counter to some of the principles of NoSQL in general, but since the documents are embedded, what alternative is there?
I think ObjectId is the way to go. ObjectIds are stored more compactly than a GUID/UUID and maintain a roughly increasing order, which has benefits for indexing. They are also designed to be generated client-side, without the need for a ticket server as described in the article. The only real downside versus Flickr's solution is size: an ObjectId uses 12 bytes while an int64 uses 8 (GUIDs/UUIDs use 16 bytes in binary, or 32 characters as a hex string, plus a few bytes of overhead). One other potential downside (which is more likely to be a benefit in most cases) is that the creation time is encoded in the ObjectId. If ObjectIds are used as publicly visible identifiers, they can leak possibly unwanted information to users, such as when another user signed up for your service.
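To make the size and timestamp points concrete, here is a rough standard-library sketch of a 12-byte id laid out the way current drivers describe an ObjectId: a 4-byte creation time, 5 random bytes per process, and a 3-byte counter. The names new_object_id, creation_time and shopping_list are made up for illustration, and this is not the driver's implementation. In a real app you would most likely just use the ObjectId class that ships with your MongoDB driver.

```python
import os
import struct
import threading
import time
import uuid
from datetime import datetime, timezone

# Hypothetical stand-in for a driver's ObjectId generator (illustration only).
_PROCESS_RANDOM = os.urandom(5)                  # 5 random bytes, fixed per process
_counter = int.from_bytes(os.urandom(3), "big")  # 3-byte counter, random start
_counter_lock = threading.Lock()


def new_object_id(now=None):
    """Return 12 bytes: 4-byte timestamp + 5 random bytes + 3-byte counter."""
    global _counter
    seconds = int(time.time() if now is None else now)
    with _counter_lock:
        _counter = (_counter + 1) % 0x1000000    # wrap around at 3 bytes
        count = _counter
    return struct.pack(">I", seconds) + _PROCESS_RANDOM + count.to_bytes(3, "big")


def creation_time(object_id):
    """Recover the creation time stored in the first 4 bytes of the id."""
    (seconds,) = struct.unpack(">I", object_id[:4])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# One list document whose embedded items each get a client-generated id,
# with no round trip to a ticket server.
shopping_list = {
    "_id": new_object_id(),
    "title": "groceries",
    "items": [
        {"_id": new_object_id(), "name": "milk"},
        {"_id": new_object_id(), "name": "eggs"},
    ],
}

first = shopping_list["items"][0]["_id"]
print(len(first), "bytes, versus", len(uuid.uuid4().bytes), "for a binary UUID")
print(first.hex(), "was created at", creation_time(first).isoformat())
```

Because the timestamp comes first and is big-endian, ids created in later seconds sort after earlier ones, which is where the roughly increasing order comes from. The last line also shows the leak mentioned above: anyone who sees the id can read its creation time.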