I am working on a pet project where I’m storing various text files. I have set up my app to save the tags as a single string in one of my collections. For example:
tags: "Linux Apache WSGI"
Storing and searching for them works just fine. My question is about things like building a tag cloud, counting all the various tags, or making a dynamic selection system based on tags: what is the best way to break the tags up to work with them? Or should I be storing them some other way?
Logically, I could scan through every record, collect all the tags, split them on spaces, and then cache the result somehow. Maybe that’s the right answer, but I wanted to ask for the community’s wisdom.
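For reference, the scan-and-split idea described above could look roughly like this. It is only a sketch: the list of dicts stands in for whatever `collection.find()` returns, and the sample documents and the name `count_tags` are made up for illustration.

```python
from collections import Counter


def count_tags(documents):
    """Count tags across documents whose "tags" field is a space-separated string."""
    counts = Counter()
    for doc in documents:
        # str.split() with no argument splits on any run of whitespace
        counts.update(doc.get("tags", "").split())
    return counts


# Stand-in for the documents returned by collection.find(); sample data is invented.
sample_docs = [
    {"name": "a.txt", "tags": "Linux Apache WSGI"},
    {"name": "b.txt", "tags": "Linux Python"},
]
tag_counts = count_tags(sample_docs)
```

With pymongo, the same function should work on a cursor, e.g. `count_tags(collection.find({}, {"tags": 1}))`, since a cursor yields dict-like documents. The result would still need to be cached somewhere.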
I’m using pymongo to interact with my database.
The standard way to store tags is as an array. In your case, the document would look something like:
tags: ["Linux", "Apache", "WSGI"]
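If the existing documents hold the tags as one string, converting them is mostly a matter of splitting. The sketch below is an assumption about how that might be done; `migrate_document` and the sample document are hypothetical names, not part of the original app.

```python
def migrate_document(doc):
    """Return a copy of doc with a space-separated "tags" string turned into a list."""
    migrated = dict(doc)
    if isinstance(migrated.get("tags"), str):
        migrated["tags"] = migrated["tags"].split()
    return migrated


old_doc = {"name": "notes.txt", "tags": "Linux Apache WSGI"}
new_doc = migrate_document(old_doc)

# MongoDB matches a scalar against array fields element by element, so this
# filter selects every document whose "tags" array contains "Linux".
linux_filter = {"tags": "Linux"}
```

With pymongo, the filter would be passed as `collection.find(linux_filter)`, so searching by a single tag stays as simple as before.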
For counting all the tags, this is what Map/Reduce is designed for. It effectively “scans every record”, and its output is another collection that you can query.
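To make the two phases concrete, here is an in-memory illustration in plain Python. It is not the MongoDB call itself (there the map and reduce functions are written in JavaScript and run on the server); the function names and sample documents are invented, and the documents are assumed to store tags as arrays.

```python
from collections import defaultdict


def map_tags(doc):
    """Map phase: emit a (tag, 1) pair for every tag in the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values):
    """Reduce phase: combine all values emitted for one key."""
    return sum(values)


def map_reduce(documents):
    """Group emitted values by key, then reduce each group to a single count."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_tags(doc):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


# Invented sample documents, assuming tags are stored as arrays.
docs = [
    {"name": "a.txt", "tags": ["Linux", "Apache", "WSGI"]},
    {"name": "b.txt", "tags": ["Linux", "Python"]},
]
result = map_reduce(docs)
```

The resulting mapping of tag to count is the kind of data a tag cloud needs; in MongoDB the equivalent output would land in its own collection.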
However, there’s also another way to do this: keep “counters” and update them. When you save a new document, you also increment the counter for each tag on that document.
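A minimal sketch of the counter approach with pymongo might look like the following. The collection arguments (`files`, `tag_counts`), the function name, and the `{"_id": tag, "count": n}` layout are all assumptions made for illustration; the document is assumed to store its tags as an array.

```python
def save_with_tag_counts(files, tag_counts, document):
    """Insert a document and bump one counter per distinct tag.

    `files` and `tag_counts` are expected to be pymongo collections;
    both names are hypothetical.
    """
    files.insert_one(document)
    # set() so a tag repeated within one document is only counted once
    for tag in set(document.get("tags", [])):
        # upsert=True creates the counter document the first time a tag is seen
        tag_counts.update_one({"_id": tag}, {"$inc": {"count": 1}}, upsert=True)
```

Reading the counts back for a tag cloud is then a plain query, for example `tag_counts.find().sort("count", -1)`. The trade-off is that the counters have to be decremented wherever tags are removed or documents are deleted, otherwise they drift.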