I’m reading the Beginning CouchDB book by Apress and there is a line that confuses me a bit:
“Also important to note is that CouchDB will never overwrite existing documents, but rather it will append a new document to the database, with the latest revision gaining prominence and the other being stored for archival purposes.”
Doesn’t this mean that after a couple of updates, you would have a huge database? Thank you!
The short answer is “not really, no”.
In reality, it depends on the average size of your documents and how many of them you have. Those two factors determine how often you should run a compaction job on your database, which is the job that removes the previous revisions from the database. Read more about compaction at http://wiki.apache.org/couchdb/Compaction
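For what it's worth, compaction is triggered over HTTP with a POST to the database's `_compact` endpoint. Here is a minimal sketch using only the Python standard library. The server address `http://localhost:5984` and the database name `mydb` are placeholders I made up, and the function names are hypothetical. Most installs also require admin credentials for this call, which the sketch leaves out.

```python
import urllib.request


def build_compact_request(base_url: str, db_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks CouchDB to compact a database."""
    url = f"{base_url.rstrip('/')}/{db_name}/_compact"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the endpoint only needs the POST itself
        headers={"Content-Type": "application/json"},  # CouchDB expects a JSON content type here
        method="POST",
    )


def compact(base_url: str, db_name: str) -> int:
    """Send the compaction request and return the HTTP status code.

    Assumes the server accepts the request without authentication,
    which is often not the case outside a local test setup.
    """
    with urllib.request.urlopen(build_compact_request(base_url, db_name)) as resp:
        return resp.status


# Example (placeholder values): compact("http://localhost:5984", "mydb")
```

The request returns right away and the compaction itself runs in the background on the server.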
Another sysadmin point: try to schedule your compaction jobs when the database isn’t under load. You care most about write load, because if writes arrive too quickly while compaction is running, the compaction job could (in theory) run forever and take the database with it. However, I’ve also seen some not-so-nice behavior when running compaction under a heavy read load. So, if you can get by with compacting only once a day, do it at 3am with the rest of your system/database maintenance cron jobs.
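If you end up scripting this, a small guard like the one below might be useful. It is only a sketch: it takes the JSON you get back from a GET on the database (which includes a `compact_running` flag) and the current hour, and says whether it looks reasonable to start a compaction. The function name and the 2am to 4am quiet window are my own assumptions, so pick whatever matches your traffic.

```python
def should_start_compaction(db_info: dict, hour: int, quiet_hours=range(2, 5)) -> bool:
    """Decide whether to kick off compaction right now.

    db_info     -- parsed JSON from GET /<db>; only "compact_running" is read
    hour        -- current local hour, 0-23
    quiet_hours -- hypothetical low-traffic window (default 02:00 to 04:59)
    """
    # Never start a second compaction while one is already in progress.
    if db_info.get("compact_running", False):
        return False
    # Otherwise only run inside the quiet window.
    return hour in quiet_hours
```

This does not measure the actual read or write load; it only encodes the "run it at 3am" rule of thumb.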
Oh, and possibly most importantly: if you’re just starting to learn CouchDB, it’s probably premature to worry about timing your compaction jobs around your system’s load. Premature optimization and all that – focus on other aspects for now.
Cheers.