With CouchDB it is convenient to develop CouchApps locally and then push them to a remote production server. Unfortunately, with production-sized data sets, working on views can be cumbersome.
What are good ways to take samples of a CouchDB database for use in local development?
The answer is filtered replication. I like to do this in two parts:
1. Replicate the production example_db to my local server as example_db_full
2. Run a filtered replication from example_db_full to example_db, where the filter cuts out enough data so builds are fast, but keeps enough data so I can confirm my code works.

Which documents to select can be application-specific. At this time, I am satisfied with a simple random pass/fail with a percentage that I can specify. The randomness is consistent (i.e., the same document always passes or always fails).
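As a rough sketch of the two replication steps, the request bodies below could each be POSTed as JSON to the local server's /_replicate endpoint. The production URL, the design document name "sampling" and the filter name "sample" are made up for illustration. Newer CouchDB versions may require full URLs rather than bare database names for source and target.

```javascript
// Step 1: pull the whole production database to the local server.
var fullReplication = {
  source: "https://production.example.com/example_db", // hypothetical URL
  target: "example_db_full",
  create_target: true
};

// Step 2: copy a sample of example_db_full into example_db.
// "sampling/sample" is a hypothetical <design doc>/<filter> name; the
// design document has to exist in the source database (example_db_full).
var sampledReplication = {
  source: "example_db_full",
  target: "example_db",
  create_target: true,
  filter: "sampling/sample",
  query_params: { p: 0.01 } // fraction of documents to keep
};

console.log(JSON.stringify(fullReplication));
console.log(JSON.stringify(sampledReplication));
```

Keeping the unfiltered copy around as example_db_full means the second step can be rerun with a different fraction without touching production again.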
My technique is to normalize the content checksum in the document's _rev field onto the range [0.0, 1.0). Then I simply specify some fraction (e.g. 0.01), and if the normalized checksum value is <= my fraction, the document passes.
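A minimal sketch of such a filter follows. It assumes the _rev value has the usual form of a generation number, a dash, and a hex checksum, and that the fraction arrives as a query parameter named p. The name sampleFilter, the parameter name, and the choice to use only the first 8 hex digits are assumptions made for this example.

```javascript
// Hypothetical filter function. In a design document it would be stored as
// a string (an anonymous function) under "filters".
function sampleFilter(doc, req) {
  // The fraction of documents to pass, e.g. p=0.01 for roughly 1%.
  var p = parseFloat(req.query.p);
  if (!(p >= 0.0 && p <= 1.0)) {
    throw new Error("Supply a 'p' query parameter in the range [0.0, 1.0]");
  }

  // Assumed _rev format: "<generation>-<hex checksum>".
  // Take the first 8 hex digits of the checksum.
  var match = /^\d+-([0-9a-f]{8})/.exec(doc._rev);
  if (!match) {
    throw new Error("Unexpected _rev format: " + doc._rev);
  }

  // 8 hex digits cover 0 .. 2^32 - 1, so dividing by 2^32 maps the
  // checksum prefix onto [0.0, 1.0).
  var normalized = parseInt(match[1], 16) / 4294967296;

  return normalized <= p;
}

// Usage with made-up documents:
var lowDoc = { _id: "a", _rev: "1-00000000aaaaaaaaaaaaaaaaaaaaaaaa" };
var highDoc = { _id: "b", _rev: "3-ffffffffaaaaaaaaaaaaaaaaaaaaaaaa" };
console.log(sampleFilter(lowDoc, { query: { p: "0.01" } }));
console.log(sampleFilter(highDoc, { query: { p: "0.01" } }));
```

Because the decision depends only on _rev and the fraction, rerunning the replication with the same fraction selects the same revisions each time.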