Mongo newbie running 32-bit Mongo 2.0.1 on Windows XP. No option at present to run 64-bit.
I want to use Mongo to map-reduce a set of 60 files, each a monthly snapshot of the same 20,000 row x 100 column table. Ideally I’d put them all in one collection – and although this comes to about 1.2GB of csv data, it’s well over 2GB of mongoimport’ed data.
Question: should I program around this (60 map reduces), or is there a not-too-cumbersome engineering solution that a Mongo newbie could tackle (either with sharding, or with some trick that results in less bloated storage, given that the underlying data is really a simple table with defined columns)?
You can go over the 2 GB limit by sharding. The 2 GB limit applies to each individual mongod process, not to the total data in the sharded dataset.
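As a rough back-of-the-envelope sketch of what that means in practice, the snippet below estimates how many mongod shard processes you would need. The 3 GB and 5 GB totals and the 80% headroom factor are assumptions for illustration only (the question just says "well over 2GB"), and `min_shards` is a made-up helper name, not a Mongo API.

```python
import math

GB = 1024 ** 3

def min_shards(total_bytes, per_process_limit_bytes=2 * GB, headroom=0.8):
    """Estimate how many mongod processes are needed to hold total_bytes.

    headroom is an assumed safety factor: only this fraction of the
    per-process limit is counted as usable, leaving room for indexes
    and growth.
    """
    usable = per_process_limit_bytes * headroom
    return math.ceil(total_bytes / usable)

# Hypothetical totals; substitute the size reported after mongoimport.
print(min_shards(3 * GB))
print(min_shards(5 * GB))
```

This assumes the data spreads evenly across shards, which depends on the shard key you choose.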
Here is some documentation about getting started with sharding. There is also a Python script for setting up a sharded environment on a single machine. I assume it will work on Windows.
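If sharding turns out to be more than you want to set up, the "60 map reduces" route from the question can also work, provided your reduce function can be applied again to its own outputs (sums, counts, min/max and similar). Below is a minimal in-memory sketch of that idea in plain Python, not Mongo code. The `region` and `amount` columns and all function names are hypothetical, since the real table layout isn't given.

```python
from collections import defaultdict

def map_reduce(rows, map_fn, reduce_fn):
    """Run a tiny in-memory map-reduce over one batch of rows."""
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

def merge_partials(partials, reduce_fn):
    """Re-reduce the outputs of several per-file runs into one result."""
    grouped = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

# Hypothetical job: total "amount" per "region".
def map_fn(row):
    yield row["region"], row["amount"]

def reduce_fn(key, values):
    return sum(values)

# Two made-up monthly snapshots standing in for the 60 files.
month_1 = [{"region": "north", "amount": 10}, {"region": "south", "amount": 5}]
month_2 = [{"region": "north", "amount": 7}]

partials = [map_reduce(month, map_fn, reduce_fn) for month in (month_1, month_2)]
combined = merge_partials(partials, reduce_fn)
print(combined)
```

The combined result matches what a single map-reduce over all the rows would give, because summing partial sums is the same as summing everything at once. A reduce that computes, say, an average directly would not merge this way; you would carry a sum and a count instead.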