Here is my CSV file :
12;France;http://www.google.com
12;France;http://www.google.com
For each line, the $field[] structure I use to populate my Mongo
object looks like the following :
array(18) {
["date_day"]=> int(12)
["url"]=> string(21) "http://www.google.com"
["country"]=> string(6) "France"
}
For each of these two lines, I call the following :
// 1) Upsert the document and add the country entry to the "countries" set.
$result = $coll->update(
    array('ts_day' => $field['date_day'], 'url' => $field['url']),
    array('$addToSet' => array('countries' => array('name' => $field['country']))),
    array('upsert' => true)
);
// 2) Increment the "views" counter of the matching country entry.
$result = $coll->update(
    array('ts_day' => $field['date_day'], 'url' => $field['url'], 'countries.name' => $field['country']),
    array('$inc' => array('countries.$.views' => 1)),
    array('upsert' => true)
);
I don't understand why I get an extra { "name" : "France" } object in
the "countries" array, with no "views" counter, although the counter
is properly incremented on the first "France" entry (its value is 2,
which is correct) :
{ "_id" : ObjectId("4eeb0e6cd86450af72f775be"), "url" : "http://www.google.com", "countries" : [ { "name" : "France", "views": 2 }, { "name" : "France" } ], "ts_day" : 12 }
Any idea where this lonely { "name" : "France" } comes from? Is it a
Mongo bug or a misunderstanding on my part?
Thank you !
It sounds like you’re getting too hung up on using the standard mongoimport/export tools to process your CSV files.
If you check out the source code, you'll notice that these two tools are rather simple client applications that use the MongoDB C++ driver. Were they written in Python or Ruby, they'd likely be only 5-10 lines long.
Why don’t you simply write a new script that reads in the CSV file and inserts the documents in exactly the way you’d like? Want to do aggregation? Do it in client code! Want total control over the schema? You’re set!
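As a rough illustration of what such a script could look like, here is a minimal Python sketch that reads the semicolon-separated lines from the question and does the aggregation in client code. The function name `build_documents` and the in-memory sample are only assumptions for the example; a real script would open the actual CSV file and then write the resulting documents with a driver (for instance pymongo's `insert_many`, which is not run here).

```python
import csv
import io
from collections import Counter, defaultdict

# Sample data copied from the question; a real script would open the CSV file instead.
SAMPLE_CSV = "12;France;http://www.google.com\n12;France;http://www.google.com\n"


def build_documents(csv_file):
    """Count views per (day, url, country) and return one document per (day, url)."""
    views = defaultdict(Counter)
    for row in csv.reader(csv_file, delimiter=";"):
        if not row:  # skip blank lines
            continue
        day, country, url = row
        views[(int(day), url)][country] += 1
    return [
        {
            "ts_day": day,
            "url": url,
            "countries": [
                {"name": name, "views": count}
                for name, count in sorted(per_country.items())
            ],
        }
        for (day, url), per_country in views.items()
    ]


documents = build_documents(io.StringIO(SAMPLE_CSV))
print(documents)
# [{'ts_day': 12, 'url': 'http://www.google.com', 'countries': [{'name': 'France', 'views': 2}]}]
```

Because each document is built completely before it is written, there is a single insert per (day, url) pair and no follow-up updates on the same document.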
In fact, this would be the best way forward: mongoimport/export is provided for convenience, not comprehensiveness. If your data ingestion process requires significantly updating a large number of documents once they've already been written to the database, you may even start to see issues with compaction and performance.
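As for the lonely { "name" : "France" } itself: it is most likely not a bug. $addToSet treats an embedded document as already present only when an existing element matches it exactly. After the first $inc, the stored element is { "name" : "France", "views" : 1 }, which is no longer equal to { "name" : "France" }, so the second $addToSet appends a new element, and the positional $inc then updates the first match. The following is only a pure-Python simulation of that sequence, not MongoDB code; the helpers `add_to_set` and `inc_first_match` are made up for the illustration.

```python
import copy


def add_to_set(doc, field, value):
    """Mimic $addToSet: append only if no existing element equals `value` exactly."""
    arr = doc.setdefault(field, [])
    if value not in arr:
        arr.append(copy.deepcopy(value))


def inc_first_match(doc, field, name, key, amount):
    """Mimic $inc on 'countries.$.views': bump the first element whose name matches."""
    for element in doc[field]:
        if element.get("name") == name:
            element[key] = element.get(key, 0) + amount
            return


doc = {"ts_day": 12, "url": "http://www.google.com"}
for _ in range(2):  # the two identical CSV lines from the question
    add_to_set(doc, "countries", {"name": "France"})
    inc_first_match(doc, "countries", "France", "views", 1)

print(doc["countries"])
# [{'name': 'France', 'views': 2}, {'name': 'France'}]
```

The simulated array matches the document shown in the question, which supports the exact-match explanation. Aggregating in client code before writing, as suggested above, sidesteps the problem entirely.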