This is related to another StackOverflow question from a year ago, but it is a bit different.
This is in Ruby/Mongoid: 2.2.6.
When I got going with MongoDB, it looks like we stored the ID of a MongoDB document incorrectly, because a simple find on an ID does not work, yet the document is returned when we run a where query on other attributes.
I’ve tried “resetting” the id field by using the object returned from the where and setting the “id” and the “_id” to a BSON::ObjectId version of the stored string. This did not work, as the record is then still not queryable by this field.
Any other suggestions before I just completely wipe the disk (losing months of production data) and start over?
Edit: An example of a document that is returned in the loop but is not retrievable by ID.
{"_id"=>"4f47267193546d160b0171a2", "attribute_tags"=>[{"tag"=>"website"}, {"tag"=>"twitter"}, {"tag"=>"website"}, {"tag"=>"twitter"}], "contact_info"=>{"facebook"=>[], "success"=>true, "created_at"=>2012-02-24 05:58:06 UTC, "tags"=>[], "twitter"=>[], "email"=>[], "phone"=>[], "linkedin"=>[], "google_plus"=>[], "youtube"=>[], "contact_form"=>false}, "created_at"=>2012-02-24 05:56:01 UTC, "data"=>{"twitter_followers_count"=>112, "twitter_is_translator"=>112, "twitter_protected"=>false, "twitter_url"=>"http://www.bettyunderground.com", "twitter_verified"=>false, "twitter_statuses_count"=>2040, "twitter_listed_count"=>14, "twitter_geo_enabled"=>true, "twitter_friends_count"=>124, "twitter_created_at"=>"Fri Jul 17 21:41:00 +0000 2009", "twitter_contributors_enabled"=>false, "enriched_at"=>2012-02-24 05:58:09 UTC}, "demographics"=>{}, "description"=>"The trials and tribulations of a polemicist", "directory_ids"=>[], "forums"=>[], "found_at_url"=>"http://www.bettyunderground.com", "geographics"=>{"language"=>"en", "location"=>"San Francisco, CA"}, "hashtags"=>{"tag"=>{"website"=>true, "twitter"=>true}, "reachable_via"=>{"twitter"=>true}}, "host_names"=>[], "ignore_project_ids"=>[], "keyword_scores"=>{"return policy"=>0.0}, "keywords"=>["return policy"], "last_contact_info_update"=>2012-02-24 05:58:09 UTC, "name"=>"Betty Underground", "new_profiles"=>[{"service"=>"twitter", "user_id"=>"BettyUndergrnd", "score"=>1.0}, {"service"=>"twitter", "username"=>"BettyUndergrnd", "score"=>1.0}], "presence_score"=>0, "profile_url"=>"http://a2.twimg.com/profile_images/1459407098/image_normal.jpg", "profiles_retrieved"=>true, "references"=>[], "share_counts"=>{}, "tags"=>["website", "twitter"], "twitter"=>"BettyUndergrnd", "updated_at"=>2012-03-17 10:08:09 UTC, "wordsmaster_ids"=>[], "reachable_via"=>[], "read_project_ids"=>[]}
It doesn’t have an ObjectId for the ID field; the _id is stored as a plain string. Not sure how it got busted this way, but that’s the way it is.
The code I’m using to modify it is:
# d is the document returned by the where query
old_id = d._id
d["_id"] = BSON::ObjectId(old_id)
d.save
I have placed a gist of doing this from my console. You can see exactly what I’m doing.
Any thoughts would be appreciated.
Mongoid has a rake task that converts string IDs to ObjectIds.
If you use it, you’ll end up with a mirror of your collection containing the converted documents. Then simply rename the mirror over the original and you’ll be set.
It will error if you have duplicate ObjectIds, though, so you might need to run it a few times.
And it’s SLOW.
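To illustrate why that task builds a mirror instead of editing documents in place: MongoDB does not allow the _id of an existing document to be changed, so each document has to be copied under its new _id. Below is a minimal sketch of that copy step in plain Ruby, working on in-memory hashes so it runs without a database. It is not the rake task's actual implementation. FakeObjectId and mirror_with_object_ids are hypothetical names made up for this example; with the real driver you would presumably use BSON::ObjectId.legal? and BSON::ObjectId.from_string and insert into a new collection.

```ruby
# Hypothetical stand-in for BSON::ObjectId so the sketch runs without the
# bson gem or a database. The real class comes from the bson gem.
FakeObjectId = Struct.new(:hex) do
  # True when the value is a 24-character hex string, i.e. it looks like
  # an ObjectId that was stored as a plain String.
  def self.legal?(str)
    str.is_a?(String) && !!(str =~ /\A[0-9a-f]{24}\z/i)
  end
end

# Returns [converted, skipped]. Each document is copied with its string _id
# replaced by an ObjectId-typed value; the input documents are not modified.
# Documents whose converted _id collides with one already copied are
# collected in `skipped` for manual review instead of raising.
def mirror_with_object_ids(docs)
  mirror  = {}
  skipped = []
  docs.each do |doc|
    id     = doc["_id"]
    new_id = FakeObjectId.legal?(id) ? FakeObjectId.new(id.downcase) : id
    if mirror.key?(new_id)
      skipped << doc
    else
      mirror[new_id] = doc.merge("_id" => new_id)
    end
  end
  [mirror.values, skipped]
end

docs = [
  { "_id" => "4f47267193546d160b0171a2", "name" => "Betty Underground" },
  { "_id" => FakeObjectId.new("4f47267193546d160b0171a3"), "name" => "already an ObjectId" },
  { "_id" => "4f47267193546d160b0171a3", "name" => "string duplicate of the one above" }
]

converted, skipped = mirror_with_object_ids(docs)
```

Collecting duplicates rather than failing on the first one is a design choice for the sketch; it avoids the "run it a few times" loop, at the cost of having to inspect the skipped documents by hand afterwards.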