I want to add an embedded Majors array in my MongoDB Colleges collection like so:
{ "_id" : ObjectID("abc123"),
"code" : 123456,
"name" : "Stanford University",
"majors" :["Agriculture", "Business", "Computer Science"... ] }
I have a CSV that, essentially, is a CollegesMajors join table (in the SQL frame of reference). It looks like:
Code—–MajorCode—-Level
123456—-98765———2
123456—-99999———2
How do I embed these majors (I can translate MajorCode to it’s actual name first) into my MongoDB Colleges collection?
There must be some way that I can (pseudo-code):
db.colleges.update { match the code, update with array }
Maybe I need to turn the CSV into a JSON file first?
Thanks in advance!
UPDATE!
This is the error I get when attempting Tilo’s instructions – this is probably related to my lack of familiarity with the rails console. It thinks the CSV.read instruction is loading a nil object.
ruby-1.9.2-head :024 > csvAA = CSV.read( '/Users/administrator/dropbox/schoolninja/1College_Data/2009formatted/college_majors.csv' );
ruby-1.9.2-head :025 > headersA = csvAA.shift;
ruby-1.9.2-head :026 > csvAH = csvAA.map {|row| Hash[*headersA.zip(row).flatten] };
ruby-1.9.2-head :027 > csvAH.each do |rowH|
ruby-1.9.2-head :028 > c = College.find_by_code( rowH['Code'] )
ruby-1.9.2-head :029?> c.majors ||= []
ruby-1.9.2-head :030?> c.majors << ( rowH['title'] )
ruby-1.9.2-head :031?> c.save
ruby-1.9.2-head :032?> end
SyntaxError: (irb):24: unknown regexp options - adtratr
(irb):24: syntax error, unexpected tCONSTANT, expecting ')'
...opbox/schoolninja/1College_Data/2009formatted/college_majors...
... ^
(irb):24: syntax error, unexpected tIDENTIFIER, expecting $end
...nja/1College_Data/2009formatted/college_majors.csv );
... ^
from /Users/administrator/.rvm/gems/ruby-1.9.2-head/gems/railties-3.0.9/lib/rails/commands/console.rb:44:in `start'
from /Users/administrator/.rvm/gems/ruby-1.9.2-head/gems/railties-3.0.9/lib/rails/commands/console.rb:8:in `start'
from /Users/administrator/.rvm/gems/ruby-1.9.2-head/gems/railties-3.0.9/lib/rails/commands.rb:23:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'
UPDATE:
Please check out the new Ruby Gem "smarter_csv" in case you want to update MongoDB from records of CSV-files. It has lots of helpful features, including chunking of large files, and returns the results in chunks as arrays of hashes.
See also:
Updated Answer:
Previous Answer:
Assuming you already have a method majorcode_to_name() :
If your CSV file is really large, you may want to modify this code a little, so you don’t have to read the whole data into RAM
EDIT:
I should have mentioned why I converted the result of CSV.read into an array of hashes.. The main reason is to make the code independent of the order of the headers in the CSV file – which makes the code a bit more robust.