Simple question: I have a module headers.py which defines a couple variables I need in my main MRJob script. I should be able to run the job with
python MRMyJob -r emr --file=headers.py s3://input/data/path
and then in my MRJob script (MRMyJob), the following should work:
from headers import header1, header2, header3
Right? From the mrjob --help page: “--file=UPLOAD_FILES
Copy file to the working directory of this script. You
can use --file multiple times.”
I’m still getting “no module named headers” when I try to import it.
headers.py is apparently not being put on your remote PYTHONPATH. See the docs on how to get additional modules across to the cluster; you have to put them in a tarball first.
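As a rough sketch of the tarball step, something like the following could pack headers.py with the standard-library tarfile module. The helper names (build_module_tarball, list_tarball) and the output name headers.tar.gz are made up for illustration, and how the archive is then handed to mrjob (a command-line option or a config-file entry) depends on your mrjob version, so check the docs for that part.

```python
import os
import tarfile


def build_module_tarball(module_paths, tarball_path):
    """Pack the given .py files into a gzipped tarball.

    Each file is stored at the root of the archive (no directory prefix),
    so that unpacking it into a directory on the import path would make
    `from headers import ...` resolvable. Hypothetical helper, not part
    of mrjob.
    """
    with tarfile.open(tarball_path, "w:gz") as tar:
        for path in module_paths:
            tar.add(path, arcname=os.path.basename(path))
    return tarball_path


def list_tarball(tarball_path):
    """Return the sorted member names, to double-check what was packed."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Assumes headers.py sits in the current directory.
    archive = build_module_tarball(["headers.py"], "headers.tar.gz")
    print(list_tarball(archive))
```

Run it from the directory that contains headers.py, then point mrjob at the resulting headers.tar.gz in whatever way the docs for your version describe.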