In the App Engine MapReduce console (myappid.appspot.com/mapreduce/status) I have a MapReduce job defined with input_reader: mapreduce.input_readers.BlobstoreLineInputReader. I have used it successfully with a regular Blobstore file, but it doesn’t work with a blob key created from Cloud Storage with create_gs_key. When I run it, I get the error “BadReaderParamsError: Could not find blobinfo for key THEKEY”. The input reader checks for the existence of a BlobInfo. Is there any workaround for this? Shouldn’t BlobInfo.get(BLOBKEY FROM CS) return a BlobInfo?
To get a blob_key for a Google Cloud Storage file, I run this:
from google.appengine.ext import blobstore
READ_PATH = '/gs/mybucket/myfile.json'
blob_key = blobstore.create_gs_key(READ_PATH)
print blob_key
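For context, the failing check can be pictured roughly as follows. This is a simplified, self-contained simulation rather than the library's actual code: `BLOB_INFOS` and `validate_blob_keys` are hypothetical names, and the dictionary stands in for the datastore records that exist for blobs uploaded through Blobstore. The assumption being illustrated is that a key produced by create_gs_key has no such record, so the lookup comes back empty.

```python
class BadReaderParamsError(Exception):
    """Stand-in for the mapreduce library's error of the same name."""


# Hypothetical stand-in for BlobInfo records: only blobs uploaded
# through Blobstore are assumed to have an entry here.
BLOB_INFOS = {"uploaded-blob-key": {"size": 1024}}


def validate_blob_keys(blob_keys, blob_infos=BLOB_INFOS):
    """Raise if any key has no metadata record (simulated BlobInfo lookup)."""
    for key in blob_keys:
        if blob_infos.get(key) is None:
            raise BadReaderParamsError(
                "Could not find blobinfo for key %s" % key)


# A key with a record passes silently.
validate_blob_keys(["uploaded-blob-key"])

# A key with no record (as assumed for create_gs_key results) fails.
try:
    validate_blob_keys(["key-from-create_gs_key"])
except BadReaderParamsError as err:
    print(err)
```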
A community member posted a LineInputReader for Cloud Storage in an issue on the appengine-mapreduce library: http://code.google.com/p/appengine-mapreduce/issues/detail?id=140
We’ve posted our modifications here: https://github.com/thinkjson/CloudStorageLineInputReader
We’re using this to do MapReduce over about 4 TB of data, and have been happy with it so far.
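For anyone wondering how a line-oriented reader can shard one large file, here is a minimal sketch of the general byte-range technique. It is not the code from the repository linked above: `split_ranges` and `read_lines` are hypothetical helper names, and an in-memory `io.BytesIO` stands in for a Cloud Storage file. Each shard gets a byte range and yields only the lines that begin inside it, so no line is read twice or dropped at a boundary.

```python
import io


def split_ranges(total_size, shard_count):
    """Divide [0, total_size) into contiguous byte ranges, one per shard."""
    size = max(1, -(-total_size // shard_count))  # ceiling division
    return [(start, min(start + size, total_size))
            for start in range(0, total_size, size)]


def read_lines(stream, start, end):
    """Yield the lines that begin at an offset in [start, end)."""
    if start > 0:
        # Step back one byte and discard through the next newline. If the
        # previous byte is a newline, only that byte is discarded and the
        # line starting exactly at `start` is kept.
        stream.seek(start - 1)
        stream.readline()
    else:
        stream.seek(0)
    while stream.tell() < end:
        line = stream.readline()
        if not line:
            break
        yield line.rstrip(b"\n")


data = b"alpha\nbeta\ngamma\ndelta\n"
for start, end in split_ranges(len(data), 3):
    print((start, end), list(read_lines(io.BytesIO(data), start, end)))
```

A line that straddles a boundary is read in full by the shard where it starts, and the next shard skips past it. That is why a shard may read beyond its own `end` offset.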