I’m trying out the mapreduce framework from http://code.google.com/p/appengine-mapreduce/ and have modified the demo application a bit: it now uses mapreduce.input_readers.DatastoreInputReader instead of mapreduce.input_readers.BlobstoreZipInputReader.
I’ve set up 2 pipeline classes:
    class IndexPipeline(base_handler.PipelineBase):
        def run(self):
            output = yield mapreduce_pipeline.MapreducePipeline(
                "index",
                "main.index_map",  # added higher up in code
                "main.index_reduce",  # added higher up in code
                "mapreduce.input_readers.DatastoreInputReader",
                mapper_params={
                    "entity_kind": "model.SearchRecords",
                },
                shards=16)
            yield StoreOutput("Index", output)

    class StoreOutput(base_handler.PipelineBase):
        def run(self, mr_type, encoded_key):
            logging.info("output is %s %s" % (mr_type, str(encoded_key)))
            if encoded_key:
                key = db.Key(encoded=encoded_key)
                m = db.get(key)
                yield op.db.Put(m)
And run it with:
    pipeline = IndexPipeline()
    pipeline.start()
But I keep getting this error:
Handler yielded two: ['a'] , but no output writer is set.
I’ve tried to find where in the source to set the output writer, but without success. The only thing I found is that one should set an output_writer_class somewhere.
Does anyone know how to set this?
On a side note, the encoded_key argument in StoreOutput always seems to be None.
The output writer must be passed as an argument to mapreduce_pipeline.MapreducePipeline (cf. its docstring):
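
Applied to the pipeline in the question, this might look like the sketch below. It assumes the output writer spec is the fifth positional argument, directly after the input reader spec, which is how I read the docstring; check this against your version of the library. The writer path mapreduce.output_writers.BlobstoreOutputWriter and the reducer_params entry are assumptions borrowed from the demo application, and build_index_mapreduce is a hypothetical helper used only to keep the example runnable without App Engine.

```python
# Sketch only: the writer path and reducer_params below are assumptions,
# not something the question specifies.

# Positional arguments in the order MapreducePipeline is assumed to take them:
# job name, mapper, reducer, input reader, then the output writer.
MAPREDUCE_ARGS = (
    "index",
    "main.index_map",
    "main.index_reduce",
    "mapreduce.input_readers.DatastoreInputReader",
    "mapreduce.output_writers.BlobstoreOutputWriter",  # assumed writer
)

MAPREDUCE_KWARGS = {
    "mapper_params": {"entity_kind": "model.SearchRecords"},
    "reducer_params": {"mime_type": "text/plain"},  # assumed, as in the demo
    "shards": 16,
}


def build_index_mapreduce(pipeline_cls):
    """Instantiate pipeline_cls with the output writer in the fifth slot.

    Hypothetical helper: pass mapreduce_pipeline.MapreducePipeline in real use.
    """
    return pipeline_cls(*MAPREDUCE_ARGS, **MAPREDUCE_KWARGS)
```

Inside IndexPipeline.run this would be used as `output = yield build_index_mapreduce(mapreduce_pipeline.MapreducePipeline)`. Without an output writer there is nothing for the reduce phase to write to, which would also explain why encoded_key arrives as None. As far as I can tell, once a writer is set the pipeline's output is whatever that writer produces (file names for a Blobstore writer) rather than an encoded datastore key, so StoreOutput would probably need adjusting as well.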