I am wondering how PigStorage in Pig stores data to S3. Does it save the output to HDFS and then copy it over? Or does each reducer save its output to a local directory and then copy it to S3? I guess this can’t be streamed, since S3 only supports putting files or directories?
My understanding is that each reducer writes its output locally and then copies the output to S3.
As you correctly point out, S3 doesn’t support streaming writes, so a reducer can only copy its output once it has finished processing.
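To make the "write locally, then copy on finish" idea concrete, here is a minimal Python sketch of that pattern. It is not Pig's or Hadoop's actual code: `BufferedUploadWriter`, `put_object`, and the in-memory `fake_bucket` are hypothetical names made up for illustration, and the key `output/part-r-00000` is just an example.

```python
import os
import tempfile


class BufferedUploadWriter:
    """Hypothetical helper: buffer records in a local temp file, upload once on close."""

    def __init__(self, key, put_object):
        self.key = key
        # put_object is an assumed callable of the form put_object(key, data_bytes)
        self.put_object = put_object
        self._tmp = tempfile.NamedTemporaryFile(delete=False)
        self.closed = False

    def write(self, record):
        # Records only touch the local file; nothing is sent to the store yet.
        self._tmp.write(record.encode("utf-8") + b"\n")

    def close(self):
        if self.closed:
            return
        self._tmp.close()
        try:
            # The whole file goes up in a single put once the writer is finished.
            with open(self._tmp.name, "rb") as f:
                self.put_object(self.key, f.read())
        finally:
            os.remove(self._tmp.name)
        self.closed = True


# Stand-in for an S3 bucket so the sketch runs without any network access.
fake_bucket = {}


def fake_put(key, data):
    fake_bucket[key] = data


writer = BufferedUploadWriter("output/part-r-00000", fake_put)
writer.write("a\t1")
writer.write("b\t2")
uploaded_before_close = "output/part-r-00000" in fake_bucket
writer.close()
```

The object only appears in `fake_bucket` after `close()`, which mirrors the behaviour described above: the output becomes visible in S3 only once the reducer has finished writing it.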