I have a simple Python script (moo.py) that I am trying to stream data through:
import sys, os

# Emit one output record per input line read from stdin.
for line in sys.stdin:
    print(1)
and I am trying to run this Pig script:
DEFINE CMD `python moo.py` ship('moo.py');
data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data through CMD;
dump res;
When I run this Pig script in local mode (pig -x local), everything works fine,
but when I run it without -x local, it prints this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 2017: Internal error creating job configuration.
[Log file]
Caused by: java.io.FileNotFoundException: File moo.py does not exist.
Any ideas?
The problem was that I used the ship() function instead of cache().
ship() works fine for passing local files from the master to the slaves, whereas cache() is used by the slaves to obtain files from a location they can all access, such as S3 on Amazon.
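
For anyone who wants to see the change, here is a rough sketch of the DEFINE line using cache() instead of ship(). The S3 location of moo.py is a made-up placeholder (the original post does not say where the script was uploaded), and the part after the # is the name under which the file shows up in the task's working directory. Treat it as an illustration of the idea rather than the exact script that was run:

```
-- Hypothetical path: assumes moo.py was uploaded to S3 beforehand.
-- The '#moo.py' suffix names the symlink created in each task's working directory,
-- so the command can still refer to the script simply as moo.py.
DEFINE CMD `python moo.py` cache('s3://path/to/my/scripts/moo.py#moo.py');

data = LOAD 's3://path/to/my/data/*' AS (a:chararray, b:chararray, c:int, d:int);
res = STREAM data through CMD;
dump res;
```

The rest of the script stays the same; only the clause after the command changes.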
Hope that helps someone :]