I am calling a UDF written in Java from a Pig script.
In the UDF, if the input is invalid for some reason, I return null and that particular row/line is skipped.
There are many reasons why the current line/row might be skipped. I am currently using the following log statements in my UDF:
warn("XML is null, so skipping it", PigWarning.UDF_WARNING_1);
....
warn("Entity is null, so skipping it", PigWarning.UDF_WARNING_5);
.... and so on
Once the Pig script is done, this gives me consolidated info like the following:
2013-01-21 07:03:42,163 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_5 5473 time(s).
2013-01-21 07:03:42,163 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_1 1466 time(s).
But now I want to know the reason why each line failed, instead of just the numbers. Is there any way to do this in Pig?
I found a library called Penny, which allowed me to do logging and filtering in Pig.
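To make the goal concrete, here is a minimal plain-Java sketch of the kind of per-row output I am after, next to the aggregated counts I get today. It deliberately has no Pig dependency: the class SkipReasonSketch, the methods skipReason and process, and the rowId parameter are all hypothetical names made up for illustration, and java.util.logging stands in for whatever logger the real UDF would use. Whether such log lines are easy to collect from a real Pig job is exactly what I am unsure about.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Plain-Java illustration only (no Pig classes). All names are hypothetical.
public class SkipReasonSketch {
    private static final Logger LOG =
            Logger.getLogger(SkipReasonSketch.class.getName());

    // Aggregated counts per reason, analogous to the warning totals Pig prints.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Returns why a row would be skipped, or null if the row is usable. */
    public static String skipReason(String xml, String entity) {
        if (xml == null) {
            return "XML is null";
        }
        if (entity == null) {
            return "Entity is null";
        }
        return null;
    }

    /** Processes one row; returns null (row skipped) after logging id and reason. */
    public String process(long rowId, String xml, String entity) {
        String reason = skipReason(xml, entity);
        if (reason != null) {
            counts.merge(reason, 1, Integer::sum);               // the total
            LOG.warning("row " + rowId + " skipped: " + reason); // the per-row detail
            return null;
        }
        return entity + ":" + xml.length(); // placeholder for the real result
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        SkipReasonSketch sketch = new SkipReasonSketch();
        sketch.process(1, null, "a");   // skipped: XML is null
        sketch.process(2, "<x/>", null); // skipped: Entity is null
        sketch.process(3, "<x/>", "a");  // kept
        System.out.println(sketch.counts()); // prints {XML is null=1, Entity is null=1}
    }
}
```

The counts map corresponds to what the warn(...) calls give me now; the LOG.warning line with the row id is the extra detail I would like to get out of the Pig run.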