I am trying to drop records that contain at least one null in any of the fields. For example, if the data has 3 fields, then:
filtered = FILTER data BY ($0 is not null) AND ($1 is not null) AND ($2 is not null);
Is there any cleaner way to do this, without having to write out 3 boolean expressions?
If all of the fields are of numeric types, you could simply combine them in a single arithmetic expression and test the result for null.
In Pig, if any term in an arithmetic expression is null, the result is null.
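Relying on that null propagation, the three-field filter from the question might be written roughly as follows. This is a sketch that assumes `$0`, `$1` and `$2` are all numeric; addition is used purely for illustration.

```pig
-- Assumes $0, $1 and $2 are all numeric; the sum is null if any of them is null.
filtered = FILTER data BY ($0 + $1 + $2) is not null;
```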
You could also write a UDF that takes an arbitrary number of arguments and returns null (or 0, or false, whichever you find most convenient) if any of the arguments are null.
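A sketch of the resulting filter, where `NUMBER_OF_NULLS` is a hypothetical UDF that returns how many of its arguments are null, and `myudfs.jar` is a placeholder for wherever that UDF is packaged:

```pig
-- myudfs.jar is a placeholder jar name, not a real artifact.
REGISTER myudfs.jar;
-- Keep only the records in which none of the three fields is null.
filtered = FILTER data BY NUMBER_OF_NULLS($0, $1, $2) == 0;
```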
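For instance, a UDF called NUMBER_OF_NULLS, defined elsewhere, could return the number of null arguments, and the filter would keep only records for which it returns 0.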
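A minimal sketch of the null-counting logic such a UDF might contain. `NullCounter` and `countNulls` are hypothetical names introduced here, and the logic is kept in plain Java so it can be compiled without Pig on the classpath. The comment shows how it might be wired into a class extending `org.apache.pig.EvalFunc<Integer>`; treat that wrapper as an assumption rather than a finished UDF.

```java
import java.util.List;

// Hypothetical helper holding the null-counting logic for a NUMBER_OF_NULLS UDF.
// In Pig, it could be called from exec(Tuple input) of a class extending
// org.apache.pig.EvalFunc<Integer>, along these lines (untested sketch):
//
//   public class NUMBER_OF_NULLS extends EvalFunc<Integer> {
//       @Override
//       public Integer exec(Tuple input) throws IOException {
//           if (input == null) return null;   // no tuple, so no count
//           return NullCounter.countNulls(input.getAll());
//       }
//   }
final class NullCounter {
    private NullCounter() {}

    // Returns how many elements of the given field list are null.
    static int countNulls(List<?> fields) {
        int nulls = 0;
        for (Object field : fields) {
            if (field == null) {
                nulls++;
            }
        }
        return nulls;
    }
}
```

Returning a count rather than a boolean keeps the UDF reusable: comparing the result with 0 drops records containing any null, while other thresholds would tolerate a few missing fields.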
Note: I have not tested the above UDF, and I don’t claim it adheres to any best practices for writing clear, robust UDFs. You should add exception-handling code, for example.