I have a Pig script with code like:
scores = LOAD 'file' as (id:chararray, scoreid:chararray, score:int);
scoresGrouped = GROUP scores by id;
top10s = FOREACH scoresGrouped {
    sorted = ORDER scores BY score DESC;
    sorted10 = LIMIT sorted 10;
    GENERATE group AS id, sorted10.scoreid AS top10candidates;
};
This gives me a bag like:
id1, {(scoreidA),(scoreidB),(scoreIdC)..(scoreIdFoo)}
However, I would like to include each item's index as well, so that the results look like:
id1, {(scoreidA,1),(scoreidB,2),(scoreIdC,3)..(scoreIdFoo,10)}
Is it possible to include the index somehow in the nested foreach, or would I have to write my own UDF to add it in afterwards?
To index the elements of a bag, you can use the Enumerate UDF from LinkedIn’s DataFu project:
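A minimal sketch of how this might be wired into the script from the question. The jar path is a placeholder, and the class name datafu.pig.bags.Enumerate and its start-index constructor argument are given as I understand the DataFu API, so check them against the version you install:

-- Placeholder path (assumption): point this at the DataFu jar you actually have.
REGISTER '/path/to/datafu.jar';

-- The constructor argument is the starting index; '1' should give 1-based positions.
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

scores = LOAD 'file' AS (id:chararray, scoreid:chararray, score:int);
scoresGrouped = GROUP scores BY id;
top10s = FOREACH scoresGrouped {
    sorted = ORDER scores BY score DESC;
    sorted10 = LIMIT sorted 10;
    -- Enumerate appends a position to each tuple, following the bag's order.
    GENERATE group AS id, Enumerate(sorted10.scoreid) AS top10candidates;
};

Because the bag is sorted and limited before Enumerate is applied, the appended index should correspond to the rank by score within each id.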