The following code doesn’t return what I am trying to compute: the number of unique users. Any ideas?
data = LOAD 'input_initial' AS (user_id,item_id,rating,timestamp);
data = FOREACH data GENERATE user_id,item_id;
STORE data INTO 'input_final';
data_users = FOREACH data GENERATE user_id;
group_users = GROUP data_users BY user_id;
count_users = FOREACH group_users GENERATE COUNT(data_users);
STORE count_users INTO 'count_users';
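Grouping by user_id produces one count per user (the number of rows for each user), not the number of unique users. You need to amend the final GROUP operation to act on ALL rather than on an individual field, after removing duplicate user IDs with DISTINCT so that each user is counted once.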
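A minimal sketch of that change, assuming the `data` relation from the question is already defined. The aliases `unique_users` and `all_users` are names chosen here for illustration and do not come from the original script:

```pig
-- Assumes `data` (user_id, item_id) has been defined as in the question.
data_users   = FOREACH data GENERATE user_id;

-- Collapse repeated user_ids so each user appears exactly once.
unique_users = DISTINCT data_users;

-- GROUP ... ALL puts every remaining row into a single group.
all_users    = GROUP unique_users ALL;

-- Counting the rows of that single group gives the number of unique users.
count_users  = FOREACH all_users GENERATE COUNT(unique_users);

STORE count_users INTO 'count_users';
```

Note that COUNT skips tuples whose first field is null; if null user IDs should be included in the total, COUNT_STAR could be used instead.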