Below is the data in my Table1:
BID PID TIME
---------+-------------------+----------------------
1345653 330760137950 2012-07-09 21:42:29
1345653 330760137950 2012-07-09 21:43:29
1345653 330760137950 2012-07-09 21:40:29
1345653 330760137950 2012-07-09 21:41:29
1345653 110909316904 2012-07-09 21:29:06
1345653 221065796761 2012-07-09 19:31:48
To clarify the scenario: for user (BID) 1345653, the PID 330760137950 appears four times, each with a different timestamp.
Output that I need:
1345653 330760137950 2012-07-09 21:43:29
1345653 330760137950 2012-07-09 21:42:29
1345653 330760137950 2012-07-09 21:41:29
1345653 110909316904 2012-07-09 21:29:06
1345653 221065796761 2012-07-09 19:31:48
So basically, if BID and PID are the same but the timestamps differ, I need the top 3 of those rows, sorted by TIME in descending order. A (BID, PID) pair that occurs only once should still appear in the output.
For this I created a rank UDF (User Defined Function) in Hive and wrote the query below, but it's not working for me. Can anyone help me with this?
SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)
FROM (
SELECT bid, pid, time
FROM table1
where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = '2012-07-09'
DISTRIBUTE BY bid,pid
SORT BY bid, time desc
) a
WHERE rank(bid) < 3;
With the above query I am getting output like this:
1345653 330760137950 2012-07-09 21:43:29
1345653 330760137950 2012-07-09 21:42:29
1345653 330760137950 2012-07-09 21:41:29
which is wrong, as I am missing the last two rows of the expected output above. Can anyone help me with this?
oh, i’m in sql server. i don’t think you are...
anyway, my recommendation is that you move your rank function inside of the nested select you have. in the outside select, keep the filter where the rank is less than three. i don’t know your syntax. i shouldn’t have answered this question. sorry... lol
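For comparison, this is how the same top-3-per-group idea is usually written with a window function, which SQL Server supports and which Hive added in version 0.11. It is only a sketch: it reuses the table and column names from the question, the aliases `ranked` and `rn` are made up here, and it helps only if the Hive version in use has windowing functions.

```sql
-- Sketch only: needs windowing support (Hive 0.11 or later).
-- ROW_NUMBER() restarts at 1 for every (bid, pid) pair and counts
-- rows from the newest timestamp to the oldest.
SELECT bid, pid, time
FROM (
    SELECT bid, pid, time,
           ROW_NUMBER() OVER (PARTITION BY bid, pid ORDER BY time DESC) AS rn
    FROM table1
    WHERE to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) AS int))) = '2012-07-09'
) ranked
WHERE rn <= 3            -- keep at most three rows per (bid, pid)
ORDER BY bid, time DESC;
```

Partitioning by both bid and pid is the important part: a pair with a single row keeps that row, and a pair with four rows loses only its oldest one.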
here:
http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
your rank() is in the outer select... it needs to be in the inner. leave the < 4 or <= 3 or whatever in the outer where statement, though. your query looks almost exactly like that example... it just needs a few changes.
based on the link and my absolute LACK of knowledge of Hive… i think you might want this:
and i can’t test or compile because honestly i had no clue what hive was before you posted your question. (small world, i know, so sad – so true)
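Spelled out, the change described above could look like the sketch below. It rests on assumptions that are not confirmed in the question: the rank UDF takes a single string key and counts 0, 1, 2, ... over consecutive rows with the same key (the question's `< 3` returning three rows suggests it starts at 0), and the aliases `sorted`, `ranked`, and `rnk` are made up for illustration. The key passed to rank() combines bid and pid, because rank(bid) only restarts when bid changes, and every row in the sample shares one bid. pid is also added to SORT BY so that the rows of one pair arrive together.

```sql
-- Sketch only: assumes rank(string) is the custom UDF from the question,
-- returning 0, 1, 2, ... for consecutive rows that share the same key.
SELECT bid, pid, time
FROM (
    -- rank() is applied to the output of the sorted subquery, so the
    -- counter is meant to follow the time-descending order per pair
    SELECT bid, pid, time,
           rank(concat(cast(bid AS string), '_', cast(pid AS string))) AS rnk
    FROM (
        SELECT bid, pid, time
        FROM table1
        WHERE to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) AS int))) = '2012-07-09'
        DISTRIBUTE BY bid, pid
        SORT BY bid, pid, time DESC
    ) sorted
) ranked
WHERE rnk < 3;   -- ranks 0, 1, 2: at most three rows per (bid, pid)
```

With the sample data this should keep the three newest rows for PID 330760137950 plus the single rows for the other two PIDs. It has not been run against a Hive cluster.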