So I have these specific columns that I’m working with:
customer_token, merchant_id, merchant_category_code, and transaction_amount.
My current query is this:
SELECT customer_token, COUNT(transaction_amount), SUM(transaction_amount)
FROM transaction
WHERE file_date > 20121031
  AND file_date < 20121201
GROUP BY customer_token
I want to extend the above query so that, in the result, merchant_category_code is split into separate columns based on the transaction amounts in each specific merchant_category_code. The result would look something like this:
customer_token, count(transaction_amount), sum(transaction_amount), count(transaction_amount in merchant_category_code which is ranked 1), count(transaction_amount in merchant_category_code which is ranked 2), count(transaction_amount in merchant_category_code which is ranked 3), etc…
and then this:
customer_token, count(transaction_amount), sum(transaction_amount), sum(transaction_amount in merchant_category_code which is ranked 1), sum(transaction_amount in merchant_category_code which is ranked 2), sum(transaction_amount in merchant_category_code which is ranked 3), etc…
But I’m at a loss as to how to do this, or whether it is even possible.
If you know in advance what the possible values of merchant_category_code are, you can use CASE expressions (or IF expressions, if you prefer; for documentation on both of these, see the section titled “Conditional Functions” on the page “LanguageManual+UDF” in the Hive wiki).
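A minimal sketch of the CASE approach, assuming the category codes are known ahead of time. The codes '5411', '5812' and '5541' are hypothetical placeholders, and the column aliases are illustrative; substitute your own values (and drop the quotes if merchant_category_code is stored as a number). Each extra column wraps transaction_amount in a CASE so that only rows for one code contribute to that column's COUNT or SUM.

```sql
-- Sketch only: '5411', '5812', '5541' are HYPOTHETICAL placeholder codes.
-- Add one COUNT/SUM pair per merchant_category_code you know in advance.
SELECT customer_token,
       COUNT(transaction_amount) AS txn_count,
       SUM(transaction_amount)   AS txn_total,
       -- CASE with no ELSE yields NULL for every other code, and COUNT
       -- skips NULLs, so each column counts only that code's transactions
       COUNT(CASE WHEN merchant_category_code = '5411' THEN transaction_amount END) AS count_5411,
       COUNT(CASE WHEN merchant_category_code = '5812' THEN transaction_amount END) AS count_5812,
       COUNT(CASE WHEN merchant_category_code = '5541' THEN transaction_amount END) AS count_5541,
       -- ELSE 0 makes the sum 0 (rather than NULL) for customers
       -- with no transactions in that code
       SUM(CASE WHEN merchant_category_code = '5411' THEN transaction_amount ELSE 0 END) AS sum_5411,
       SUM(CASE WHEN merchant_category_code = '5812' THEN transaction_amount ELSE 0 END) AS sum_5812,
       SUM(CASE WHEN merchant_category_code = '5541' THEN transaction_amount ELSE 0 END) AS sum_5541
FROM transaction
WHERE file_date > 20121031
  AND file_date < 20121201
GROUP BY customer_token;
```

The same columns can be written with IF instead, e.g. IF(merchant_category_code = '5411', transaction_amount, 0) inside the SUM. Either way, the set of columns is fixed in the query text, which is why this only works when the possible codes are known beforehand.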