I am using HIVE with two tables looking like (more or less):
-TABLE1 defined as [(Variables : string),(Value1 : int),(Value2 : int)]
with field “Variables” looking like “x0,x1,x2,x3,…,xn”
-TABLE2 define as [(Value1Sum : int),(Value2Sum : int),(X1 : string),(X4 : string),(X17 : string)]
I “convert” table1 to table2 with the query :
INSERT OVERWRITE TABLE table2
SELECT sum(v1), sum(v2), x1, x4, x17
FROM (SELECT
Value1 as v1,
Value2 as v2,
split(Variables, ",")[1] as x1,
split(Variables, ",")[4] as x4,
split(Variables, ",")[17] as x17
FROM Table1) tmp
GROUP BY tmp.x1, tmp.x4, tmp.x17
Does Hive call 3 times the split function ?
Is there a way to make it more elegant ?
Is there a way to make it more generic ?
Best regards,
CC
Yes it will call split each time. You can make it slightly more elegant:
Why not define Variables as an array column to start with? They you can access elements directly:
I’m assuming you’re using an external table, so you can do that like so: