I have a database, and I use a query to produce an intermediate table like this:
id a b
xx 1 2
yy 7 11
and I would like to calculate the standard deviations of b for the users who have a < avg(a)
I calculate avg(a) that way and it works fine:
select avg(select a from (query to produce intermediate table)) from table;
But the query:
select stddev_pop(b)
from (query to produce intermediate table)
where a < (select avg(select a
from (query to produce intermediate table))
from table);
Returns me an error, and more precisely, I am told that the “a” from avg(select a from…) is not recognised. This makes me really confused, as it works in the previous query.
I would be grateful if somebody could help.
EDIT:
I stored the result of my query to generate the intermediary table into a temporary table, but still run into the same problem.
The non working query becomes:
select stddev_pop(b) from temp where a < (select avg(a) from temp);
while this works:
select avg(a) from temp;
OK, a colleague helped me to do it. I’ll post the answer in case someone runs into the same problem:
Basically hive doesn’t do caching of a table as a variable.