My Postgres query calculates statistical aggregate from a bunch of sensor readings:
SELECT to_char(ipstimestamp, 'YYYYMMDDHH24') As row_name,
to_char(ipstimestamp, 'FMDD mon FMHH24h') As hour_row_name,
varid As category,
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(STDDEV(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As StatisticsValue
FROM loggingdb_ips_integer As log
JOIN ipsobjects_with_parent ips ON log.varid = ips.objectid
AND (ipstimestamp > (now()- '2 days'::interval))
GROUP BY row_name, hour_row_name, category;
This works fine as long as I have >1 ipsvalue/hour. If the hourly COUNT(ipsvalue)<2, however, StatisticsValue returns NULL without any Postgres errors.
If I comment out STTDEV, as in the following:
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As value
then all three stats are calculated correctly. I therefore conclude that an illegittimate STDDEV brings down the whole query. I would rather have illegittimate STDDEVs returning 0. I tried to COALESCE the STDDEV line, to no avail. What can be done???
COALESCEshould work.You could also use (it that fits you) the “population standard deviation”
stddev_pop, instead of the “sample standard deviation”stddev_samp; the later is divides byn-1and is aliased toSTDDEV.stddev_pop, instead , divides byn, and it returns zero (instead ofNULL) when given one sample.If you don’t know the difference between these estimators, it’s explained in every statistic textbook, eg http://en.wikipedia.org/wiki/Standard_deviation#Estimation