In SQL (specifically MySQL, but the question is generic enough), what is the most efficient way to query time-series data when I have multiple tables across disjoint time ranges? For example, if my tables are as follows:
router1_20090330( unixtime integer unsigned, iface1_in integer unsigned, iface1_out integer unsigned ) router1_20090331( unixtime integer unsigned, iface1_in integer unsigned, iface1_out integer unsigned )
and so on, with (say) 1 minute samples of the in and out traffic on interface1, giving 86400 records per day table.
I want the query to represent something like
SELECT CAST(unixtime/3600 as unsigned) AS hour, sum(iface1_in), sum(iface1_out) from router1_20090330 *then* router1_20090331 group by hour order by hour;
i.e. I would like both tables to be queried and aggregated, but one after the other instead of being joined.
You want to UNION the two (or more) tables:
Note that the ‘ALL’ after UNION tells the query to keep duplicate rows (if the same values appear in both tables – unlikely in your scenario). I usually use UNION ALL because I think it avoids the overhead of testing the data for duplicate rows.