We have had a request to provide some data to an external company.
They only require a sample of the data. Simple, right? Wrong.
Here are their sampling criteria:
- Divide the total number of records by 720 (the required sample size) to get the sampling interval. If the result is a fraction, round down to the nearest whole number.
- Halve the sampling interval to get the starting point.
- Return each subsequent record by repeatedly adding the sampling interval.
EXAMPLE:
- 10,000 records: sampling interval = 13 (10,000/720 ≈ 13.89, rounded down)
- Starting point = 6 (13/2 = 6.5, rounded down)
- Return records 6, 19 (6+13), 32 (19+13), 45 (32+13), etc.
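The arithmetic in this example can be checked with a short Python sketch. The function name sample_positions is hypothetical, and it assumes that "rounded" means rounded down (which is what turns 13/2 into 6) and that there are enough records for the interval to be at least 2.

```python
def sample_positions(total_records, sample_size=720):
    """Return the 1-based record positions picked by the rounding rules above."""
    interval = total_records // sample_size  # sampling interval, rounded down
    if interval < 2:
        # Assumption: with an interval below 2 the starting point would be 0,
        # which is not a valid record position, so refuse to continue.
        raise ValueError("need at least 2 * sample_size records")
    start = interval // 2  # assumption: "rounded" means rounded down
    return list(range(start, total_records + 1, interval))


positions = sample_positions(10_000)
print(positions[:4])   # [6, 19, 32, 45]
print(len(positions))  # 769
```

With 10,000 records the rounded interval returns 769 records rather than exactly 720, because rounding 13.89 down to 13 shortens every step.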
Can someone please tell me whether (and how) something like this is possible in SQL?
If you have access to ROW_NUMBER(), you can do this relatively easily.
ROW_NUMBER() gives every row a sequential identifier (this is important, as the id must be unique and must NOT have ANY gaps). Its ORDER BY clause (for example ORDER BY a, b, c, d) also defines the order you want the data in.
With that id, you can use modulo (often the % operator) to test whether a row is the 720th record, the 1440th record, and so on (because 720 % 720 = 0). If you then offset the id value by 360, you change the starting point of the result set.
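A minimal sketch of the fixed-interval idea described above, wrapped in Python's sqlite3 module only so that it can be run as-is (window functions need SQLite 3.25 or later). The table name yourTable and the ordering column a come from this thread. The payload column, the 3,000 generated rows, and the CTE name sequenced are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (a INTEGER, payload TEXT)")
# Hypothetical data: 3,000 rows whose key column has gaps (10, 20, 30, ...).
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(i * 10, f"row {i}") for i in range(1, 3001)],
)

query = """
WITH sequenced AS (
    -- gap-free, unique id in the order you care about
    SELECT ROW_NUMBER() OVER (ORDER BY a) AS id, a, payload
    FROM yourTable
)
SELECT id, a, payload
FROM sequenced
WHERE (id + 360) % 720 = 0   -- every 720th row, offset by 360
ORDER BY id
"""

fixed_ids = [row[0] for row in conn.execute(query)]
print(fixed_ids)  # [360, 1080, 1800, 2520]
```

The column a deliberately has gaps, which is why the modulo is applied to the ROW_NUMBER() id rather than to the column itself.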
EDIT
After re-reading the question, I see you don’t want every 720th record, but 720 records spread uniformly across the data.
As such, replace 720 with (SELECT COUNT(*) / 720 FROM yourTable)
and replace 360 with (SELECT (COUNT(*) / 720) / 2 FROM yourTable)
EDIT
If you ignore the rounding conditions, you can get a result of exactly 720 records. This requires a non-integer sampling interval, and the test becomes whether the result of the modulo is less than 1 (rather than equal to 0).
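The two variants from the edits can be compared side by side in a rough sketch, again run through Python's sqlite3 purely for convenience (SQLite 3.25 or later). Several things here are assumptions rather than anything stated in this thread: the 10,000 generated rows, the CTE names sequenced and total, and subtracting the starting point in the rounded query so that it reproduces the question's 6, 19, 32 sequence. Because SQLite's % works on integers, the "modulo less than 1" test is restated in integer arithmetic: id % (n / 720.0) < 1 is multiplied through by 720 to give (id * 720) % n < 720, with n / 2 added as the half-interval offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (a INTEGER)")
# Hypothetical data: 10,000 rows, matching the example in the question.
conn.executemany("INSERT INTO yourTable VALUES (?)", [(i,) for i in range(1, 10001)])

# Variant 1: rounded interval and starting point, as in the question.
rounded_query = """
WITH sequenced AS (
    SELECT ROW_NUMBER() OVER (ORDER BY a) AS id, a FROM yourTable
)
SELECT id
FROM sequenced
WHERE (id - (SELECT (COUNT(*) / 720) / 2 FROM yourTable))
      % (SELECT COUNT(*) / 720 FROM yourTable) = 0
ORDER BY id
"""

# Variant 2: no rounding of the interval; integer restatement of
# "(id + interval / 2) % interval < 1" with interval = n / 720.0.
exact_query = """
WITH sequenced AS (
    SELECT ROW_NUMBER() OVER (ORDER BY a) AS id, a FROM yourTable
),
total AS (
    SELECT COUNT(*) AS n FROM yourTable
)
SELECT id
FROM sequenced CROSS JOIN total
WHERE (id * 720 + n / 2) % n < 720
ORDER BY id
"""

rounded_ids = [row[0] for row in conn.execute(rounded_query)]
exact_ids = [row[0] for row in conn.execute(exact_query)]

print(rounded_ids[:4], len(rounded_ids))  # [6, 19, 32, 45] 769
print(len(exact_ids))                     # 720
```

The rounded variant follows the external company's rules literally but overshoots the sample size, while the unrounded variant returns exactly 720 rows as long as the table holds at least 720 records.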