I have a table t_stats with a column id (INT) and a column ratio (DECIMAL(8,4)).
id is unique.
I want to query t_stats and split its rows into 3 groups whose AVG(ratio) values are as close to each other as possible.
Temporary tables are fine, as long as I can run the solution as a script or stored procedure.
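For anyone who wants to reproduce this, a minimal setup sketch might look like the following. The column types follow the description above; the PRIMARY KEY constraint is an assumption used to enforce that id is unique, and the rows are only the first few from the sample input in this post, not the full data set.

```sql
-- Setup sketch. Column types follow the description above;
-- the PRIMARY KEY is an assumption that enforces "id is unique".
CREATE TABLE t_stats (
    id    INT          NOT NULL PRIMARY KEY,
    ratio DECIMAL(8,4) NOT NULL
);

-- A few rows taken from the sample input (not the full data set).
INSERT INTO t_stats (id, ratio) VALUES
    (24, 0.93),
    (25, 0.39),
    (26, 0.80),
    (27, 0.92),
    (28, 0.55),
    (30, 0.81);
```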
EDIT: Here is the concrete example:
INPUT:
id ratio
-- -----
24 0.930000
25 0.390000
26 0.800000
27 0.920000
28 0.550000
30 0.810000
31 0.770000
32 0.800000
33 0.590000
36 0.760000
37 0.910000
40 0.690000
43 0.390000
45 0.310000
46 0.760000
47 0.710000
54 0.710000
55 0.950000
57 0.920000
60 0.890000
62 0.700000
66 0.890000
68 0.950000
107 0.760000
559 0.990000
560 0.540000
565 0.430000
566 0.830000
568 0.590000
579 0.970000
599 0.900000
623 0.450000
749 0.800000
750 0.970000
753 0.820000
754 0.730000
766 0.620000
768 0.430000
770 0.790000
838 0.700000
875 0.835000
987 0.900000
988 0.740000
1157 0.850000
1250 0.630000
1328 0.860000
2171 0.900000
2176 0.520000
2177 0.980000
2178 0.940000
2180 0.970000
2184 0.990000
2187 0.950000
2188 0.940000
2189 0.920000
2195 0.990000
2233 0.900000
2234 0.940000
2235 0.950000
2240 0.980000
2243 0.920000
2253 0.900000
2266 0.530000
2269 0.920000
2270 0.970000
2271 0.750000
2272 0.820000
2275 0.910000
2277 0.930000
2281 0.690000
2282 0.710000
2288 0.840000
2528 0.870000
2778 0.950000
2814 0.990000
OUTPUT:
groupId id ratio
------- -- -----
1 24 0.930000
1 25 0.390000
1 27 0.920000
1 30 0.810000
1 32 0.800000
1 36 0.760000
1 54 0.710000
1 60 0.890000
1 559 0.990000
1 560 0.540000
1 566 0.830000
1 568 0.590000
1 623 0.450000
1 750 0.970000
1 838 0.700000
1 987 0.900000
1 1157 0.850000
1 2178 0.940000
1 2180 0.970000
1 2253 0.900000
1 2269 0.920000
1 2271 0.750000
1 2281 0.690000
1 2778 0.950000
1 2814 0.990000
2 26 0.800000
2 28 0.550000
2 31 0.770000
2 40 0.690000
2 45 0.310000
2 55 0.950000
2 57 0.920000
2 66 0.890000
2 107 0.760000
2 565 0.430000
2 579 0.970000
2 753 0.820000
2 754 0.730000
2 766 0.620000
2 875 0.835000
2 1328 0.860000
2 2176 0.520000
2 2177 0.980000
2 2184 0.990000
2 2187 0.950000
2 2189 0.920000
2 2233 0.900000
2 2234 0.940000
2 2275 0.910000
2 2282 0.710000
3 33 0.590000
3 37 0.910000
3 43 0.390000
3 46 0.760000
3 47 0.710000
3 62 0.700000
3 68 0.950000
3 599 0.900000
3 749 0.800000
3 768 0.430000
3 770 0.790000
3 988 0.740000
3 1250 0.630000
3 2171 0.900000
3 2188 0.940000
3 2195 0.990000
3 2235 0.950000
3 2240 0.980000
3 2243 0.920000
3 2266 0.530000
3 2270 0.970000
3 2272 0.820000
3 2277 0.930000
3 2288 0.840000
3 2528 0.870000
So I want to make 3 groups of n values each and aim for a specific average value x. (For example, with n=30 and 0.75 < x < 0.85, the result would be 3 groups of 30 values each, where each group has 0.75 < AVG(ratio) < 0.85 and an id can belong to only 1 group.)
So the average is almost the same in each group, and close to x:
groupId avg(ratio)
------- ----------
1 0.805600
2 0.789000
3 0.797600
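One way to put a number on "almost the same" is the spread between the largest and smallest group average. The snippet below is only an illustration: the table variable @avgs and its column names are made up for this example, and the three values are the averages listed just above.

```sql
-- Per-group averages copied from the listing above.
-- @avgs and avgRatio are hypothetical names used only for this sketch.
DECLARE @avgs TABLE (groupId INT PRIMARY KEY, avgRatio DECIMAL(8,6));

INSERT INTO @avgs (groupId, avgRatio) VALUES
    (1, 0.805600),
    (2, 0.789000),
    (3, 0.797600);

-- Spread between the highest and lowest group average;
-- a smaller spread means the groups are better balanced.
SELECT MAX(avgRatio) - MIN(avgRatio) AS spread
FROM @avgs;
-- spread = 0.016600
```

A spread of about 0.017 on ratios between 0 and 1 is the kind of tolerance I have in mind.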
Here is a procedural T-SQL version that works somewhat like a draft, except that the draft order is re-optimized each round according to need.
The “competitive” nature of this seems to lead to slightly less-than-perfect ratios if all items have to be picked, but the upside is that you basically have an O(N^2) algorithm, since it’s essentially a MIN function in a loop (maybe that’s optimistic considering the GROUP BY clauses). It’s also deterministic, and should be fairly straightforward to implement in another layer if necessary.
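To make the draft idea concrete, here is a rough, independent sketch of a greedy loop of this kind. It is an illustration under stated assumptions rather than a definitive implementation: it assumes t_stats(id, ratio) exists, is populated, and has no NULL ratios; it uses the overall AVG(ratio) as the target; and #picks, @grp and all variable names are invented for the example.

```sql
-- Sketch only: assumes t_stats(id, ratio) exists and is populated.
-- #picks, @grp and all variable names are made up for this example.
DECLARE @target DECIMAL(18,6) = (SELECT AVG(ratio) FROM t_stats);

DECLARE @grp TABLE (groupId INT PRIMARY KEY);
INSERT INTO @grp (groupId) VALUES (1), (2), (3);

CREATE TABLE #picks (
    groupId INT          NOT NULL,
    id      INT          NOT NULL PRIMARY KEY,
    ratio   DECIMAL(8,4) NOT NULL
);

DECLARE @g INT, @id INT, @cnt INT, @sum DECIMAL(18,6);

-- One pick per iteration until every row has been drafted.
WHILE EXISTS (SELECT 1 FROM t_stats s
              WHERE NOT EXISTS (SELECT 1 FROM #picks p WHERE p.id = s.id))
BEGIN
    -- Draft order "according to need": the smallest group picks next;
    -- ties go to the group whose average is furthest from the target.
    SELECT TOP (1) @g = g.groupId
    FROM @grp g
    LEFT JOIN #picks p ON p.groupId = g.groupId
    GROUP BY g.groupId
    ORDER BY COUNT(p.id),
             ABS(COALESCE(AVG(p.ratio), @target) - @target) DESC,
             g.groupId;

    -- Current size and running sum of the picking group.
    SELECT @cnt = COUNT(*), @sum = COALESCE(SUM(ratio), 0)
    FROM #picks
    WHERE groupId = @g;

    -- The MIN-in-a-loop step: take the free row that leaves this
    -- group's average closest to the target (lowest id breaks ties).
    SELECT TOP (1) @id = s.id
    FROM t_stats s
    WHERE NOT EXISTS (SELECT 1 FROM #picks p WHERE p.id = s.id)
    ORDER BY ABS((@sum + s.ratio) / (@cnt + 1) - @target), s.id;

    INSERT INTO #picks (groupId, id, ratio)
    SELECT @g, id, ratio FROM t_stats WHERE id = @id;
END;

-- Group sizes and averages after the draft.
SELECT groupId, COUNT(*) AS n, AVG(ratio) AS avgRatio
FROM #picks
GROUP BY groupId
ORDER BY groupId;

DROP TABLE #picks;
```

Because the smallest group always picks next, group sizes never differ by more than one, and the explicit tie-breakers on groupId and id keep the result deterministic. Each of the N iterations scans the remaining rows once, which is where the roughly O(N^2) cost comes from.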