I have a list of items which I would like to partition into subsets. For the sake of discussion lets say they’re files. I would like each subset to contain at most 5 files, and for the total size of the files in the subset to be less than 1 MB if possible. If a single file exceeds 1MB it should be in a subset by itself.
I wrote this up in a slightly more generic form, using a generic “item metric” instead of file size. But I suspect there’s a simpler and/or better way to do this. Any suggestions?
Here’s what I’ve got:
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int maxItemsPerSet, int maxMetricPerSet, Func<T, int> getMetric)
{
int currentMetricSum = 0;
List<T> currentSet = new List<T>();
foreach (T listItem in source)
{
int itemMetric = getMetric(listItem);
if (currentSet.Count > 0 &&
(currentSet.Count >= maxItemsPerSet || (currentMetricSum + itemMetric) > maxMetricPerSet))
{
yield return currentSet;
//Start a new subset
currentSet = new List<T>();
currentMetricSum = 0;
}
currentSet.Add(listItem);
currentMetricSum += itemMetric;
}
//Return the last set
yield return currentSet;
}
Bin packing is a NP-hard problem. The only way to get an optimal solution is to test all combinations. If there is a fixed number of distinct sizes, it can be systematically done using dynamic programming (there is an answer on SO with sample code for this case), but the running time for such algorithm is terrible.
This means you should be looking for a heuristic which would get you close to the optimal solution in reasonable time. Your algorithm (first-fit) is a good starting point. With little effort, it can be slightly improved by presorting the items by decreasing size. There are, however, several other more-or-less complex heuristics which improve both speed and the results.
A Google search returned this as one of the results: Basic analysis of bin-packing heuristics (there is a paper which analyses results). Apparently, a best fit algorithm with a bin lookup table provides good results with a reasonable running time.