I’m trying to create an algorithm to solve the following problem:

Input is an unsorted list of sets containing pairs (key, value) of ints. The first of each pair is positive and unique within the set.
I want to find an algorithm to split the input sets so the sets can be ordered such that for each key the value is nondecreasing in the set order.
There is a trival solution which is to split the sets into each individual value and sort them, I’d like something more efficient in terms of the number of sets which are split.
Are there any similar problems you have encountered and/or techniques you can suggest?
Does the optimal (minimum number of splits) solution sound like it is possible in polynomial time?
Edit: In the example the “<=” operator indicates a constraint on the sets as a whole whereby for each key value (100, 101, 102) the corresponding values are equal to or greater than the values in previous sets (or omitted from the set). I.e extracting the values for each key using the order from the output sets gives:
- Key 100 {0, 1}
- Key 101 {2, 3}
- Key 102 {10, 15}
A*
I propose using A* to find an optimal solution. Build the order of split sets incrementally from left to right, minimizing the number of sets required to achieve this.
A* visits states based on some heuristic estimate of the total cost. I propose that a state is described by the totality of all the pairs already included in the order as we have it so far. If all values for every key are different, then you can represent this information rather concisely by simply storing the last value for each key. Otherwise you’ll have to somehow take care of equal values, so you know which ones were already included and which ones were not. For every state you maintain some representation of the best order leading to it, but that may get updated along the way while the state remains the same.
The heuristic should be an estimate of the total cost of the path from the beginning through the current state to the goal. It may be too low, but must never be too high. In our case, the heuristic should count the number of (possibly split) sets included in the order so far, and add to that the number of (unsplit) sets still waiting for insertion. As the remaining sets may need splitting, this might be too low, but as you can never have less sets than those still waiting for insertion, it is a suitable heuristic.
Now you have some priority queue of states, ordered by the value of this heuristic. You extract minimal items from it, and know that the moment you extract a state from the queue, the cost up to that state can not decrease any more, so the path up to that state is optimal. Now you examine what other states can be reached from this: which other pairs can be next in the order of split sets? For each remaining set which has pairs that are ready to be included, you create a new subsequent state, taking all the pairs from the set which are ready. The cost so far increases by one. If you manage to take a whole set, without splitting, then the extimate for the remaining cost decreases by one.
For this new state, you check whether it is already persent in your priority queue. If it is, and its previous cost was higher than the one just computed, then you update its cost, and the optimal path leading to it. Make sure the priority key changes its position accordingly (“decrease key”). If the state wasn’t present in the queue before, then add it to the queue.
Dijkstra
Come to think of it, this is the same as running Dijkstra’s algorithm with the number of splits as cost. And as each edge has either cost zero or cost one, you can implement this even easier, without any priority queue at all. Instead, you can use two sets, called
S₀andS₁, where all elements from S₀ require the same number of splits, and all elements from S₁ require one more split. Roughly sketched in pseudocode: