There is a 2d array of items (in my case they are called Intersections).
A certain item is given as a start. The task is to find all items directly or indirectly connected to this item that satisfy a certain function.
So the basic algorithm is like this:
Add the start to the result list.
Repeat until no modification: Add each item in the array that satisfies the function and touches any item in the result list to the result list.
My current implementation looks like this:
private IList<Intersection> SelectGroup (
Intersection start,
Func<Intersection, Intersection, bool> select)
{
List<Intersection> result = new List<Intersection> ();
Queue<Intersection> source = new Queue<Intersection> ();
source.Enqueue (start);
while (source.Any ()) {
var s = source.Dequeue ();
result.Add (s);
foreach (var neighbour in Neighbours (s)) {
if (select (start, neighbour)
&& !result.Contains (neighbour)
&& !source.Contains (neighbour)) {
source.Enqueue (neighbour);
}
}
}
Debug.Assert (result.Distinct ().Count () == result.Count ());
Debug.Assert (result.All (x => select (x, result.First ())));
return result;
}
private List<Intersection> Neighbours (IIntersection intersection)
{
int x = intersection.X;
int y = intersection.Y;
List<Intersection> list = new List<Intersection> ();
if (x > 1) {
list.Add (GetIntersection (x - 1, y));
}
if (y > 1) {
list.Add (GetIntersection (x, y - 1));
}
if (x < Size) {
list.Add (GetIntersection (x + 1, y));
}
if (y < Size) {
list.Add (GetIntersection (x, y + 1));
}
return list;
}
(The select function takes a start item and returns true iff the second item satisfies.)
This does its job and turned out to be reasonable fast for the usual array sizes (about 20*20). However, I’m interested in further improvements. Any ideas?
Example (X satisfies in relation to other Xs, . does never satisfy):
.... XX.. .XX. X...
In this case, there are 2 groups: a central group of 4 items and a group of a single item in the lower left. Selecting the group (for instance by starting item [3, 3]) returns the former, while the latter can be selected using the starting item and sole return value [1, 4].
Example 2:
.A.. ..BB A.AA
This time there are 4 groups. The 3 A groups are not connected, so they are returned as separate groups. The bigger A and B groups are connected, but A does not related to B so they are returned as separate groups.
Step 1: trivial change for huge benefit
Simple, immediate improvement: Both your
result.Containsandsource.Containsmemberships are on list types, so they’ll be O(n) in the size of those lists, not very efficient. Since you really don’t care about any particular ordering, I’d change both of those intoHashSetfor constant-time lookups.Note that your current implementation would be O(n^2) in the worst case, which happens when the entire array is valid (by the time you’re inserting the last few elements, you’d be checking each one against the entire rest of the grid).
Step 2: further optimization
Better structural change: Keep a boolean
visitedarray of the same size as your Intersection array, and every time you look at anIntersection, mark it as visited. This way, you don’t have to check if something’s inresultorsourceevery time, and better yet, you don’t have to re-evaluate yourselectpredicate. Otherwise, given something like this:you’d be evaluating
selecton the center dot four times, which could be bad if it’s expensive. This way, yourcondition turns into:
if (!visited(neighbour) && select(start, neighbour), which will only evaluateselectat most once on any given intersection.Also, if you do this, you don’t even need to make
resultandcontainshashes any more, since you won’t be doing containment-checks on them.