I have a very simple function which takes in a matching bitfield, a grid, and a square. It used to use a delegate, but after a lot of recoding I ended up with a bitfield & operation that avoids the delegate while still allowing matching within reason. Basically, the challenge is to find all contiguous elements within a grid which match the match bitfield, starting from a specific 'leader' square. Square is a somewhat small (but not tiny) class. Any tips on how to push this to be even faster? Note that the grid itself is pretty small (500 elements in this test).
Edit: It’s worth noting that this function is called over 200,000 times per second. In truth, in the long run my goal will be to call it less often, but that’s really tough, considering that my end goal is to make the grouping system be handled with scripts rather than being hardcoded. That said, this function is always going to be called more than any other function.
Edit: To clarify, the function does not check if leader matches the bitfield, by design. The intention is that the leader is not required to match the bitfield (though in some cases it will).
Things tried unsuccessfully:
- Initializing the dictionary and stack with a capacity.
- Casting the int to an enum to avoid a cast.
- Moving the dictionary and stack outside the function and clearing them each time they are needed. This makes things slower!
Things tried successfully:
- Writing a hash code function instead of using the default: hash codes are precomputed and are equal to `x + y * parent.Width`. Thanks for the reminder, Jim Mischel.
- mquander's technique: see `GetGroupMquander` below.
- Further optimization: once I switched to `HashSet`s, I got rid of the `Contains` test and replaced it with an `Add` test. Both `Contains` and `Add` are forced to seek a key, so just checking whether an add succeeds is more efficient than adding only when a `Contains` check fails. That is, `if (RetVal.Add(s)) curStack.Push(s);`

```csharp
public static List<Square> GetGroup(int match, Model grid, Square leader)
{
    Stack<Square> curStack = new Stack<Square>();
    Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();

    curStack.Push(leader);
    while (curStack.Count != 0)
    {
        Square curItem = curStack.Pop();
        if (Retval.ContainsKey(curItem)) continue;
        Retval.Add(curItem, true);

        foreach (Square s in curItem.Neighbors)
        {
            if (0 != ((int)(s.RoomType) & match))
            {
                curStack.Push(s);
            }
        }
    }
    return new List<Square>(Retval.Keys);
}
```
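For reference, a sketch of what the `HashSet`-based variant described above might look like (this is my reconstruction, not the exact final code from the post; `Square` and `Model` are the question's own types). `HashSet<T>.Add` returns `false` when the item is already present, so one call both tests membership and inserts:

```csharp
// Sketch (assumed) of the HashSet variant: Add returns true only for
// items not seen before, replacing a separate Contains lookup.
public static List<Square> GetGroupHashSet(int match, Model grid, Square leader)
{
    Stack<Square> curStack = new Stack<Square>();
    HashSet<Square> retVal = new HashSet<Square>();

    retVal.Add(leader);
    curStack.Push(leader);
    while (curStack.Count != 0)
    {
        Square curItem = curStack.Pop();
        foreach (Square s in curItem.Neighbors)
        {
            if (0 != ((int)(s.RoomType) & match))
            {
                // One hash lookup does both the membership test and the insert.
                if (retVal.Add(s))
                    curStack.Push(s);
            }
        }
    }
    return new List<Square>(retVal);
}
```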
=====
```csharp
public static List<Square> GetGroupMquander(int match, Model grid, Square leader)
{
    Stack<Square> curStack = new Stack<Square>();
    Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();

    Retval.Add(leader, true);
    curStack.Push(leader);
    while (curStack.Count != 0)
    {
        Square curItem = curStack.Pop();
        foreach (Square s in curItem.Neighbors)
        {
            if (0 != ((int)(s.RoomType) & match))
            {
                if (!Retval.ContainsKey(s))
                {
                    curStack.Push(s);
                    Retval.Add(s, true); // add the neighbor s, not curItem
                }
            }
        }
    }
    return new List<Square>(Retval.Keys);
}
```
The code you posted assumes that the `leader` square matches the bitfield. Is that by design?

I assume your `Square` class has implemented a `GetHashCode` method that's quick and provides a good distribution.

You did say micro-optimization . . .
If you have a good idea how many items you're expecting, you'll save a little bit of time by pre-allocating the dictionary. That is, if you know you won't have more than 100 items that match, you can write `new Dictionary<Square, bool>(100)`.
That will avoid having to grow the dictionary and re-hash everything. You can also do the same thing with your stack: pre-allocate it to some reasonable maximum size to avoid resizing later.
Since you say that the grid is pretty small, it seems reasonable to just allocate the stack and the dictionary to the grid size, if that's easy to determine. You're only talking `grid_size` references each, so memory isn't a concern unless your grid becomes very large.

Adding a check to see if an item is in the dictionary before you do the push might speed it up a little. It depends on the relative speed of a dictionary lookup as opposed to the overhead of having a duplicate item in the stack. Might be worth it to give this a try, although I'd be surprised if it made a big difference.
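Concretely, sizing both containers to the grid is a one-line change each. This is an illustrative sketch: the capacities, and the assumption that `Model` exposes `Width` and `Height` properties, are mine, not from the original post:

```csharp
// Illustrative: size both containers to the whole grid so they never
// have to grow or re-hash mid-search. Assumes Model exposes its dimensions.
int capacity = grid.Width * grid.Height;
Stack<Square> curStack = new Stack<Square>(capacity);
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>(capacity);
```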
I'm really stretching on this last one. You have that cast in your inner loop. I know that the C# compiler sometimes generates a surprising amount of code for a seemingly simple cast, and I don't know whether that gets optimized away by the JIT compiler. You could remove that cast from your inner loop by creating a local variable of the enum type and assigning it the value of `match`, e.g. `RoomType matchType = (RoomType)match;` (assuming the enum type is named `RoomType`). Then your inner loop comparison becomes `if (0 != (s.RoomType & matchType))`: no cast, which might shave some cycles.
Edit: Micro-optimization aside, you'll probably get better performance by modifying your algorithm slightly to avoid processing any item more than once. As it stands, items that do match can end up in the stack multiple times, and items that don't match can be processed multiple times. Since you're already using a dictionary to keep track of items that do match, you can keep track of the non-matching items by giving them a value of `false`. Then at the end you simply create a `List` of those items that have a `true` value.
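A sketch of that variant (my reconstruction, since the answer doesn't include code for it): non-matching neighbors are recorded with `false` so they are never tested again, and only the `true` entries make it into the result list:

```csharp
// Sketch (assumed): record non-matching squares with value false so
// no square is ever pushed or bitfield-tested more than once.
public static List<Square> GetGroupMarked(int match, Model grid, Square leader)
{
    Stack<Square> curStack = new Stack<Square>();
    Dictionary<Square, bool> seen = new Dictionary<Square, bool>();

    seen.Add(leader, true); // leader is always included, matching or not
    curStack.Push(leader);
    while (curStack.Count != 0)
    {
        Square curItem = curStack.Pop();
        foreach (Square s in curItem.Neighbors)
        {
            if (seen.ContainsKey(s)) continue;      // already classified
            bool matches = 0 != ((int)(s.RoomType) & match);
            seen.Add(s, matches);                   // false blocks future re-tests
            if (matches)
                curStack.Push(s);
        }
    }

    // Keep only the squares that actually matched (plus the leader).
    List<Square> result = new List<Square>();
    foreach (KeyValuePair<Square, bool> kv in seen)
        if (kv.Value) result.Add(kv.Key);
    return result;
}
```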