Note: I’m optimizing based on past experience and on what a profiler reported. I realize an alternative optimization would be to call GetNeighbors less often, but that is a secondary issue at the moment.
I have a very simple function, shown below. I generally call it within a foreach loop, and I call it a lot (about 100,000 times per second). A while back, I coded a variation of this program in Java and was so disgusted by the speed that I ended up replacing several of the for loops that used it with 4 if statements. Loop unrolling seems ugly, but it did make a noticeable difference in application speed. So, I’ve come up with a few potential optimizations and thought I would ask for opinions on their merit and for suggestions:
- Use four if statements and totally ignore the DRY principle. I am confident this will improve performance based on past experience, but it makes me sad. To clarify, the 4 if statements would be pasted anywhere I call GetNeighbors() too frequently, with the body of the foreach block pasted inside each of them.
- Memoize the results in some mysterious manner.
- Add a ‘neighbors’ property to all squares. Generate its contents at initialization.
- Use a code generation utility to turn calls to GetNeighbors into if statements as part of compilation.
    public static IEnumerable<Square> GetNeighbors(Model m, Square s)
    {
        int x = s.X;
        int y = s.Y;
        if (x > 0) yield return m[x - 1, y];
        if (y > 0) yield return m[x, y - 1];
        if (x < m.Width - 1) yield return m[x + 1, y];
        if (y < m.Height - 1) yield return m[x, y + 1];
        yield break;
    }

    // The property of Model used to get elements.
    private Square[,] grid;
    //...
    public Square this[int x, int y]
    {
        get { return grid[x, y]; }
    }
Note: 20% of the time spent by the GetNeighbors function is spent on the call to m.get_Item, the other 80% is spent in the method itself.
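For reference, the third option in the list above (a precomputed ‘neighbors’ property) might look roughly like the sketch below. The `Neighbors` property, the `BuildNeighbors` helper, and the constructors are assumptions made for illustration; the real Square and Model classes are not shown in full, so this is only a minimal stand-in for them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal stand-ins for the question's Square and Model types.
public sealed class Square
{
    public int X { get; private set; }
    public int Y { get; private set; }

    // Hypothetical property: filled in once by Model's constructor.
    public Square[] Neighbors { get; internal set; }

    public Square(int x, int y) { X = x; Y = y; }
}

public sealed class Model
{
    private readonly Square[,] grid;

    public int Width  { get { return grid.GetLength(0); } }
    public int Height { get { return grid.GetLength(1); } }

    public Square this[int x, int y] { get { return grid[x, y]; } }

    public Model(int width, int height)
    {
        grid = new Square[width, height];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                grid[x, y] = new Square(x, y);

        // Second pass: every square exists now, so neighbors can be linked.
        foreach (Square s in grid)
            s.Neighbors = BuildNeighbors(s);
    }

    // Same four bounds checks as GetNeighbors, but run once per square.
    private Square[] BuildNeighbors(Square s)
    {
        var list = new List<Square>(4);
        if (s.X > 0)          list.Add(grid[s.X - 1, s.Y]);
        if (s.Y > 0)          list.Add(grid[s.X, s.Y - 1]);
        if (s.X < Width - 1)  list.Add(grid[s.X + 1, s.Y]);
        if (s.Y < Height - 1) list.Add(grid[s.X, s.Y + 1]);
        return list.ToArray();
    }
}
```

With this in place, a hot loop would iterate over `s.Neighbors` directly. The cost is one extra array per square, so whether it pays off depends on the size of the grid and should be confirmed with the profiler.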
Brian,
I’ve run into similar things in my code.
The two things I’ve found with C# that helped me the most:
First, don’t necessarily be afraid of allocations. C# memory allocations are very, very fast, so allocating an array on the fly can often be faster than making an enumerator. However, whether this will help depends a lot on how you’re using the results. The only pitfall I see is that, if you always return a fixed-size array of 4 elements, the routine that consumes the results has to check for the edge cases (the missing neighbors) itself.
Depending on how large your matrix of Squares is in your model, you may be better off doing one check up front to see if you’re on the edge and, if not, building the full 4-element array and returning it. If you are on an edge, you can handle those special cases separately and return a smaller array as appropriate (typically 3 elements on an edge, 2 in a corner). This puts one larger if statement in the method, but that is often faster in my experience. If the model is large, I would avoid precomputing all of the neighbors for every Square. The memory overhead in the Squares may outweigh the benefits.
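A minimal sketch of that idea, assuming simplified `Square` and `Model` types (the real ones aren’t shown in full) and a hypothetical method name `GetNeighborsArray`. It isn’t measured, so treat it as something to try under the profiler rather than a guaranteed win:

```csharp
using System;

// Hypothetical minimal stand-ins for the question's types.
public sealed class Square
{
    public readonly int X;
    public readonly int Y;
    public Square(int x, int y) { X = x; Y = y; }
}

public sealed class Model
{
    private readonly Square[,] grid;
    public readonly int Width;
    public readonly int Height;

    public Model(int width, int height)
    {
        Width = width;
        Height = height;
        grid = new Square[width, height];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                grid[x, y] = new Square(x, y);
    }

    public Square this[int x, int y] { get { return grid[x, y]; } }
}

public static class NeighborHelper
{
    // Hypothetical array-returning replacement for the yield-based GetNeighbors.
    public static Square[] GetNeighborsArray(Model m, Square s)
    {
        int x = s.X, y = s.Y;

        // Fast path: one combined test covers every interior square.
        if (x > 0 && y > 0 && x < m.Width - 1 && y < m.Height - 1)
        {
            return new[] { m[x - 1, y], m[x, y - 1], m[x + 1, y], m[x, y + 1] };
        }

        // Slow path: edges and corners, where some neighbors are missing.
        var tmp = new Square[4];
        int n = 0;
        if (x > 0)            tmp[n++] = m[x - 1, y];
        if (y > 0)            tmp[n++] = m[x, y - 1];
        if (x < m.Width - 1)  tmp[n++] = m[x + 1, y];
        if (y < m.Height - 1) tmp[n++] = m[x, y + 1];

        // Trim to the number of neighbors actually found.
        var result = new Square[n];
        Array.Copy(tmp, result, n);
        return result;
    }
}
```

Callers can still write `foreach (Square n in NeighborHelper.GetNeighborsArray(m, s))`, so the loop bodies don’t need to be duplicated. The order of the returned neighbors matches the original method.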
In my experience, as well, preallocating and returning vs. using yield makes the JIT more likely to inline your function, which can make a big difference in speed. If you can take advantage of the IEnumerable results and you are not always using every returned element, that is better, but otherwise, precomputing may be faster.
The other thing to consider: I don’t know what information is stored in Square in your case, but if the object is relatively small, is used in a large matrix, and is iterated over many, many times, consider making it a struct. I had a routine similar to this (called hundreds of thousands or millions of times in a loop), and changing the class to a struct sped up the routine by over 40% in my case. This assumes you’re using .NET 3.5 SP1, though, as the JIT does many more optimizations on structs in that release.
There are other potential pitfalls to switching to struct vs. class, of course, but it can have huge performance impacts.
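To illustrate the struct suggestion and one of its pitfalls, here is a small sketch. The `SquareData` type, its `Value` field, and both helper methods are made up for illustration, since I don’t know what your Square actually holds:

```csharp
using System;

// Hypothetical struct version of Square; Value is an invented payload field.
public struct SquareData
{
    public readonly int X;
    public readonly int Y;
    public int Value;

    public SquareData(int x, int y, int value)
    {
        X = x;
        Y = y;
        Value = value;
    }
}

public static class StructDemo
{
    // With a struct array the elements are stored inline in the array,
    // so reading a neighbor does not follow a reference to a separate object.
    public static int SumNeighborValues(SquareData[,] grid, int x, int y)
    {
        int width = grid.GetLength(0);
        int height = grid.GetLength(1);
        int sum = 0;
        if (x > 0)          sum += grid[x - 1, y].Value;
        if (y > 0)          sum += grid[x, y - 1].Value;
        if (x < width - 1)  sum += grid[x + 1, y].Value;
        if (y < height - 1) sum += grid[x, y + 1].Value;
        return sum;
    }

    // Pitfall: assigning a struct copies it, so changing the copy
    // leaves the element stored in the grid untouched.
    public static bool CopyIsIndependent(SquareData[,] grid)
    {
        SquareData copy = grid[0, 0];
        copy.Value += 1;
        return grid[0, 0].Value != copy.Value;
    }
}
```

The copy semantics shown in `CopyIsIndependent` are the main thing to watch for: code that used to mutate a Square through a reference will silently mutate a temporary copy instead once Square becomes a struct.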