I am writing a high-performance parser, and Int32.Parse seems too slow for my needs. I wrote a simple version that assumes correct input, and it performs much better. So should I use my own version instead? Or is there another, faster method already available?
My method is like this:
```csharp
// Parse a simple int, assuming relatively correct input (i.e. all digits).
public static int ParseInt32Simply(string str)
{
    if (str == null)
        throw new ArgumentNullException("str");
    if (str.Length == 0)
        throw new ArgumentException("str is empty");

    int sign = 1, index = 0;
    if (str[0] == '-')
    {
        sign = -1;
        index = 1;
    }
    else if (str[0] == '+')
    {
        index = 1;
    }

    int result = 0;
    for (; index < str.Length; ++index)
    {
        result = 10 * result + (str[index] - '0');
    }

    if (result < 0)
        throw new OverflowException(str + " is too large for Int32");

    return result * sign;
}
```
My results are very different from the built-in equivalent:
```
Int32.Parse took 8.2775453 seconds
ParseInt32Simply took 0.6511523 seconds
Int32.Parse took 6.7625807 seconds
ParseInt32Simply took 0.4677390 seconds
```
(Running 25 million iterations on my machine: a P4 3 GHz, with VS 2008 SP1.)
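For context, a minimal `Stopwatch`-based harness along these lines could produce numbers like the above. The iteration count and sample value here are assumptions, and a stripped-down copy of the parser is inlined only to keep the example self-contained:

```csharp
using System;
using System.Diagnostics;

static class ParseBenchmark
{
    // Stripped-down copy of the question's parser (validation removed),
    // inlined here only so the example compiles on its own.
    static int ParseInt32Simply(string str)
    {
        int sign = 1, index = 0;
        if (str[0] == '-') { sign = -1; index = 1; }
        else if (str[0] == '+') { index = 1; }

        int result = 0;
        for (; index < str.Length; ++index)
            result = 10 * result + (str[index] - '0');

        return result * sign;
    }

    static void Main()
    {
        const int iterations = 25000000;  // assumed; match your own run
        const string sample = "1234567";  // assumed sample input

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
            Int32.Parse(sample);
        sw.Stop();
        Console.WriteLine("Int32.Parse took {0} seconds", sw.Elapsed.TotalSeconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
            ParseInt32Simply(sample);
        sw.Stop();
        Console.WriteLine("ParseInt32Simply took {0} seconds", sw.Elapsed.TotalSeconds);
    }
}
```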
So, should I use my version? Or is there another method available that I can use?
If you're parsing a format that you know contains valid numbers, you can indeed write a faster custom parser. I once wrote a Double.Parse replacement for the same purpose. And it's faster to begin with the least significant digit: that way you can just increase the power of ten for each digit you parse.
I've created a quick implementation of this:
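As a rough sketch of that idea (the method name is mine; it assumes a non-empty string of valid digits with an optional leading sign):

```csharp
// Hypothetical sketch: parse from the least significant digit,
// multiplying each digit by a growing power of ten.
// Assumes valid input: optional '+'/'-' followed by digits only.
public static int ParseInt32Reversed(string str)
{
    int end = (str[0] == '-' || str[0] == '+') ? 1 : 0;

    int result = 0;
    int power = 1;

    // Walk from the last character back toward the (optional) sign.
    for (int i = str.Length - 1; i >= end; --i)
    {
        result += (str[i] - '0') * power;
        power *= 10;
    }

    return str[0] == '-' ? -result : result;
}
```

Note that `power *= 10` can wrap past `Int32.MaxValue` on the final iteration of a 10-digit input; in C#'s default unchecked context that stray value is simply never used, but it is one more reason this only suits known-valid input.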
If you really want speed, you can write an unsafe implementation.
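A hedged sketch of what that could look like: pin the string and walk a raw `char*` to avoid per-access bounds checks. This requires compiling with `/unsafe` and, again, assumes valid, non-empty input:

```csharp
// Hypothetical unsafe sketch: iterate over the string's characters
// through a pinned pointer, skipping bounds checks. Assumes valid input.
public static unsafe int ParseInt32Unsafe(string str)
{
    fixed (char* start = str)
    {
        char* c = start;
        char* end = start + str.Length;

        int sign = 1;
        if (*c == '-') { sign = -1; ++c; }
        else if (*c == '+') { ++c; }

        int result = 0;
        while (c < end)
        {
            result = 10 * result + (*c - '0');
            ++c;
        }

        return result * sign;
    }
}
```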
If you're parsing a big file, you could read the file as raw bytes and work with those directly. That will make it a lot faster (no converting to a Unicode string, no splitting the string into lines, no splitting the lines into substrings, no parsing the substrings), but you're going to lose maintainability.
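As an illustration of that approach, a hypothetical helper could parse ASCII digits straight out of a byte buffer (say, one filled from a `FileStream`) and advance a cursor past the number, so no strings are ever created:

```csharp
// Hypothetical sketch: parse an int from ASCII bytes in a buffer,
// e.g. bytes read from a file, without materializing any strings.
// 'pos' is advanced past the parsed number; assumes a digit follows
// the optional sign.
public static int ParseInt32FromBytes(byte[] buffer, ref int pos)
{
    int sign = 1;
    if (buffer[pos] == (byte)'-') { sign = -1; ++pos; }
    else if (buffer[pos] == (byte)'+') { ++pos; }

    int result = 0;
    while (pos < buffer.Length &&
           buffer[pos] >= (byte)'0' && buffer[pos] <= (byte)'9')
    {
        result = 10 * result + (buffer[pos] - (byte)'0');
        ++pos;
    }

    return result * sign;
}
```

Because the loop stops at the first non-digit byte, the same cursor can then be advanced over separators (spaces, commas, line breaks) to read the next number, which is how a whole file of numbers could be consumed without any string splitting.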