I have a UTF-8 encoded data file that contains thousands of floating-point numbers. When the format was designed, the developers decided to omit the ‘e’ in the exponential notation to save space, so the data looks like this:
1.85783+16 0.000000+0 1.900000+6-3.855418-4 1.958263+6 7.836995-4
-2.000000+6 9.903130-4 2.100000+6 1.417469-3 2.159110+6 1.655700-3
2.200000+6 1.813662-3-2.250000+6-1.998687-3 2.300000+6 2.174219-3
2.309746+6 2.207278-3 2.400000+6 2.494469-3 2.400127+6 2.494848-3
-2.500000+6 2.769739-3 2.503362+6 2.778185-3 2.600000+6 3.020353-3
2.700000+6 3.268572-3 2.750000+6 3.391230-3 2.800000+6 3.512625-3
2.900000+6 3.750746-3 2.952457+6 3.872690-3 3.000000+6 3.981166-3
3.202512+6 4.437824-3 3.250000+6 4.542310-3 3.402356+6 4.861319-3
The problem is that float.Parse() does not accept this format. My interim solution was:
    // Requires: using System.Globalization;
    protected static float ParseFloatingPoint(string data)
    {
        int signPos;
        char replaceChar = '+';
        // Skip over the first character so that a leading sign is not caught
        signPos = data.IndexOf(replaceChar, 1);
        // Didn't find a '+', so let's see if there's a '-'
        if (signPos == -1)
        {
            replaceChar = '-';
            signPos = data.IndexOf('-', 1);
        }
        // Found either a '+' or '-'
        if (signPos != -1)
        {
            // Create a new char array with one extra slot to accommodate the 'e'.
            // Sized from data.Length rather than a fixed EntryWidth, so entries of any width work.
            char[] newData = new char[data.Length + 1];
            // Copy from the string up to the sign
            for (int i = 0; i < signPos; i++)
            {
                newData[i] = data[i];
            }
            // Replace the sign with 'e' followed by the sign
            newData[signPos] = 'e';
            newData[signPos + 1] = replaceChar;
            // Copy the rest of the string, shifted right by one
            for (int i = signPos + 2; i < newData.Length; i++)
            {
                newData[i] = data[i - 1];
            }
            return float.Parse(new string(newData), NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        else
        {
            return float.Parse(data, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
    }
I can’t call a simple String.Replace() because it will replace any leading negative signs. I could use substrings but then I’m making LOTS of extra strings and I’m concerned about the performance.
Does anyone have a more elegant solution to this?
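On the worry about creating lots of extra strings: one possible direction, sketched here under assumptions, is to build the corrected text in a stack-allocated buffer and parse it as a span. This assumes .NET Core 2.1 or later (where float.Parse accepts a ReadOnlySpan<char>) and short fields of roughly 11 characters, as in the sample data. The class name SpanFloat is hypothetical and not part of the original code.

```csharp
using System;
using System.Globalization;

public static class SpanFloat
{
    // Hypothetical helper: assumes .NET Core 2.1+ and short, fixed-width fields.
    public static float Parse(ReadOnlySpan<char> field)
    {
        // Drop padding so that index 0 is the first character of the mantissa.
        field = field.Trim();

        // Search from the back, skipping index 0 so a mantissa sign is never matched.
        // A result of 0 means no exponent sign was found.
        int signPos = field.Length > 1
            ? field.Slice(1).LastIndexOfAny('+', '-') + 1
            : 0;

        if (signPos <= 0)
        {
            return float.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);
        }

        // One extra slot for the 'e'; the buffer lives on the stack, so no heap allocation.
        Span<char> buffer = stackalloc char[field.Length + 1];
        field.Slice(0, signPos).CopyTo(buffer);
        buffer[signPos] = 'e';
        field.Slice(signPos).CopyTo(buffer.Slice(signPos + 1));

        return float.Parse(buffer, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

Called as SpanFloat.Parse("-3.855418-4"), this builds "-3.855418e-4" in the buffer before parsing. Whether it is actually faster than the char[] version would need to be measured on the real data.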
The ideas I’m using here are to ensure the decimal point comes before the sign (thus avoiding any problems if the exponent is missing) and to use LastIndexOf() to work from the back (ensuring we find the exponent if one exists). If a leading “+” is possible, the first if would need to include || signPos < decimalPos.
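As a rough illustration of the idea described above (not the answer's original code), a minimal sketch might look like the following. It assumes the exponent sign is the last '+' or '-' in the entry and uses LastIndexOfAny() rather than two LastIndexOf() calls. The names CompactFloat and Signs are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CompactFloat
{
    // Hypothetical names; shared array avoids allocating it on every call.
    private static readonly char[] Signs = { '+', '-' };

    public static float Parse(string data)
    {
        int decimalPos = data.IndexOf('.');
        // Work from the back so the exponent sign is found before any mantissa sign.
        int signPos = data.LastIndexOfAny(Signs);

        // No sign at all, or the only sign precedes the decimal point
        // (it belongs to the mantissa): there is no exponent to fix up.
        if (signPos <= 0 || signPos < decimalPos)
        {
            return float.Parse(data, NumberStyles.Float, CultureInfo.InvariantCulture);
        }

        // Insert the missing 'e' in front of the exponent sign.
        string withE = data.Insert(signPos, "e");
        return float.Parse(withE, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

For example, CompactFloat.Parse("2.100000+6") parses "2.100000e+6", while "-2.5" is passed through unchanged because its only sign comes before the decimal point. Note that Insert() allocates one new string per converted entry.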
According to the comments, a test of this method shows only a 5% performance hit (after dropping String.Format(), which I should have remembered was awful). I think the code is much clearer: there is only one decision to make.