http://msdn.microsoft.com/en-us/library/1x308yk8.aspx
The Char.IsWhiteSpace(String, Int32) overload lets me write this:
var str = "string ";
Char.IsWhiteSpace(str, 6);
Rather than:
Char.IsWhiteSpace(str[6]);
That seemed unusual, so I looked at the decompiled source:
[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
public static bool IsWhiteSpace(char c)
{
    if (char.IsLatin1(c))
    {
        return char.IsWhiteSpaceLatin1(c);
    }
    return CharUnicodeInfo.IsWhiteSpace(c);
}

[SecuritySafeCritical]
public static bool IsWhiteSpace(string s, int index)
{
    if (s == null)
    {
        throw new ArgumentNullException("s");
    }
    if (index >= s.Length)
    {
        throw new ArgumentOutOfRangeException("index");
    }
    if (char.IsLatin1(s[index]))
    {
        return char.IsWhiteSpaceLatin1(s[index]);
    }
    return CharUnicodeInfo.IsWhiteSpace(s, index);
}
Three things struck me:
- Why does it bother to check only the upper bound, throwing an ArgumentOutOfRangeException, when an index below 0 instead produces the string's standard IndexOutOfRangeException?
- The presence of SecuritySafeCriticalAttribute. I've read the general blurb about it, but it is still unclear to me what it is doing here and whether it is linked to the upper bound check.
- TargetedPatchingOptOutAttribute is not present on the other Is...(char) methods, for example IsLetter, IsNumber, etc.
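One way to see the first point in action is to probe which exception each kind of bad index produces. This is a minimal C# sketch; BoundsProbe and ExceptionFor are hypothetical names for illustration. Assuming the .NET Framework code decompiled above is what runs, an index equal to Length hits the explicit check (ArgumentOutOfRangeException), while -1 falls through to s[index] and should surface as IndexOutOfRangeException. Other runtimes may check both bounds, so the last line is printed rather than assumed.

```csharp
using System;

// Probe for the bounds-check question: which exception does
// Char.IsWhiteSpace(string, int) throw for different indexes?
static class BoundsProbe
{
    // Returns the type name of the exception thrown for the given index,
    // or "no exception" if the call succeeds.
    public static string ExceptionFor(string s, int index)
    {
        try
        {
            Char.IsWhiteSpace(s, index);
            return "no exception";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    static void Main()
    {
        var str = "string ";
        // Valid index: position 6 is the trailing space.
        Console.WriteLine(ExceptionFor(str, 6));
        // Past the end: caught by the explicit upper-bound check.
        Console.WriteLine(ExceptionFor(str, str.Length));
        // Negative: result depends on the runtime's implementation (assumption:
        // the decompiled Framework code above reports it via s[index]).
        Console.WriteLine(ExceptionFor(str, -1));
    }
}
```

On a valid index the two overloads agree, so the string overload adds validation and context rather than a different answer for a single BMP char.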
Because not every character fits in a C# char. For instance, "𠀀" (U+20000) takes 2 C# chars, and you couldn't get any information about that character with just a char overload. With a String and an index, the methods can check whether the char at index i is a high surrogate and, if so, read the low surrogate char at the next index, combine the two according to the UTF-16 decoding algorithm, and retrieve information about the code point U+20000. This is how UTF-16 can encode over a million different code points: it is a variable-width encoding that takes 2 or 4 bytes, that is 1 or 2 C# chars, per code point.
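The surrogate-pair reasoning above can be sketched in C#. This is an illustration, not the framework's implementation: DecodeSurrogatePair is a hypothetical helper that applies the standard UTF-16 decoding formula, and its result is compared with Char.ConvertToUtf32(String, Int32). The string is written as the escape pair \uD840\uDC00, which encodes U+20000, to avoid depending on source-file encoding.

```csharp
using System;

// Shows why a (string, int) overload can see more than a (char) overload:
// a code point above U+FFFF is stored as two UTF-16 code units (a surrogate pair).
static class SurrogateDemo
{
    // Standard UTF-16 decoding: remove the surrogate base values, shift the
    // high surrogate's 10 bits up, and add the 0x10000 supplementary-plane offset.
    public static int DecodeSurrogatePair(char high, char low)
    {
        if (!char.IsHighSurrogate(high) || !char.IsLowSurrogate(low))
            throw new ArgumentException("Not a valid surrogate pair.");
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    static void Main()
    {
        string s = "\uD840\uDC00"; // U+20000 as its surrogate pair

        Console.WriteLine(s.Length);                                      // 2
        Console.WriteLine(char.IsHighSurrogate(s[0]));                    // True
        Console.WriteLine(DecodeSurrogatePair(s[0], s[1]).ToString("X")); // 20000
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));       // 20000

        // The char overload sees only a lone high surrogate (UnicodeCategory.Surrogate),
        // so it reports false; the string overload can classify the whole code point
        // (U+20000 is a CJK ideograph, category OtherLetter).
        Console.WriteLine(char.IsLetter(s[0]));
        Console.WriteLine(char.IsLetter(s, 0));
    }
}
```

Worked by hand: 0xD840 - 0xD800 = 0x40, shifted left by 10 gives 0x10000; the low half contributes 0; adding the 0x10000 offset yields 0x20000.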