I have written some small string parsing functions in F# – in order to get a better feel for F# and to see how to solve such tasks with it. I try to walk over a string and search for specific characters via recursion.
The logic does work but the generated IL code of the release build (optimizations turned on) does look kind of weird in my opinion. So I guess there is a better way to write this stuff in a performant way in F#.
This is what part of the parsing functions looks like:
let eatTag (input : string) index =
    let len = input.Length
    let nothing = 0, null, TagType.Open
    // more functions used in the same way
    // ...
    let rec findName i =
        if i >= len then nothing
        else
            let chr = input.[i]
            if isWhitespace chr then
                findName (i+1)
            elif chr = '/' then
                getName (i+1) (i+1) true
            else getName (i+1) i false
    let rec findStart i =
        if i >= len then nothing
        elif input.[i] = '<' then findName (i+1)
        else findStart (i+1)
    findStart index
This is what the generated IL code for the findStart function looks like:
// loop start
IL_0000: nop
IL_0001: ldarg.2
IL_0002: ldarg.1
IL_0003: blt.s IL_000e
IL_0005: ldc.i4.0
IL_0006: ldnull
IL_0007: ldc.i4.0
IL_0008: newobj instance void class [mscorlib]System.Tuple`3<int32, string, valuetype TagType>::.ctor(!0, !1, !2)
IL_000d: ret
IL_000e: ldarg.0
IL_000f: ldarg.2
IL_0010: call instance char [mscorlib]System.String::get_Chars(int32)
IL_0015: ldc.i4.s 60
IL_0017: bne.un.s IL_0024
IL_0019: ldarg.0
IL_001a: ldarg.1
IL_001b: ldarg.2
IL_001c: ldc.i4.1
IL_001d: add
IL_001e: call class [mscorlib]System.Tuple`3<int32, string, valuetype TagType> findName@70(string, int32, int32)
IL_0023: ret
IL_0024: ldarg.0
IL_0025: ldarg.1
IL_0026: ldarg.2
IL_0027: ldc.i4.1
IL_0028: add
IL_0029: starg.s i
IL_002b: starg.s len
IL_002d: starg.s input
IL_002f: br.s IL_0000
// end loop
The C# view (ILSpy) of this function shows the following code, and this in particular is why I think I am doing something wrong here. The function arguments are apparently assigned to themselves…?!
internal static Tuple<int, string, TagType> findStart@80(string input, int len, int i)
{
    while (i < len)
    {
        if (input[i] == '<')
        {
            return findName@70(input, len, i + 1);
        }
        string arg_2D_0 = input;
        int arg_2B_0 = len;
        i++;
        len = arg_2B_0;
        input = arg_2D_0;
    }
    return new Tuple<int, string, TagType>(0, null, TagType.Open);
}
The same problem can be seen in the other functions that are processed in a continuation-style. Any pointers to what I am either doing or assuming wrong are greatly appreciated 🙂
This is tail call elimination: the process of removing a tail call and turning it into a ‘jump’ to the start of the function (in other words, a while(true) { } construct).
The reason you see the ‘same’ assignments is to keep the semantics the same as if you were calling the function normally. All new argument values have to be computed from the old ones before any argument is overwritten, and it is nigh impossible to determine efficiently whether one assignment could affect another. Hence the temporary variables, which are then assigned back to the arguments. Here input and len simply pass through unchanged, so the copies look redundant, but they are harmless.
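To make the role of the temporaries concrete, here is a minimal sketch using a different function from the one in the question. The names gcd and gcdLoop are made up for illustration, and the loop is a hand-written approximation of what the compiler typically produces for a self-recursive tail call, not its actual output. In the recursive call each new argument depends on the old value of the other one, so writing them back one by one without temporaries would give the wrong result.

```fsharp
// Tail-recursive form: the recursive call is the last thing evaluated,
// so it can be replaced by a jump back to the start of the function.
let rec gcd a b =
    if b = 0 then a
    else gcd b (a % b)

// Hand-written loop equivalent (illustrative only).
// Both new values are computed from the OLD a and b
// before either of them is overwritten.
let gcdLoop a b =
    let mutable a = a
    let mutable b = b
    while b <> 0 do
        let newA = b
        let newB = a % b
        a <- newA
        b <- newB
    a
```

If the loop body were written as a <- b followed by b <- a % b, the second assignment would see the already updated a and always yield 0. A compiler that always goes through temporaries never has to analyse whether such a dependency exists, which is why the copies also show up in cases like findStart where they are not strictly needed.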