I am looking to write some pseudo-code of a recursive descent parser. Now, I have no experience with this type of coding. I have read some examples online but they only work on grammar that uses mathematical expressions. Here is the grammar I am basing the parser on.
S -> if E then S | if E then S else S | begin S L | print E
L -> end | ; S L
E -> i
I must write methods S(), L() and E() and return some error messages, but the tutorials I have found online have not helped a lot. Can anyone point me in the right direction and give me some examples?.
I would like to write it in C# or Java syntax, since it is easier for me to relate.
Update
public void S() {
if (currentToken == "if") {
getNextToken();
E();
if (currentToken == "then") {
getNextToken();
S();
if (currentToken == "else") {
getNextToken();
S();
Return;
}
} else {
throw new IllegalTokenException("Procedure S() expected a 'then' token " + "but received: " + currentToken);
} else if (currentToken == "begin") {
getNextToken();
S();
L();
return;
} else if (currentToken == "print") {
getNextToken();
E();
return;
} else {
throw new IllegalTokenException("Procedure S() expected an 'if' or 'then' or else or begin or print token " + "but received: " + currentToken);
}
}
}
public void L() {
if (currentToken == "end") {
getNextToken();
return;
} else if (currentToken == ";") {
getNextToken();
S();
L();
return;
} else {
throw new IllegalTokenException("Procedure L() expected an 'end' or ';' token " + "but received: " + currentToken);
}
}
public void E() {
if (currentToken == "i") {
getNextToken();
return;
} else {
throw new IllegalTokenException("Procedure E() expected an 'i' token " + "but received: " + currentToken);
}
}
Basically in recursive descent parsing each non-terminal in the grammar is translated into a procedure, then inside each procedure you check to see if the current token you are looking at matches what you would expect to see on the right hand side of the non-terminal symbol corresponding to the procedure, if it does then continue applying the production, if it doesn’t then you have encountered an error and must take some action.
So in your case as you mentioned above you will have procedures:
S(),L(), andE(), I will give an example of howL()could be implemented and then you can try and doS()andE()on your own.It is also important to note that you will need some other program to tokenize the input for you, then you can just ask that program for the next token from your input.
Now you try
S()andE(), and post back.Also as Kristopher points out your grammar has something called a dangling else, meaing that you have a production that starts with the same thing up to a point:
So this begs the question if your parser sees an ‘if’ token, which production should it choose to process the input? The answer is it has no idea which one to choose because unlike humans the compiler can’t look ahead into the input stream to search for an ‘else’ token. This is a simple problem to fix by applying a rule known as Left-Factoring, very similar to how you would factor an algebra problem.
All you have to do is create a new non-terminal symbol S’ (S-prime) whose right hand side will hold the pieces of the productions that aren’t common, so your
Sproductions no becomes: