I’m trying to understand the Forward() element from pyparsing. Suppose I have this simple BNF:
identifier =
"a..z,$,_" < "a..z,$,_,0..9" >
package_name =
identifier
/ ( package_name "." identifier )
and I try to parse a simple package like java.lang.String I get either just java as result or never return from recursion.
I tried it like this:
from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal
identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")
package_name = Forward()
definition = package_name+dot+identifier
package_name << Group(identifier+ZeroOrMore(definition))
package_name.parseString("java.lang.String")
will print [[‘java’]]
from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal
identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")
package_name = Forward()
definition = identifier^package_name+dot+identifier
package_name << definition
package_name.parseString("java.lang.String")
will reach recursion limit
how does this Forward placeholder work?
The problem is not with
Forwardbut with your grammar, which is inherently either limited too early, or recursive in a way that is undecidable with a naive recursive descent parser like Pyparsing.You have this:
If you match left to right, this will always match a single identifier and then stop, without attempting to match a following period. If you switch the order to match the
identifierlast:. . . then it will infinitely recurse, because in order to decide if
package_namematches, the first thing it has to do is decide whetherpackage_namematches. This is a left-recursive grammar, which a simple recursive-descent parser like Pyparsing can’t handle. Pyparsing does not look ahead to see how a match will affect subsequent matches. It just tries the matches left to right.You can get a simple example of how
Forwardworks by changing the way your grammar recurses:Here, the recursion happens on the right, not on the left, so Pyparsing can match it incremenetally.
(Your use of ZeroOrMore is a red herring. If you’re going to have a recursive grammar like this, you don’t want to use ZeroOrMore, because the recursive definition already allows your sub-expression to match multiple times. As I suggested in my comment, though, it is much simpler to define this sort of grammar without recursion anyway.)