I’m trying to write a parser with Treetop to parse some LaTeX commands into HTML markup. With the following grammar, the generated code goes into a dead spin. I’ve built the source code with tt and stepped through it, but that doesn’t really elucidate what the underlying issue is (it just spins in _nt_paragraph).
Test input: "\emph{hey} and some more text."
grammar Latex
  rule document
    (paragraph)* {
      def content
        [:document, elements.map { |e| e.content }]
      end
    }
  end

  # Example: These aren't the \emph{droids you're looking for} \n\n.
  rule paragraph
    ( text / tag )* eop {
      def content
        [:paragraph, elements.map { |e| e.content } ]
      end
    }
  end

  rule text
    ( !( tag_start / eop) . )* {
      def content
        [:text, text_value ]
      end
    }
  end

  # Example: \tag{inner_text}
  rule tag
    "\\emph{" inner_text '}' {
      def content
        [:tag, inner_text.content]
      end
    }
  end

  # Example: \emph{inner_text}
  rule inner_text
    ( !'}' . )* {
      def content
        [:inner_text, text_value]
      end
    }
  end

  # End of paragraph.
  rule eop
    newline 2.. {
      def content
        [:newline, text_value]
      end
    }
  end

  rule newline
    "\n"
  end

  # You know, what starts a tag
  rule tag_start
    "\\"
  end
end
For anyone curious, Clifford over at the Treetop dev Google group figured this out.
The problem was with paragraph and text.
Text matches 0 or more characters, and a paragraph can contain 0 or more texts, so the parser kept matching an endless run of zero-length texts before the first \n without consuming any input, which made it dead spin. The fix was to change the repetition in text from * to +, i.e. ( !( tag_start / eop) . )+, so that it must match at least one character.
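To make the spin easier to see, here is a toy model of the ( text / tag )* loop in plain Ruby. It is only a sketch under assumptions, not Treetop's generated code: the names scan_text, TEXT_STAR, TEXT_PLUS, TAG and repeat_text_or_tag are made up for illustration, and the max_iterations cap exists purely so that the spin can be observed instead of hanging.

```ruby
# Toy model of `( text / tag )*`; NOT Treetop's generated parser.
# Each matcher takes (input, pos) and returns the new position, or nil on failure.

STOP = /\\|\n\n/ # mirrors `tag_start / eop`: a backslash or a blank line

# Position where text has to stop: next backslash, next blank line, or end of input.
def scan_text(input, pos)
  input.index(STOP, pos) || input.length
end

# text written with `*`: always succeeds, even when it consumes nothing.
TEXT_STAR = ->(input, pos) { scan_text(input, pos) }

# text written with `+`: fails (nil) when it would consume nothing.
TEXT_PLUS = lambda do |input, pos|
  stop = scan_text(input, pos)
  stop > pos ? stop : nil
end

# tag: "\emph{" inner_text "}"
TAG = lambda do |input, pos|
  m = /\G\\emph\{[^}]*\}/.match(input, pos)
  m ? m.end(0) : nil
end

# Repeats `text / tag` like a naive `*` loop. The hypothetical max_iterations
# cap turns an endless loop into a visible :spinning result.
def repeat_text_or_tag(input, text, max_iterations: 1_000)
  pos = 0
  iterations = 0
  loop do
    nxt = text.call(input, pos) || TAG.call(input, pos)
    break if nxt.nil?
    iterations += 1
    return [:spinning, pos] if iterations >= max_iterations
    pos = nxt
  end
  [:finished, pos]
end

input = "\\emph{hey} and some more text."
p repeat_text_or_tag(input, TEXT_STAR) # `*` version: stuck at position 0
p repeat_text_or_tag(input, TEXT_PLUS) # `+` version: advances to the end of input
```

With the `*` version, text succeeds with an empty match at the leading backslash, so tag is never tried and the position never moves. With the `+` version, text fails there, tag gets its turn, and the loop ends once neither alternative can consume anything.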