I’m using Lepl as a parser and the language I’m parsing is very complicated and I only care about a small subset. I can’t figure out a way to have Lepl parse the grammar I care about and just return strings for everything else. If I add a rule like:
everything_else = ~newline & Regexp('.')[:]
then it gets used instead of the rules I care about. I think this happens because it is a longer match than my other rules. Is there a configuration setting or something similar in Lepl that would let me have an imperfect parser?
Update
As requested, adding some details. I only want to parse out the top-level variable definitions that equal a number. I want to ignore the ones that depend on other variables or are math expressions. I also want to ignore what is inside the block definitions. There are many other constructs in the language that I want to ignore. So here’s an example:
from lepl import *
class Variable(List): pass
import string
def parse_it(a_string):
    # Parser: TODO: incomplete
    s = ~Space()[:] # zero or more spaces
    s1 = ~Space()[1:] # 1 or more spaces
    newline = Newline() & s
    number_squote = ~Optional(Literal("'")) & s & Real() & s & ~Optional(Literal("'"))
    number_dquote = ~Optional(Literal('"')) & s & Real() & s & ~Optional(Literal('"'))
    number = number_squote | number_dquote | Real() >> float
    var_keyword = ~newline & ~Regexp(r'(?i)variable')
    var_name = Word() >> string.lower
    var_assignment = s1 & var_name & s & ~Literal('=') & s & number > Variable
    vars = var_keyword & var_assignment[1:]
    parser = vars[1:]
    return parser.parse(a_string)
input="""
VARIABLE abc=5 bbb='7' ddd='abc*bbb'
variable ccccc=7 // comment
block(1,2,3,4) of_type=cleaner abc=4 d=5 c=string('hi')
define_block block2 (3,4,5,6,7,a,b) var1=35 var2=36
variable ignore_this=5
block3(3,4,5,6) x='var1*ignore_this' y=var2
block4(4,5,6,7,a,b) x='var1*2' y="var2*3"
end_block
block2(1,2,3,4,5,6,3) abc=ccccc d=abc
create_blocks // comment: initialize memory
connect_blocks // connect blocks together
simulate //
"""
for i in parse_it(input):
    print i
So I only really care about the variable Word() = Real() information in the file defined outside the block definitions. I want to keep the rest as strings so that I can build an AST, modify the variable values, and then write out the control file again.
so, if i understand correctly, you want to parse any line that starts with “variable” (ignoring case) and that is not inside a block.
the first thing we need to worry about is how much we need to understand about the bits we want to skip. for example, we could skip everything between define_block and end_block, but what if the text “end_block” happens to appear in some string? maybe to handle that case we also need to be aware of strings? or comments? these kinds of worries are why it is often not as easy as you might think to simply skip text – it turns out that to understand what we can skip we actually do need to parse the data.
but perhaps in this case we are ok. it looks like you have neither multi-line strings nor multi-line comments, and that define_block and end_block always occur at the start of a line. that gives us enough guarantees (i think) to be able to drop blocks without worrying about strings or comments (because a string or comment would start with // or " or similar, and so a misleading //define_block or "define_block" would not be at the start of the line).
we can do that outside of lepl:
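a minimal sketch of that pre-processing step in plain python, run before the text is handed to the parser. the function name `drop_blocks` is hypothetical, and the code assumes exactly the guarantees described above: define_block and end_block only count when they start a line, and blocks do not nest.

```python
def drop_blocks(text):
    """Return text with every define_block ... end_block region removed.

    Assumption: both keywords only open or close a block when they appear
    at the very start of a line, and blocks are never nested.
    """
    kept = []
    in_block = False
    for line in text.splitlines(True):  # True keeps the line endings
        if not in_block and line.startswith('define_block'):
            in_block = True
        if not in_block:
            kept.append(line)
        elif line.startswith('end_block'):
            in_block = False  # the end_block line itself is dropped too
    return ''.join(kept)
```

the result can then be passed to something like `parse_it`, which no longer sees the variable lines inside blocks.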
or as a regexp inside:
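a sketch of what such a regexp could look like, shown here with python's own `re` module so it can be tried in isolation. the names `BLOCK` and `drop_blocks_re` are hypothetical. passing the same pattern to lepl's Regexp matcher is an assumption that has not been checked here, in particular how `^` behaves when a match starts in the middle of the input.

```python
import re

# (?m) lets ^ match at the start of every line; (?s) lets . cross newlines.
# The lazy .*? stops at the first end_block that begins a line, so an
# "end_block" inside a comment or string later on a line is not mistaken
# for the end of the block.
BLOCK = r'(?ms)^define_block\b.*?^end_block\b[^\n]*\n?'

def drop_blocks_re(text):
    """Remove every define_block ... end_block region (blocks must not nest)."""
    return re.sub(BLOCK, '', text)
```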
so your final solution will be something like
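a rough sketch of that overall shape, using only the standard library. this is not lepl code: the `PAIR` regexp is a stand-in for the lepl grammar from the question, and every name here (`split_control_file`, `BLOCK`, `VAR_LINE`, `PAIR`) is hypothetical. blocks are cut out first, then each remaining line is either a "variable" line, from which only name=number pairs are kept, or an opaque string.

```python
import re

# Assumption: define_block / end_block only count at the start of a line.
BLOCK = re.compile(r'(?ms)^define_block\b.*?^end_block\b[^\n]*\n?')
VAR_LINE = re.compile(r'(?i)^variable\s')
# name = number, where the number may be wrapped in matching quotes
PAIR = re.compile(
    r'''(\w+)\s*=\s*(['"]?)\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*\2(?=\s|$)''')

def split_control_file(text):
    """Yield ('variable', [(name, value), ...]) or ('text', line) items."""
    for line in BLOCK.sub('', text).splitlines():
        if VAR_LINE.match(line):
            code = line.split('//', 1)[0]  # ignore a trailing comment
            pairs = [(m.group(1).lower(), float(m.group(3)))
                     for m in PAIR.finditer(code)]
            yield ('variable', pairs)
        else:
            yield ('text', line)  # everything else stays an opaque string
```

if the file has to be written out again, the matched block text would need to be kept as an opaque item rather than deleted.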
hope that helps.
and finally, full disclosure, i should also point you to https://groups.google.com/group/lepl/browse_thread/thread/e305b5b559d93e9e which i posted today (sorry).