I’m using Lepl as a parser and the language I’m parsing is very complicated

Question

Asked: June 4, 2026

I’m using Lepl as a parser and the language I’m parsing is very complicated and I only care about a small subset. I can’t figure out a way to have Lepl parse the grammar I care about and just return strings for everything else. If I add a rule like:

everything_else = ~newline & Regexp('.')[:]

Then it gets used instead of the things I care about. I think it is happening because it is a longer match than my other rules. Is there a configuration setting or something in Lepl so that I can have an imperfect parser?

Update
As requested, adding some details. I only want to parse out the top-level variable definitions that equal a number. I want to ignore the ones that depend on other variables or are math expressions. I also want to ignore what is inside the block definitions. There are many other constructs in the language that I want to ignore too. So here’s an example:

from lepl import *

class Variable(List): pass
import string

def parse_it(a_string):

    # Parser:  TODO: incomplete
    s = ~Space()[:] # zero or more spaces
    s1 = ~Space()[1:]  # 1 or more spaces
    newline = Newline() & s
    number_squote = ~Optional(Literal("'")) & s & Real() & s & ~Optional(Literal("'"))
    number_dquote = ~Optional(Literal('"')) & s & Real() & s & ~Optional(Literal('"'))
    # parentheses needed: >> binds tighter than |, so without them only the bare Real() is converted
    number = (number_squote | number_dquote | Real()) >> float
    var_keyword = ~newline & ~Regexp(r'(?i)variable')
    var_name = Word() >> string.lower
    var_assignment = s1 & var_name & s & ~Literal('=') & s & number > Variable
    vars = var_keyword & var_assignment[1:]
    parser = vars[1:]
    return parser.parse(a_string)

input="""
VARIABLE abc=5 bbb='7' ddd='abc*bbb'
variable ccccc=7  // comment
block(1,2,3,4) of_type=cleaner abc=4 d=5 c=string('hi')

define_block block2 (3,4,5,6,7,a,b) var1=35 var2=36
variable ignore_this=5
block3(3,4,5,6) x='var1*ignore_this' y=var2
block4(4,5,6,7,a,b) x='var1*2' y="var2*3"
end_block

block2(1,2,3,4,5,6,3) abc=ccccc d=abc 

create_blocks  // comment: initialize memory
connect_blocks // connect blocks together
simulate // 

"""
for i in parse_it(input):
    print i

So I only really care about the variable Word() = Real() information in the file defined outside the block definitions. I want to keep the rest as strings so that I can build an AST, modify the variable values, and then write out the control file again.

1 Answer

Editorial Team · Answer 1 · 2026-06-04T21:54:58+00:00

so, if i understand correctly, you want to parse any line that starts with “variable” (ignoring case) and that is not inside a block.

the first thing we need to worry about is how much we need to understand about the bits we want to skip. for example, we could skip everything between define_block and end_block, but what if the text “end_block” happens to appear in some string? maybe to handle that case we also need to be aware of strings? or comments? these kinds of worries are why it is often not as easy as you might think to simply skip text: to understand what we can skip, we actually do need to parse the data.

but perhaps in this case we are ok. it looks like you have neither multi-line strings nor multi-line comments, and that define_block and end_block always occur at the start of a line. that gives us enough guarantees (i think) to be able to drop blocks without worrying about strings or comments (because a string or comment would start with // or " or similar, and so a misleading //define_block or "define_block" would not be at the start of the line).

we can do that outside of lepl:

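import re

# drop everything from a line starting with define_block to the end of the next end_block line
# ([^\n]* eats the rest of that line; [^$]* would eat the rest of the file, since $ is literal inside [])
block = re.compile(r'^\s*define_block.*?^\s*end_block[^\n]*', re.I | re.M | re.S)
input = block.sub('', input)
for line in input.split('\n'):
    if line.lower().startswith('variable'):
        print(line)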

or as a regexp inside:

block = Regexp(r'(?ims)^\s*define_block.*?^\s*end_block[^\n]*')
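
if you stay outside lepl, the same idea can be pushed a little further to pull out just the numeric assignments. this is only a minimal sketch using the standard re module, not anything lepl provides: the names BLOCK, ASSIGNMENT and top_level_numbers are made up for illustration, and it assumes that // always starts a comment and that a number is a plain decimal optionally wrapped in matching quotes. it uses [ \t]* rather than \s* so the indentation match cannot run across blank lines.

```python
import re

# A whole block: from a line starting with define_block to the end of
# the next line starting with end_block.
BLOCK = re.compile(
    r'^[ \t]*define_block\b.*?^[ \t]*end_block\b[^\n]*',
    re.I | re.M | re.S)

# name = number, where the number may sit inside matching quotes.
# Anything that is not a lone number (e.g. 'abc*bbb') does not match.
ASSIGNMENT = re.compile(
    r'''(\w+)\s*=\s*(['"]?)\s*'''
    r'''([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)'''
    r'''\s*\2(?=\s|$)''')


def top_level_numbers(text):
    """Return {name: float} for numeric assignments on top-level variable lines."""
    found = {}
    for line in BLOCK.sub('', text).split('\n'):
        code = line.split('//', 1)[0]  # assumption: // always starts a comment
        if not re.match(r'variable\s', code, re.I):
            continue
        for name, _quote, number in ASSIGNMENT.findall(code[len('variable'):]):
            found[name.lower()] = float(number)
    return found
```

an assignment hidden inside a quoted expression (say x='a=5 ') would still be picked up, so treat this as a starting point rather than a complete answer.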

so your final solution will be something like

variable = ...
other_line = Regexp(r'^.*$')
parser = (variable | block | other_line)[:]
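
since you want to write the file back out afterwards, here is a plain-python sketch of the same variable | block | other_line shape that keeps every character of the input. it is not lepl code: CHUNK and split_chunks are hypothetical names, and it relies on the same guarantees as before (keywords at the start of a line, nothing spanning lines except blocks).

```python
import re

# One alternative per kind of chunk, tried in this order at each line start.
CHUNK = re.compile(
    r'(?P<block>^[ \t]*define_block\b.*?^[ \t]*end_block\b[^\n]*\n?)'
    r'|(?P<variable>^variable[ \t][^\n]*\n?)'
    r'|(?P<other>^[^\n]*\n?)',
    re.I | re.M | re.S)


def split_chunks(text):
    """Return (kind, text) pairs whose texts concatenate back to the input."""
    chunks = []
    for match in CHUNK.finditer(text):
        if match.group(0):  # skip the empty match at the very end of the input
            chunks.append((match.lastgroup, match.group(0)))
    return chunks
```

joining the chunk texts gives back the original file, so you could edit only the 'variable' chunks and leave the 'block' and 'other' chunks untouched.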

hope that helps.

and finally, full disclosure, i should also point you to https://groups.google.com/group/lepl/browse_thread/thread/e305b5b559d93e9e which i posted today (sorry).
