I’m doing some basic formatting for my Lua code, such as adding missing whitespace around math operators.
Now I’ve realized it’s a really bad idea to apply this formatting when I’m in between quotation marks (i.e. I don’t want to alter strings in the code).
What would be the most elegant way around this problem? My ideas so far:
- Use a regexp, as I’ve done so far, to find a place I want to edit, and then count the quotation marks from the beginning of the line to that place. If there’s an odd number of quotation marks, I’m inside a quoted string.
- Pre-split the line into parts that are inside and outside quotes, and only format the outside parts.
Am I missing something obvious? Better alternatives?
What you’re describing is usually called a ‘pretty printer’.
Since Lua’s grammar is so simple, the most robust approach would be to implement a parser for Lua syntax that just emits the parsed code in a standard style. You could hack this into the existing parser (lparser.c), or use one of the existing grammars on the lua-users wiki. The grammar appears to be LL(1), so a simple recursive descent parser or LPeg would be good choices.
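To give a feel for the "parse, then emit in a standard style" idea, here is a minimal sketch of a recursive descent pretty printer. It is written in Python purely for illustration and covers only a toy expression subset (numbers, names, quoted strings, `+ - * /`, parentheses), not Lua’s real grammar; the names `tokenize`, `Parser` and `pretty` are made up for this example.

```python
import re

# Toy token pattern: numbers, names, quoted strings (with backslash escapes)
# and a few operators. An illustrative subset, NOT Lua's actual lexer.
TOKEN = re.compile(r"""
    \s*(?:
        (?P<num>\d+(?:\.\d+)?)
      | (?P<name>[A-Za-z_]\w*)
      | (?P<str>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
      | (?P<op>[-+*/()])
    )""", re.VERBOSE)

def tokenize(src):
    """Turn source text into a list of (kind, text) pairs."""
    src, pos, out = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else (None, None)

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self):      # expr := term (('+' | '-') term)*
        out = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.next()[1]
            out = f"{out} {op} {self.term()}"   # standard style: one space each side
        return out

    def term(self):      # term := factor (('*' | '/') factor)*
        out = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.next()[1]
            out = f"{out} {op} {self.factor()}"
        return out

    def factor(self):    # factor := '-' factor | '(' expr ')' | atom
        kind, text = self.next()
        if (kind, text) == ("op", "-"):
            return "-" + self.factor()          # unary minus gets no padding
        if (kind, text) == ("op", "("):
            inner = self.expr()
            if self.next() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return f"({inner})"
        if kind in ("num", "name", "str"):
            return text                         # string tokens pass through verbatim
        raise SyntaxError(f"unexpected token {text!r}")

def pretty(src):
    p = Parser(tokenize(src))
    out = p.expr()
    if p.peek() != (None, None):
        raise SyntaxError("trailing input")
    return out
```

Because a string literal is a single token, the operators inside it are never seen by the formatting rules: `pretty('a+b*( c-"x+y" )')` gives `a + b * (c - "x+y")`. Unary minus, which is awkward to tell apart from binary minus with a regex, falls out of the grammar for free.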
Trying to do this with only regular expressions almost always leads to more work than just using a real parser, as more and more special cases and contextual info (e.g. counting nested parentheses) creep into the regexes.
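To make those special cases concrete: besides `"..."` and `'...'` with backslash escapes, Lua also has long strings (`[[...]]`, `[==[...]==]`) and comments (`--` and `--[[...]]`), all of which defeat simple quote counting. The following is a rough sketch of the "pre-split" idea that treats each of these as a protected span. It is Python for illustration only; `format_code` is a hypothetical stand-in for whatever regex formatting you already have, and the scanner has not been checked against every corner of Lua’s lexer.

```python
import re

# One alternative per kind of text that must be left untouched.
# Order matters: long comments must be tried before line comments.
PROTECTED = re.compile(r"""
    --\[(?P<c>=*)\[.*?\](?P=c)\]      # long comment  --[==[ ... ]==]
  | --[^\n]*                          # line comment
  | \[(?P<s>=*)\[.*?\](?P=s)\]        # long string   [==[ ... ]==]
  | "(?:\\.|[^"\\\n])*"               # double-quoted string with escapes
  | '(?:\\.|[^'\\\n])*'               # single-quoted string with escapes
""", re.VERBOSE | re.DOTALL)

def format_code(code):
    # Hypothetical stand-in for the existing formatting rules:
    # pad the binary operators + * / with single spaces.
    return re.sub(r"[ \t]*([+*/])[ \t]*", r" \1 ", code)

def format_lua(src):
    """Format only the text outside strings and comments."""
    out, pos = [], 0
    for m in PROTECTED.finditer(src):
        out.append(format_code(src[pos:m.start()]))  # code before the span
        out.append(m.group(0))                        # string/comment verbatim
        pos = m.end()
    out.append(format_code(src[pos:]))                # trailing code
    return "".join(out)
```

For example, `format_lua('x=a+b -- a+b')` returns `x=a + b -- a+b`, leaving the comment alone. Note how much lexical knowledge was needed just to decide where formatting is allowed; telling unary from binary minus, or handling multi-character operators such as `..` and `==`, would need still more, which is where a real parser starts to pay off.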