I have a string that looks something like:
" 'a 'b '(d f g (1 2)) '(3 4) (a d) d "
And what I am trying to do is match so I get this output:
'a, 'b, '(d f g (1 2)), '(3 4), (a d), d
I am currently using:
"'\(.*\)|\(\.*\)|'\w+|\w+"
But I've run into a problem with this.
For example, if I write
'(a b c) (d f)
it will return
'(a b c) (d f)
instead of
'(a b c), (d f)
So my question is: is there a way to solve this with regex, or do I have to solve it another way?
The answer is no.
The language you are trying to parse is not
regular; it's context-free, so you cannot parse it with a regex. If you're interested, here is the grammar:
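It's not regular because you can't build a finite-state machine (FSM) to recognize it, which is the case whenever bracket structures can be nested recursively: matching brackets to arbitrary depth requires a counter, and an FSM has none.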
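To make the failure concrete, here is a small sketch using Python's `re` module. `GREEDY` is the pattern from the question, unchanged; `LAZY` is a hypothetical variant (not from the original post) that swaps `.*` for `.*?`, included only to show that changing greediness does not help once brackets nest.

```python
import re

# Pattern from the question, copied verbatim.
GREEDY = r"'\(.*\)|\(\.*\)|'\w+|\w+"
# Hypothetical lazy variant, for illustration only (an assumption, not the asker's code).
LAZY = r"'\(.*?\)|\(.*?\)|'\w+|\w+"

def matches(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)

# Greedy .* runs to the LAST ')' and swallows both lists.
print(matches(GREEDY, "'(a b c) (d f)"))
# -> ["'(a b c) (d f)"]

# Lazy .*? stops at the FIRST ')' and cuts the nested list short.
print(matches(LAZY, "'(d f g (1 2)) '(3 4)"))
# -> ["'(d f g (1 2)", "'(3 4)"]
```

Neither variant can tell which closing bracket pairs with which opening one, which is exactly the counting an FSM cannot do.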
Well, whatever. Let's answer the question "How?". Traverse the string from the first character. Once you find an apostrophe (the quote mark), start counting brackets: an opening bracket counts for +1, a closing bracket for -1. Once you hit a closing bracket that brings the counter back to zero, insert a comma after that bracket. Problem solved:
etc.
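The bracket-counting traversal described above can be sketched in Python roughly as follows. This is a minimal sketch, not the answer's original code: the function name `split_top_level` is hypothetical, it returns a list of items rather than inserting commas in place, and it counts brackets whether or not they follow a quote mark, on the assumption that unquoted lists such as `(a d)` should be kept together as well.

```python
def split_top_level(text):
    """Split a Lisp-like string into its top-level items by counting brackets."""
    items = []
    current = []   # characters of the item being built
    depth = 0      # +1 for each '(', -1 for each ')'
    for ch in text:
        if ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
            current.append(ch)
            if depth == 0:
                # Counter is back to zero: a top-level list just ended.
                items.append("".join(current))
                current = []
        elif ch.isspace() and depth == 0:
            # Whitespace outside any brackets separates items.
            if current:
                items.append("".join(current))
                current = []
        else:
            # Quote marks, symbols, and whitespace inside brackets stay in the item.
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    if current:
        items.append("".join(current))
    return items


print(", ".join(split_top_level(" 'a 'b '(d f g (1 2)) '(3 4) (a d) d ")))
# -> 'a, 'b, '(d f g (1 2)), '(3 4), (a d), d
```

Joining the result with `", "` reproduces the comma-separated output asked for in the question; for `'(a b c) (d f)` it yields `'(a b c), (d f)`.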