As explained in “Can regular expressions be used to match nested patterns?”, it is not possible to create a regex that matches arbitrarily nested patterns. But is it possible to create an algorithm that generates a regex for the n-th level of “nestedness”?
Basically, I want to replace trim(whatever) with rtrim(ltrim(whatever)).
I managed to create 3 levels by hand (JavaScript syntax):
level[1] = /\(([^()]*)\)/g
level[2] = /\(((?:[^()]*\([^()]*\))*[^()]*)\)/g
level[3] = /\(((?:(?:(?:[^()]*\([^()]*\))*[^()]*)*\((?:(?:[^()]*\([^()]*\))*[^()]*)*\))*[^()]*)\)/g
Here is some test data:
1st(ddd) + 1st(ddd)
2nd(dd(d))
3rd(a(b) + (cd(h) + d(dfas) + zzz))
4th(a(b(c(d))))
8th(a(b(c(d(e(f(g()))))))
I know that at every level [^()]* needs to be replaced with a non-capturing group that can contain parentheses, but I’m not sure how to generalize the algorithm for the n-th level…
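For the original goal, a hand-written pattern can already drive the replacement. The following is a minimal Python sketch (Python's re module accepts the same syntax for this pattern) that prefixes the level[2] regex above with a \btrim anchor. The name rewrite_trim and the sample input are illustrative assumptions, and arguments nested more deeply than level[2] allows are simply left untouched.

```python
import re

# level[2] from the question, prefixed with the function name.
# \b keeps the pattern from matching the "trim" inside "ltrim" or "rtrim".
TRIM_CALL = re.compile(r"\btrim\(((?:[^()]*\([^()]*\))*[^()]*)\)")

def rewrite_trim(source):
    """Replace trim(arg) with rtrim(ltrim(arg)).

    Only handles arguments that contain at most one level of
    parentheses, because that is all level[2] can match.
    """
    return TRIM_CALL.sub(r"rtrim(ltrim(\1))", source)

print(rewrite_trim("x = trim(a(b)) + trim(c)"))
# prints: x = rtrim(ltrim(a(b))) + rtrim(ltrim(c))
```

Supporting deeper arguments then comes down to swapping in a pattern for a higher level.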
You can think about it more theoretically: a match for parentheses nested n deep is just parentheses around matches for n-1 or less deep (with at least one exactly n-1 deep).
We can give a recursive definition of the regexes. Let X[n] be the regex for nesting exactly n levels, and Y[n] be the regex for a string containing brackets with any level of nesting up to n levels, so:
with Y[0] = X[0] = [^()]* (no nesting) and X[1] = \([^()]*\). (I’m not bothering with the details of non-capturing groups etc. yet, and the spaces are just for readability.)
Writing an algorithm based on this should be quite easy.
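A minimal Python sketch of such an algorithm, under one assumed pair of definitions that fits the description above: Y[n] = [^()]* ( \( Y[n-1] \) [^()]* )* and X[n] = \( ( Y[n-2] X[n-1] )+ Y[n-2] \). These formulas, the function names up_to and exactly, and the choice of non-capturing groups are assumptions made for illustration, not necessarily the exact form intended.

```python
import re
from functools import lru_cache

PLAIN = r"[^()]*"  # X[0] = Y[0]: text with no parentheses at all

@lru_cache(maxsize=None)
def up_to(n):
    """Y[n]: text whose parentheses nest at most n levels deep (assumed form)."""
    if n == 0:
        return PLAIN
    # plain text, then any number of "( Y[n-1] ) plain text" pieces
    return PLAIN + r"(?:\(" + up_to(n - 1) + r"\)" + PLAIN + r")*"

@lru_cache(maxsize=None)
def exactly(n):
    """X[n]: one parenthesised group nested exactly n levels deep, n >= 1 (assumed form)."""
    if n < 1:
        raise ValueError("exactly(n) is only defined here for n >= 1")
    if n == 1:
        return r"\(" + PLAIN + r"\)"
    gap = up_to(n - 2)  # shallower material allowed between the deep groups
    # at least one group of depth exactly n-1, separated by shallower gaps
    return r"\((?:" + gap + exactly(n - 1) + r")+" + gap + r"\)"

# Try it on two lines of the question's test data.
print(re.search(exactly(3), "3rd(a(b) + (cd(h) + d(dfas) + zzz))").group())
# prints: (a(b) + (cd(h) + d(dfas) + zzz))
print(bool(re.fullmatch(exactly(3), "(a(b(c(d))))")))
# prints: False
```

Memoising with lru_cache keeps each level from being rebuilt when it is needed again at the next depth. To capture the contents, wrap the part between the outer \( and \) in a capturing group.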
The regexes from these new (less mutually recursive) definitions get longer much much more slowly (they are polynomial rather than exponential).
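The growth can also be checked numerically. The Python sketch below assumes the same kind of construction (Y[n] = [^()]* ( \( Y[n-1] \) [^()]* )* and X[n] = \( ( Y[n-2] X[n-1] )+ Y[n-2] \), written with non-capturing groups). The function name build and these exact formulas are assumptions, and the specific constants depend on the grouping syntax chosen.

```python
PLAIN = r"[^()]*"

def build(max_depth):
    """Return lists (X, Y) of patterns for depths 0..max_depth (assumed construction)."""
    Y = [PLAIN]
    for n in range(1, max_depth + 1):
        Y.append(PLAIN + r"(?:\(" + Y[n - 1] + r"\)" + PLAIN + r")*")
    X = [PLAIN, r"\(" + PLAIN + r"\)"]
    for n in range(2, max_depth + 1):
        X.append(r"\((?:" + Y[n - 2] + X[n - 1] + r")+" + Y[n - 2] + r"\)")
    return X[: max_depth + 1], Y

X, Y = build(12)
x_len = [len(x) for x in X]
y_len = [len(y) for y in Y]

# Y grows by a fixed number of characters per level, so it is linear.
y_step = [b - a for a, b in zip(y_len, y_len[1:])]
# For X, look at successive differences from depth 1 upward.
x_step = [b - a for a, b in zip(x_len[1:], x_len[2:])]
x_step2 = [b - a for a, b in zip(x_step, x_step[1:])]

print(y_step)   # every entry is 21
print(x_step2)  # every entry is 42: constant second difference, so quadratic
```

A constant second difference is the signature of a quadratic sequence, which matches the polynomial growth claimed for these definitions.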
Let l[n] be the length of X[n], and L[n] be the length of Y[n], then (the constant terms are just the hardcoded characters in each one):
with the appropriate initial conditions for l[0] and l[1]. Recurrence relations of this form have quadratic solutions, so this is only O(n^2). Much better.
(For others: my previous definition of Y[n] was Y[n] = Y[n-1] | X[n]; this extra recursion meant that the length of the X regex was O(2.41^n), which sucks a lot.)
(The new definition of Y is a tantalising hint that there might even be a way of writing X that is linear in n. I don’t know though, and I have a feeling the extra restriction on X of exact length means it is impossible.)
The following is some Python code that computes the regexes above; you can probably translate it to JavaScript without too much trouble.