I would like to split the input shown below. It consists of records with two parts: a title, [XX] or [YY], and the content (which may itself contain \r\n\r\n, [XX], or [YY]). When a record's content ends, the next record starts with \r\n\r\n[XX] or \r\n\r\n[YY]:
[XX]\r\n
bla bla\r\n
bla bla\r\n
\r\n
[YY]\r\n
bla [XX] bla\r\n
bla bla\r\n
\r\n
[YY]\r\n
bla [YY] bla\r\n\r\n
bla bla\r\n
\r\n
I wrote two regexes to match and split this input. Both work, but I think they can be improved.
First: \[(XX|YY)\]\r\n((?:(?!\r\n\r\n\[(XX|YY)\]).)*)
Second (works well except for the last record): \[(XX|YY)\]\r\n(.*?)(?=\r\n\r\n\[(XX|YY)\])
Both of them use “.*”, so they backtrack a lot. Is there any way to do this with [^]?
Thanks…
I think both are fine and don’t have any more backtracking than necessary. You could anchor the title part to the start of the line, but that shouldn’t make much of a difference:
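As a rough sketch of what the anchored variants could look like, here is a Python version. The language choice, the names DATA, ANCHORED, LAZY and split_blocks, and the added \Z alternative (so that the lazy version also picks up the last record) are my own assumptions for illustration. Both patterns rely on a "dot matches newline" mode, which is re.DOTALL in Python.

```python
import re

# Sample input reconstructed from the question (CRLF line endings).
DATA = (
    "[XX]\r\n"
    "bla bla\r\n"
    "bla bla\r\n"
    "\r\n"
    "[YY]\r\n"
    "bla [XX] bla\r\n"
    "bla bla\r\n"
    "\r\n"
    "[YY]\r\n"
    "bla [YY] bla\r\n\r\n"
    "bla bla\r\n"
    "\r\n"
)

# First regex from the question, with the title anchored to the start of a
# line (^ plus re.MULTILINE). re.DOTALL lets "." match "\r" and "\n".
ANCHORED = re.compile(
    r"^\[(XX|YY)\]\r\n((?:(?!\r\n\r\n\[(?:XX|YY)\]).)*)",
    re.MULTILINE | re.DOTALL,
)

# Second regex from the question, anchored the same way. The "|\Z"
# alternative is an addition (assumption) so the last record also matches.
LAZY = re.compile(
    r"^\[(XX|YY)\]\r\n(.*?)(?=\r\n\r\n\[(?:XX|YY)\]|\Z)",
    re.MULTILINE | re.DOTALL,
)


def split_blocks(pattern, text):
    """Return a list of (title, content) tuples found by the pattern."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for title, content in split_blocks(ANCHORED, DATA):
        print(title, repr(content))
```

On the sample input both patterns should yield the same three records. Note that the content of the last record keeps its trailing \r\n\r\n, since nothing follows it that looks like a new title.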
But perhaps your language already provides this functionality? It looks like you’re trying to parse a Windows-style configuration (.ini) file. For example, Python has the ConfigParser module for that.
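For reference, a minimal sketch of that module in use (it is spelled configparser in Python 3). The .ini text and the key name "name" are made up for illustration. The module expects key/value lines and, by default, unique section names, so it may not fit free-form content with repeated titles like the sample in the question.

```python
import configparser  # named ConfigParser in Python 2

# Hypothetical .ini text; section contents and key names are invented here.
INI_TEXT = """\
[XX]
name = first

[YY]
name = second
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Section titles in file order, and one value looked up per section.
sections = parser.sections()
names = {section: parser[section]["name"] for section in sections}
```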