I have chunks of strings within square brackets, like this:
[p1 text1/label1] [p2 text2/label2] [p3 text3/label3] [...
and so on.
What’s inside each chunk isn’t important. But sometimes there are stray chunks of text that are NOT surrounded by square brackets. For example:
[p1 text1/label1] [p2 text2/label2] textX/labelX [p3 text3/label3] [...] textY/labelY textZ/labelZ [...]
I thought I had this solved with a regex in Perl until I realized that I had only catered for the cases where a single stray chunk appears at the beginning, the middle, or the end of the text, and not for the cases where two or more stray chunks appear together (like the Y and Z chunks above).
So do regular expressions in Perl only catch the first match of a pattern? How could the above problem be solved then?
Edit:
The problem is to ensure that every phrase is surrounded by brackets. Square brackets are never nested. When surrounding a phrase with brackets, the p-value depends on the “label” value. For example, if a stray unbracketed phrase is
li/IN
then it should turn into:
[PP li/IN]
I guess it is a mix, but the only way I can think of to solve the bigger problem I’m working on is to turn all of them into bracketed phrases, so the handling is easier. So I’ve got it working if an unbracketed phrase occurs at the beginning, in the middle, or at the end, but not if two or more occur together.
I basically used a different regex for each position (beginning, middle, and end). The one that catches an unbracketed phrase in the middle looks like this:
$data =~ s/\] (text)#\/label \[/\] \[selected-p-value $1#\/label\] \[/g;
So what I’m doing is just noticing that if a ] comes before the text/label pattern and a [ comes after it, then this one doesn’t have brackets. I do something similar for the other positions too. But I guess this is incredibly un-generic. My regex isn’t great!
Actually you can solve this using “only” a regex:
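A minimal sketch of one regex-only approach, under a few assumptions: stray phrases are whitespace-separated text/label tokens, brackets never nest (as the question states), and the p-value can be looked up from the label. The `bracket_strays` function, the `%phrase_for` map, and the `XP` fallback are hypothetical names for illustration; only the IN to PP pairing comes from the question. The idea is a single `s///g` with an alternation: already-bracketed chunks are matched first and put back unchanged, so anything else that looks like text/label must be a stray, however many of them sit next to each other.

```perl
use strict;
use warnings;

# Hypothetical label-to-phrase map; only IN => PP is given in the question.
my %phrase_for = (IN => 'PP');

sub bracket_strays {
    my ($data) = @_;
    $data =~ s{
        (\[[^\]]*\])                    # group 1: an existing bracketed chunk, kept as-is
      |
        ([^\s\[\]]+/([^\s\[\]/]+))      # group 2: a stray token, group 3: its label
    }{
        # /e evaluates this as code: keep bracketed chunks, wrap strays.
        # 'XP' is an assumed fallback for labels missing from the map.
        defined $1 ? $1 : '[' . ($phrase_for{$3} // 'XP') . " $2]"
    }gex;
    return $data;
}

print bracket_strays('[p1 text1/label1] li/IN [p2 text2/label2] textY/IN textZ/IN'), "\n";
# prints: [p1 text1/label1] [PP li/IN] [p2 text2/label2] [PP textY/IN] [PP textZ/IN]
```

With `/g`, Perl replaces every non-overlapping match, not just the first. The per-position regex in the question misses adjacent strays for a different reason: it requires a `]` before and a `[` after each stray, and a stray followed by another stray has no `[` after it. Matching the bracketed chunks themselves avoids depending on the neighbours at all.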
Output :