I'm having a hard time explaining what I mean, so here is what I want to do.
I want any sentence to be parsed along the pattern of
text #something a few words [someothertext]
For this, a matching sentence would be
Jeremy is trying #20 times to [understand this]
And I would name 4 groups: text, time, who, subtitle.
However, I could also write
#20 Jeremy is trying [understand this] times to
and still get the tokens
#20
Jeremy is trying
times to
understand this
corresponding to the right groups.
As long as the delimited tokens separate the two text-only tokens, I’m fine.
Is this even possible? I’ve tried a few regexes and failed miserably (I'm still experimenting, but I'm spending way too much time learning it).
Note: The order of the tokens can be random. If this isn’t possible with regex, then I guess I can live with a fixed order.
Edit: fixed a typo and clarified further what I wanted.
You can alternate on the different types of text. Because each alternative has its own named group, exactly one group will have a `Success` value equal to true for each match. This pattern, which joins the three parts described below with `|`, should do what you need:
`(?<Number>#\d+\b)|(?<Subtitle>\[.+?])|\s*(?<Text>(?:.(?!#\d+\b|\[.*?]))+)\s*`
- `(?<Number>#\d+\b)`: matches `#` followed by one or more digits, up to a word boundary.
- `(?<Subtitle>\[.+?])`: non-greedy matching of text between square brackets.
- `\s*(?<Text>(?:.(?!#\d+\b|\[.*?]))+)\s*`: the `\s*` on either side consumes surrounding whitespace so it stays out of the group. The named capture group matches a single character at a time, as long as the negative look-ahead does not find text that would match one of the other two patterns of interest (numbers and subtitles).
Example usage:
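A minimal sketch of that usage, written in Python rather than .NET so that it is easy to run. This is a port and assumes the following: Python spells named groups `(?P<Name>...)` instead of `(?<Name>...)`, and the role of `Success` is played by a group being non-`None`. The helper name `tokenize` is hypothetical.

```python
import re

# Python port of the alternation: Number | Subtitle | Text.
# Assumption: (?P<Name>...) replaces the .NET (?<Name>...) syntax.
PATTERN = re.compile(
    r"(?P<Number>#\d+\b)"
    r"|(?P<Subtitle>\[.+?])"
    r"|\s*(?P<Text>(?:.(?!#\d+\b|\[.*?]))+)\s*"
)


def tokenize(sentence):
    """Return the matched tokens in the order they appear in the sentence."""
    # Only one named group takes part in each match, so lastgroup names it.
    return [m.group(m.lastgroup) for m in PATTERN.finditer(sentence)]


if __name__ == "__main__":
    for token in tokenize("Jeremy is trying #20 times to [understand this]"):
        print(token)
```

Note that the `Number` token keeps its `#` and the `Subtitle` token keeps its square brackets, because both are inside the named groups. Strip them afterwards if only the inner text is wanted.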
Alternately, this example demonstrates how to pair the named groups to the matches:
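A sketch of that pairing, again as a Python port under the same assumptions (`(?P<Name>...)` group syntax, and a non-`None` group standing in for `Success` being true). The function name `tokenize_with_names` is hypothetical.

```python
import re

# Same alternation as in the answer, using Python's named-group syntax.
NAMED_PATTERN = re.compile(
    r"(?P<Number>#\d+\b)"
    r"|(?P<Subtitle>\[.+?])"
    r"|\s*(?P<Text>(?:.(?!#\d+\b|\[.*?]))+)\s*"
)


def tokenize_with_names(sentence):
    """Return (group_name, value) pairs in the order the tokens appear."""
    pairs = []
    for match in NAMED_PATTERN.finditer(sentence):
        # Exactly one named group participates in each match; the rest are None.
        for name, value in match.groupdict().items():
            if value is not None:
                pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    for name, value in tokenize_with_names(
        "#20 Jeremy is trying [understand this] times to"
    ):
        print(f"{name}: {value}")
```

Because the group name travels with each value, the token order in the input no longer matters. Both text fragments come back under `Text`, so telling the first from the second still depends on their position in the result.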