Here is some sample text (simplified from the original):
<start1>
<name="4654">
bla bla bla bla
<tags="bla" model="c">
bla bla bla bla
<start2>
<name="12346">
bla bla bla bla
<tags="bla" model="d">
bla bla bla bla
<start3>
<name="73535">
bla bla bla bla
<tags="bla" model="c">
<start4>
<name="546875">
bla bla bla bla
<tags="bla" model="c">
bla bla bla bla
Here is my regex (the "dot matches newline" option is on):
name="([\d]+)".+?(?<!start)tags="([^"]+?)" model="c"
As you can see, there are 4 blocks, but I only need to match those with model="c". However, .+? is capturing more than it should: it runs past the end of a block and into the next one. Putting in a negative lookbehind to suppress that did not work. Any idea how I can exclude the unwanted block?
Update (to clarify what I want to achieve):
Out of the sample data, I want to match the following 3 blocks:
First match
<name="4654">
bla bla bla bla
<tags="bla" model="c">
Second match
<name="73535">
bla bla bla bla
<tags="bla" model="c">
Third match
<name="546875">
bla bla bla bla
<tags="bla" model="c">
Is it always in this format of (start, name, tags), (start, name, tags), and so on? If so, you can even do without the lookaround, for example by using a negated character class such as [^<]+ to skip from the name to the tags label.
That works because you know the next < you encounter will belong to the immediately following tags label. Can we guarantee that’s the case, or do we need to be more general to allow for other labels in the mix?
Also, do you need to capture the text after <tags> and before the next <start>? If so, you could add a little extra to the end for that.
Okay, according to your comments, that’s not the case. Scratch that, then.
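For reference, here is a rough Python sketch of what the lookaround-free idea could look like and why it breaks down once other labels appear. The exact pattern is an assumption built from the description above, and the <other="x"> label is purely hypothetical.

```python
import re

# Sample data in the strict (start, name, tags) layout from the question.
strict_sample = """<start1>
<name="4654">
bla bla bla bla
<tags="bla" model="c">
bla bla bla bla
<start2>
<name="12346">
bla bla bla bla
<tags="bla" model="d">
bla bla bla bla
<start3>
<name="73535">
bla bla bla bla
<tags="bla" model="c">
<start4>
<name="546875">
bla bla bla bla
<tags="bla" model="c">
bla bla bla bla
"""

# Hypothetical variant with an extra label between name and tags.
loose_sample = '<start1>\n<name="1">\n<other="x">\n<tags="bla" model="c">\n'

# [^<]+ cannot cross a "<", so the very next label after name must be tags.
# No "dot matches newline" flag is needed: a negated class matches newlines.
no_lookaround = re.compile(r'name="(\d+)"[^<]+<tags="([^"]+)" model="c"')

strict = no_lookaround.findall(strict_sample)
loose = no_lookaround.findall(loose_sample)
print(strict)  # [('4654', 'bla'), ('73535', 'bla'), ('546875', 'bla')]
print(loose)   # []
```

The second result is the reason this approach only holds up if nothing else can sit between the name and tags labels.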
Update
Okay, how ’bout this then?
This actually uses a lookahead, not a lookbehind. A single lookahead or lookbehind only asserts what does (or does not) occur immediately after or before one position, not anywhere within a block of text. By checking at every character with ((?!str).)+, you effectively ensure that “str” does not occur anywhere in the matched text.
It might look confusing that I’m using a lookahead to check for <start, since (?!<start) looks a lot like the lookbehind (?<!start). The < in the lookahead is part of the literal text <start, not part of the lookbehind syntax.
Think (?!(<start)) versus (?<!(start)).
I added (?: ) just so it wouldn’t capture.
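To see the per-character lookahead in action, here is a small Python sketch run against the sample data from the question. The full pattern is an assumption pieced together from the parts described above (the non-capturing ((?!<start).)+ plus the question's name and tags pieces), so treat it as one possible form rather than the exact regex. The question's original pattern is included for comparison.

```python
import re

sample = """<start1>
<name="4654">
bla bla bla bla
<tags="bla" model="c">
bla bla bla bla
<start2>
<name="12346">
bla bla bla bla
<tags="bla" model="d">
bla bla bla bla
<start3>
<name="73535">
bla bla bla bla
<tags="bla" model="c">
<start4>
<name="546875">
bla bla bla bla
<tags="bla" model="c">
bla bla bla bla
"""

# The question's pattern: .+? is free to run past <start3>, and the
# lookbehind only inspects the five characters right before "tags".
original = re.compile(
    r'name="([\d]+)".+?(?<!start)tags="([^"]+?)" model="c"', re.DOTALL
)

# (?:(?!<start).)+? consumes one character at a time, and only while
# "<start" does not begin at the current position, so a match can
# never cross into the next block.
tempered = re.compile(
    r'name="(\d+)"(?:(?!<start).)+?tags="([^"]+)" model="c"', re.DOTALL
)

before = original.findall(sample)
after = tempered.findall(sample)
print(before)  # [('4654', 'bla'), ('12346', 'bla'), ('546875', 'bla')]
print(after)   # [('4654', 'bla'), ('73535', 'bla'), ('546875', 'bla')]
```

With the original pattern, the match that starts at name="12346" swallows block 3 to reach its model="c", which is why 73535 never shows up. The per-character check stops that match at <start3>, so the engine moves on and picks up 73535 on its own.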