So I’m attempting to grab text from a file when it appears in double quotations, EXCEPT when the text within the quotations ends in a certain suffix.
For example below, I’d want everything NOT ending in VER that is in quotations to be parsed.
Example Input:
"GameVER": ["GM435615-IQR", "LG-QR435", "HG145-IR9", "WUT828-PQR10"] "VERIZON": ["GKSL42375834-45", "DG-67498", "GF4564", "HFJ-88.8.98"]
Output:
GM435615-IQR
LG-QR435
HG145-IR9
WUT828-PQR10
VERIZON
GKSL42375834-45
DG-67498
GF4564
HFJ-88.8.98
In python, I’ve tried this:
re.findall(r'(\"\b.+?)(?!VER)\b\"',text)
But it still grabs the words with VER on the end.
Any help would be appreciated.
It’s because the VER is being caught in the .+? (the ? makes the .+ non-greedy, but in this case the only way for a ....VER to be caught is by having it in the .+?). The (?!VER) is only checked at the position just before the closing quote, where the next character is always " and never VER, so it never rejects anything. Instead of saying “match stuff not followed by ‘VER’”, try “match a word where the last 3 characters are not VER” (i.e. an end quote not preceded by ‘VER’).
Also, instead of using .+ try [^"]+ which will avoid your .+ matching across multiple words. Example:
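A minimal sketch of that idea in Python, assuming the input is already in a string called text (the sample line from the question is used here). The negative lookbehind (?<!VER) is one way to say "an end quote not preceded by VER". The \b anchors are kept from the question's attempt and assume every quoted word starts and ends with a letter or digit. Without them, once "GameVER" is rejected its closing quote could be picked up as an opening quote and the pairs would drift.

```python
import re

# Sample input copied from the question.
text = ('"GameVER": ["GM435615-IQR", "LG-QR435", "HG145-IR9", "WUT828-PQR10"] '
        '"VERIZON": ["GKSL42375834-45", "DG-67498", "GF4564", "HFJ-88.8.98"]')

# "\b        opening quote directly followed by a word character
# ([^"]+)    the quoted word itself; [^"] cannot run past the closing quote
# (?<!VER)   fail if the three characters just before this point are VER
# \b"        word character directly followed by the closing quote
pattern = re.compile(r'"\b([^"]+)(?<!VER)\b"')

matches = pattern.findall(text)
print("\n".join(matches))
```

On the sample input this keeps VERIZON (it starts with VER but does not end with it) and drops only GameVER. If the quoted words can start or end with punctuation, a safer route is probably to grab every quoted string with "([^"]*)" and filter with str.endswith("VER") afterwards.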
(By the way, in your output above you missed out “HG145-IR9” and “WUT828-PQR10”, although they do not end in VER and are in double quotes.)