I have a file containing hundreds of SQL INSERT statements. I want to identify only those statements whose text starts with the HTML paragraph tag <p> but doesn’t have a closing paragraph tag </p>.
I’m trying along these lines:
<p>[^\n]*(?!</p>) <-- a <p> followed by any number of characters up to \n, and then not followed by </p>
This does not work. Below is the sample data:
INSERT INTO `help` VALUES
(1,1,'<p>Radiotherapy uses a beam of high-energy rays (or particles) lymph nodes.</p>'),
(2,1,'<p>EBRT delivers radiation from a machine outside the body. '),
(3,1,'<p>Following lumpectomy radiotherapy <ul><li>Heading</li></ul></p>'),
Ideally, I would append a </p> wherever it is missing, e.g. in insert statement #2.
If you use this:
(\(\d+,\d+,'<p>.*?)(</p>)?('\),)
you’ll get back-references to the following parts:
$1: (1,1,'<p>Radiotherapy uses a beam of high-energy rays (or particles) lymph nodes. <-- i.e. the preamble and body text, including the opening P tag
$2: </p> <-- the optional closing P tag, i.e. this group might not match (as in row 2)
$3: '), <-- the closing quote and parenthesis, and the trailing comma
You can then replace this with:
$1</p>$3
(e.g. using .NET-style backreferences), i.e. rebuild the string from your backreferences, with an explicit closing P tag regardless of whether one was found.
Without knowing your platform, I can’t give you the correct regex replace syntax for this.
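As an illustration only, here is a minimal sketch of the same idea in Python, assuming the rows look like the sample data in the question. The function names rows_missing_close and add_missing_close are made up for this example; Python writes the backreferences as \g<1> and \g<3> rather than $1 and $3.

```python
import re

# Pattern from the answer above:
#   group 1 = preamble and body text, including the opening <p>
#   group 2 = the optional closing </p> (None when it is missing)
#   group 3 = closing quote, parenthesis and trailing comma
PATTERN = re.compile(r"(\(\d+,\d+,'<p>.*?)(</p>)?('\),)")

# Sample rows copied from the question.
sample = """INSERT INTO `help` VALUES
(1,1,'<p>Radiotherapy uses a beam of high-energy rays (or particles) lymph nodes.</p>'),
(2,1,'<p>EBRT delivers radiation from a machine outside the body. '),
(3,1,'<p>Following lumpectomy radiotherapy <ul><li>Heading</li></ul></p>'),
"""


def rows_missing_close(sql):
    """Return the rows that open with <p> but have no closing </p> (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(sql) if m.group(2) is None]


def add_missing_close(sql):
    """Rebuild every row as group 1 + explicit </p> + group 3 (hypothetical helper)."""
    return PATTERN.sub(r"\g<1></p>\g<3>", sql)


if __name__ == "__main__":
    for row in rows_missing_close(sample):
        print("missing </p>:", row)
    print(add_missing_close(sample))
```

Two caveats with this sketch: a body that itself contains '), would end the match early, and a final row terminated by '); instead of '), would not match the pattern as written, so check the result against your real data.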
In .NET it would be:
Which produces this output: