I am looking for help creating a regular expression that replaces the last space within each instance of a specific tag (e.g. <p>) with &nbsp; instead, so that I can quickly fix all widows in a massive HTML document.
For example:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus suscipit
dolor a felis blandit sodales. Donec lectus justo, convallis vitae euismod sit.
Nullam et tristique dui.</p>
<p>Nullam accumsan pellentesque pretium. Morbi tempor egestas lectus,
a eleifend enim aliquet varius. Vivamus vitae semper tortor.</p>
I found this example at http://www.petefreitag.com/item/580.cfm
ReReplace(text, " ([^ ]+\r?\n)", "&nbsp;\1", "ALL")
But it replaces the last space on every line, not just the last space in each element.
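To see why, here is a rough Python translation of that ReReplace call, with &nbsp; as the replacement. It assumes Python's re.sub behaves like ReReplace with the "ALL" scope for this pattern. The pattern anchors on a line break (\r?\n), not on the closing tag, so it fires once per line rather than once per <p>:

```python
import re

# Assumed Python equivalent of:
#   ReReplace(text, " ([^ ]+\r?\n)", "&nbsp;\1", "ALL")
# re.sub replaces every non-overlapping match, like the "ALL" scope.
# The pattern means: a space, then non-space characters, then a line break.
LAST_SPACE_ON_LINE = re.compile(r" ([^ ]+\r?\n)")

text = "<p>one two\nthree four\nfive six</p>\n"
result = LAST_SPACE_ON_LINE.sub(r"&nbsp;\1", text)
print(result)
```

Every line of the paragraph gets an &nbsp;, not just the last one, because nothing in the pattern refers to </p>.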
Thanks! Also, any advice on how to improve the way I wrote this question would be awesome.
This problem is a little harder than it looks, since you may have <p> elements with no words, with only one word, with multiple words, or with plenty of whitespace before the ending </p> tag. You may even have nested elements within the paragraph element, so that words you think are orphans are not orphans at all. To make things even more complicated, in many versions of HTML the ending </p> tag is actually optional. For these reasons, it is recommended to use an HTML parser rather than just processing your HTML file with a regex.
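As a rough illustration of the parser route, here is a sketch built on Python's standard-library html.parser. The choice of parser is an assumption, and WidowFixer, LAST_GAP and fix_widows_with_parser are hypothetical names. It still assumes every <p> is explicitly closed and that <p> elements are not nested, but it handles empty and one-word paragraphs, trailing whitespace, and words wrapped in inline tags such as <b>. It does this by looking at the paragraph's concatenated text:

```python
import re
from html.parser import HTMLParser

# Last whitespace run that follows a non-space character and is followed by
# exactly one "word" plus optional trailing whitespace.
LAST_GAP = re.compile(r"(?<=\S)\s+(?=\S+\s*$)")


class WidowFixer(HTMLParser):
    """Re-emits HTML, joining the last two words of each <p> with &nbsp;.

    Assumes explicit </p> tags and no nested <p>. Entities are re-emitted
    as &name; so a bare "&" in the input may be normalized.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities as tokens
        self.out = []     # output chunks in document order
        self.para = None  # indices into self.out of text chunks in the open <p>

    def _text(self, raw):
        if self.para is not None:
            self.para.append(len(self.out))
        self.out.append(raw)

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.para = []
        self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, emitted once

    def handle_endtag(self, tag):
        if tag == "p" and self.para is not None:
            self._fix_paragraph()
            self.para = None
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._text(data)

    def handle_entityref(self, name):
        self._text(f"&{name};")

    def handle_charref(self, name):
        self._text(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")

    def _fix_paragraph(self):
        joined = "".join(self.out[i] for i in self.para)
        m = LAST_GAP.search(joined)
        if m is None:  # zero or one word: leave the paragraph alone
            return
        # Map the match back onto the individual text chunks; the first
        # chunk touched gets &nbsp;, any further pieces of the run are dropped.
        offset, inserted = 0, False
        for i in self.para:
            chunk = self.out[i]
            lo = max(m.start() - offset, 0)
            hi = min(m.end() - offset, len(chunk))
            offset += len(chunk)
            if lo >= hi:
                continue
            self.out[i] = chunk[:lo] + ("" if inserted else "&nbsp;") + chunk[hi:]
            inserted = True


def fix_widows_with_parser(html):
    parser = WidowFixer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, fix_widows_with_parser("<p>Hello <b>world</b></p>") should give <p>Hello&nbsp;<b>world</b></p>, and a one-word paragraph is left unchanged. Paragraphs closed implicitly (no </p>) are not handled by this sketch.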
ONLY if you know that all <p> elements are closed, that there are NO nested elements within the <p> elements, and that ALL <p> elements have more than one word can you get away with a plain regex replacement of the last space before </p>.
You can parenthesize the last \s* and add \3 to the replacement string if you want to keep the whitespace before the end tag. I’d be careful before doing something like this without an HTML parser, though.
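For the restricted case above, a regex along these lines might work. This is a hypothetical pattern, not necessarily the one originally intended; fix_widows_regex and keep_trailing_space are illustrative names. Group 3 captures the whitespace before </p>, so you can compare keeping it (\3) with dropping it:

```python
import re

# Hypothetical pattern for the restricted case: every <p> is closed,
# has no nested elements, and contains at least two words.
#   group 1: last character of the second-to-last word
#   group 2: the last word (no whitespace and no "<")
#   group 3: any whitespace between the last word and </p>
LAST_SPACE_IN_P = re.compile(r"(\S)\s+([^\s<]+)(\s*)</p>")


def fix_widows_regex(html, keep_trailing_space=True):
    # \3 re-inserts the whitespace before </p>; omit it to drop that whitespace.
    replacement = r"\1&nbsp;\2\3</p>" if keep_trailing_space else r"\1&nbsp;\2</p>"
    return LAST_SPACE_IN_P.sub(replacement, html)
```

For example, fix_widows_regex("<p>one two  </p>") should give <p>one&nbsp;two  </p>, while keep_trailing_space=False gives <p>one&nbsp;two</p>. Excluding "<" from the last word keeps a one-word paragraph from being joined to the preceding </p>, but nested tags and missing </p> tags will still trip it up.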