I have these lines, and I need to delete every line that ends with “/index.html” together with the two lines before it (starting from the <a tag), leaving all the other lines as they are.
Example:
<a href="http://site.com/dir/file.html">
/dir/file.html</a>:
../../../index.html<br>
<a href="http://site.com/dir/file2.html">
/dir/file2.html</a>:
../../../page.html<br>
<a href="http://site.com/dir/name.html">
/dir/name.html</a>:
../../../index.html<br>
<a href="http://site.com/dir/any-link_.html">
/dir/any-link_.html</a>:
../../../file-name.html<br>
Output:
<a href="http://site.com/dir/file2.html">
/dir/file2.html</a>:
../../../page.html<br>
<a href="http://site.com/dir/any-link_.html">
/dir/any-link_.html</a>:
../../../file-name.html<br>
So the regular expression should delete everything from the <a tag (two lines above) through the line that ends with “/index.html”, and leave the other lines untouched.
I tried something like ^./index.html in Notepad++, but it deletes only the lines that contain “/index.html”. I don’t know how to also remove the two lines before each one, starting from the <a tag.
Matches the <a href="http://site.com literally, followed by the path name, then the end of the tag and all whitespace (including newlines), until a repetition of the file name (\1), followed by the close tag, a colon, more whitespace (again, including a newline), then any number of characters (except a newline) followed by index.html<br>, then all the whitespace before the next line (including, again, the newline).
Could probably be shortened to
But beware of .* and its unintended side effects. Regular expressions should always be as specific as possible, especially when using them to delete.
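The pattern described above can be sketched in Python to check it against the sample from the question. This is a reconstruction from the prose description, not the original expression from this answer. The names PATTERN, drop_index_blocks and sample are made up for illustration, and it assumes every block has exactly the three-line shape shown in the example.

```python
import re

# Hypothetical reconstruction of the pattern described in the answer;
# the original expression is not available here.
PATTERN = re.compile(
    r'<a href="http://site\.com(/[^"\n]*)">\s*'  # opening tag; group 1 captures the path
    r'\1</a>:\s*'                                # same path again, close tag, colon
    r'[^\n]*/index\.html<br>\s*'                 # rest of one line ending in /index.html<br>
)


def drop_index_blocks(text: str) -> str:
    """Remove every three-line block whose last line ends with /index.html<br>."""
    return PATTERN.sub("", text)


sample = (
    '<a href="http://site.com/dir/file.html">\n'
    '/dir/file.html</a>:\n'
    '../../../index.html<br>\n'
    '<a href="http://site.com/dir/file2.html">\n'
    '/dir/file2.html</a>:\n'
    '../../../page.html<br>\n'
    '<a href="http://site.com/dir/name.html">\n'
    '/dir/name.html</a>:\n'
    '../../../index.html<br>\n'
    '<a href="http://site.com/dir/any-link_.html">\n'
    '/dir/any-link_.html</a>:\n'
    '../../../file-name.html<br>\n'
)

if __name__ == "__main__":
    # Prints only the file2.html and any-link_.html blocks.
    print(drop_index_blocks(sample), end="")
```

Using [^\n]* instead of .* keeps the match from running past the end of the third line, which is the kind of side effect the warning above is about. The same expression should also work in Notepad++ with the "Regular expression" search mode and an empty "Replace with" field, but that has not been verified here.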