I am using jEdit, and I have a bunch of badly coded HTML files from which I want to grab the main content, without the surrounding HTML.
I need everything in between <div class="main-text"> and the next </div>.
There must be a regex way of doing this; jEdit allows me to find and replace with regular expressions.
I am not proficient with regex and it would take me a long time to work it out. Can anyone help quickly, please?
Taking your question literally, you can replace:
with
\1 (or $1, depending on what your editor uses).
However, The Pony He Comes to bite you: what if your “main-text” element contains another <div>? If you’re sure this will not happen, then you’re fine. Otherwise, you’re in trouble. It may be easier to replace /.*<div class="main-text">/ with the empty string, then manually look for the end and delete everything after.
For that matter, this task may be easiest to do manually, so you don’t have to double-check after your code has run.
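To make the nested <div> problem concrete, here is a minimal sketch in Python rather than jEdit. The pattern, the function name extract_main_text, and the sample strings are my own illustration, not the exact expression from above. It assumes a non-greedy capture up to the first </div>, with the dot allowed to match newlines, which is roughly what "everything up to the next </div>" means.

```python
import re

# Non-greedy capture: stops at the FIRST </div> after the opening tag.
# re.DOTALL lets "." match newlines, since the content may span lines.
MAIN_TEXT = re.compile(r'<div class="main-text">(.*?)</div>', re.DOTALL)


def extract_main_text(html):
    """Return the text between the opening tag and the next </div>, or None."""
    match = MAIN_TEXT.search(html)
    return match.group(1) if match else None


# Hypothetical inputs for illustration only.
simple = '<html><body><div class="main-text">Hello\nworld</div><p>footer</p></body></html>'
nested = '<div class="main-text">Intro <div>inner</div> outro</div>'

print(repr(extract_main_text(simple)))  # whole content is captured
print(repr(extract_main_text(nested)))  # cut off at the inner </div>
```

With the nested input, the capture ends at the inner </div>, so " outro" is silently lost. That is why the result still needs checking by hand whenever nested <div> elements are possible.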