Need a solution to kill nodes like <footer>foobar</footer> and <div class="nav"></div> from several HTML files

Asked: May 14, 2026

Need a solution to kill nodes like <footer>foobar</footer> and <div class="nav"></div> from several HTML files.

I want to dump a site to disk without the menus, footers and whatnot. Ideally I would accomplish this using basic unix tools like sed. Since it’s not XML, I can’t use xmlstarlet.

Could anyone please suggest recipes, so that I can ideally run a script like kill-node.sh 'div class="toplinks"' *.html to prune the bits I don’t want? Thank you.

1 Answer

Editorial Team · Answer 1 · 2026-05-14T18:22:41+00:00

Just to drive you regex haters nuts, try this on for size:

sed ':a;$!N;$!ba;s/B/B1/g;s/A/B2/g;s/<\/foo>/A/g;:b;s/<foo>[^A]*A//;tb;s/B2/A/g;s/B1/B/g' foo.html

With foo.html being:

<header>
keep me
<foo>gtg</foo>
</header>
<foo>
delete me</foo>
<foo>gtg</foo>
<foo>gtg</foo>
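
For the kill-node.sh interface the question asks for, the same slurp-and-substitute idea could be wrapped roughly as below. This is only a sketch under assumptions: the function name kill_node, the .tmp suffix and the B1/B2 escape codes are made up for illustration, GNU sed is assumed, the selector is passed to sed as a regex unescaped (so avoid . * [ and / in it), and nested elements of the same name are not handled.

```shell
#!/bin/sh
# kill-node.sh: hypothetical sketch, not a real HTML parser.
# Usage: kill-node.sh 'div class="toplinks"' *.html

# kill_node SELECTOR: filter stdin to stdout, dropping every
# <SELECTOR>...</NAME> span, where NAME is the first word of SELECTOR.
kill_node() {
    # Escape the selector the same way the document is escaped below,
    # so a selector containing A or B still matches.
    open=$(printf '%s\n' "$1" | sed 's/B/B1/g;s/A/B2/g')
    name=${open%% *}    # element name, e.g. div
    sed -e ':a' -e '$!N' -e '$!ba' \
        -e 's/B/B1/g' -e 's/A/B2/g' \
        -e "s/<\/$name>/A/g" \
        -e ':b' -e "s/<$open>[^A]*A//" -e 'tb' \
        -e 's/B2/A/g' -e 's/B1/B/g'
}

# Edit each file in place via a temporary copy.
if [ "$#" -ge 2 ]; then
    selector=$1
    shift
    for f in "$@"; do
        kill_node "$selector" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
fi
```

The steps are: read the whole file into the pattern space, escape any literal A and B so that A is free to act as a single-character stand-in for the closing tag, delete from each opening tag to the nearest stand-in, then undo the escaping. Because the match stops at the first closing tag, a <div> nested inside the target <div> will leave debris behind.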

Otherwise, can someone please write a command-line HTML5 parser? Thanks. x
