So I’m trying to use sed (it has to be sed on these systems,

Asked: May 31, 2026 (2026-05-31T12:47:25+00:00)

So I’m trying to use sed (it has to be sed on these systems, so please don’t just recommend Perl) to match an HTML tag and extract its contents. The HTML tags look roughly like this:

<div class="SectionText"> Received poor service or think your current mechanic is ripping you off? Get some help from <a href="http://www.union.umd.edu/gradlegalaid/index.htm" target="_blank">Graduate Legal Aid</a> or consult the <a href="http://www.oag.state.md.us/Consumer/index.htm" target="_blank">Maryland Attorney General Office of Consumer Protection</a> at <a href="mailto:consumer@oag.state.md.us">consumer@oag.state.md.us</a> or through their hotline at 410-528-8662 or 888-743-0023.<br /></div>

All on one line. So, I wrote this one… But it doesn’t work.

sed 's/<div class=\"SectionText\">\([^<\/div>]*\)<\/div>/\1/g'

This does not alter any text.

I tried to use this website as a guideline: http://www.ibm.com/developerworks/linux/library/l-sed2/index.html (under RegExp Snafus).

The most important thing is for this line script NOT to be greedy and match up until the last


1 Answer

Editorial Team · Answer 1 · 2026-05-31T12:47:27+00:00

[^<\/div>]*

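This does not do what you think it does. The square brackets form a bracket expression, which is a set of single characters, not a string. So this matches any sequence of characters that are not <, /, d, i, v or > (or a backslash, which is literal inside the brackets). The text inside your div contains plenty of those characters, so the pattern never matches.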
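To see the character-set behaviour in isolation, here is a small sketch. The two one-line inputs are made up for illustration, and the pattern is the one from the question with the unneeded backslashes before the double quotes dropped.

```shell
# The question's pattern; [^<\/div>] excludes the single characters <, \, /, d, i, v and >.
pattern='s/<div class="SectionText">\([^<\/div>]*\)<\/div>/\1/g'

# "abc" contains none of the excluded characters, so the tags are stripped.
plain=$(printf '%s\n' '<div class="SectionText">abc</div>' | sed "$pattern")

# "video" contains v, i and d, so the pattern cannot match and the line is unchanged.
blocked=$(printf '%s\n' '<div class="SectionText">video</div>' | sed "$pattern")

printf '%s\n' "$plain" "$blocked"
```

The first line printed should be abc, and the second should be the untouched input containing video, which is the same failure the question describes.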

In Perl you could simply use .*?, but as sed does not support non-greedy matches, you’ll have to write something like this beauty:

sed 's#<div class="SectionText">\(\([^<]\|<[^/]\|</[^d]\|</d[^i]\|</di[^v]\|</div[^>]\)*\)</div>#\1#g'

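This says “any sequence of characters that are not <, or are < not followed by /, or are </ not followed by d, and so on”. Note that \| (alternation) is a GNU sed extension to basic regular expressions, so this will not work with every sed.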

Needless to say, this is an unreadable, unmaintainable and nearly unwritable piece of crap and you should almost certainly not be using it, but if you absolutely, positively must use regexes to parse HTML and absolutely, positively must use sed, then here you go.
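As a rough check of the expression above, the sketch below runs it against a shortened, made-up input (one SectionText div with nested tags, followed by a second div) and compares it with a plain greedy .* pattern. It assumes GNU sed, since the expression relies on \|.

```shell
# Hypothetical shortened input: a SectionText div with nested tags, then another div.
input='<div class="SectionText">Call <a href="x">us</a> now.<br /></div><div class="Other">keep</div>'

# Greedy attempt: .* runs to the LAST </div> on the line.
greedy=$(printf '%s\n' "$input" |
  sed 's#<div class="SectionText">\(.*\)</div>#\1#')

# The answer's expression: the group cannot consume "</div>", so it stops at the first one.
result=$(printf '%s\n' "$input" |
  sed 's#<div class="SectionText">\(\([^<]\|<[^/]\|</[^d]\|</d[^i]\|</di[^v]\|</div[^>]\)*\)</div>#\1#g')

printf '%s\n' "$greedy" "$result"
```

With the greedy pattern the second div's opening tag and text are swallowed into the match and only the final closing tag is removed. With the answer's expression only the SectionText wrapper is stripped and the second div is left intact.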
