I would like to remove all html tags but leave E.G. <a href=http://www.domain.com/>Link Title</a>

Question

Editorial Team

Asked: June 17, 2026 (2026-06-17T10:12:33+00:00)

I would like to remove all HTML tags but leave anchor tags in place,
e.g. <a href="http://www.domain.com/">Link Title</a>

So far this works for me except that it removes the </a> part.

sed -e 's/<[^">]*>//g'

I would like to know if there is a better way to do this.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T10:12:34+00:00

Basically, what you’ve written removes any block of <Stuff> where Stuff doesn’t contain a double quote. That is why </a> gets deleted while the double-quoted opening tag survives. If, for example, you had a perfectly valid bit of HTML like:

<a href='http://www.domain.com/'>Link Title</a>

or even some odd html like:

<a href=http://www.domain.com/>Link Title</a>

it wouldn’t work for you, because the opening tag contains no double quote and would be stripped as well.
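To make that concrete, here is a small sketch that runs the expression from the question over the three link styles. The function name strip_unquoted is just a label I made up for this illustration.

```shell
# The expression from the question: delete every <...> block
# that contains no double quote.
strip_unquoted() {
  sed -e 's/<[^">]*>//g'
}

# Double-quoted href: the opening tag survives, but </a> is deleted.
printf '%s\n' '<a href="http://www.domain.com/">Link Title</a>' | strip_unquoted
# -> <a href="http://www.domain.com/">Link Title

# Single-quoted href: both tags are deleted, only the text remains.
printf '%s\n' "<a href='http://www.domain.com/'>Link Title</a>" | strip_unquoted
# -> Link Title

# Unquoted href: same outcome as the single-quoted case.
printf '%s\n' '<a href=http://www.domain.com/>Link Title</a>' | strip_unquoted
# -> Link Title
```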

Regular expressions are a notoriously poor tool for processing HTML, except in cases where you know exactly the full range of variations you will have to handle.

So read this viewpoint first.

I could suggest something like:

sed -e 's/<[^a>/!][^ >][^>]*>//g;s/<\/[^a>][^>]*>//g'
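One caveat about the expression above: its first substitution needs at least two characters after the <, so single-letter opening tags such as <b> or <p class="x"> are left in place, and any tag whose name merely starts with a (for example <abbr>) is kept. A possible variant, offered as a sketch rather than a definitive fix, uses two substitutions: one for tags whose name does not start with a, and one for tags that start with a but continue with more name characters. The function name strip_keep_links is hypothetical. It assumes that no tag spans more than one line and that no attribute value contains a > character; for anything messier, a real HTML parser is the safer route.

```shell
# Sketch: remove every tag except <a>, <a ...> and </a>.
# Uses '#' as the delimiter so '/' needs no escaping.
strip_keep_links() {
  # 1st expression: tags (opening or closing) whose name does not start with a/A.
  # 2nd expression: tags starting with a/A followed by another name
  #                 character, e.g. <abbr> or </address>.
  sed -e 's#</\{0,1\}[^aA/>][^>]*>##g' \
      -e 's#</\{0,1\}[aA][^[:space:]>][^>]*>##g'
}

printf '%s\n' '<div><p>See <b>this</b> <a href="http://www.domain.com/">Link Title</a>, <abbr>etc</abbr>.</p></div>' \
  | strip_keep_links
# -> See this <a href="http://www.domain.com/">Link Title</a>, etc.
```

Because the decision is based on the tag name rather than on quoting, single-quoted and unquoted href values are kept as well.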
