The Archive Base Latest Questions

Editorial Team
Asked: June 10, 2026

Okay, this is an easy one, but I can’t figure it out.

Basically I want to extract all links (<a href="[^<>]*">[^<>]*</a>) from a big HTML file.

I tried to do this with sed, but I get all kinds of results, just not what I want. I know that my regexp is correct, because I can replace all the links in a file:

sed 's_<a href="[^<>]*">[^<>]*</a>_TEST_g'

If I run that on something like

<div><a href="http://wwww.google.com">A google link</a></div>
<div><a href="http://wwww.google.com">A google link</a></div>

I get

<div>TEST</div>
<div>TEST</div>

How can I get rid of everything else and just print the matches instead? My preferred end result would be:

<a href="http://wwww.google.com">A google link</a>
<a href="http://wwww.google.com">A google link</a>

PS. I know that my regexp is not the most flexible one, but it’s enough for my intentions.


1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 8:27 am

    Match the whole line, put the interesting part in a group, and replace the line with the content of the group. Use the -n option to suppress sed's default output, and add the p flag to the s command so that only lines where the substitution succeeded are printed.

    sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
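
To see what the command does, here is a small sketch that feeds it the sample from the question plus one line without a link. The function name extract_links and the extra <p> line are assumptions made for this illustration; the sed expression itself is the one above.

```shell
# Sample input from the question, plus one line that contains no link.
input='<div><a href="http://wwww.google.com">A google link</a></div>
<p>no link on this line</p>
<div><a href="http://wwww.google.com">A google link</a></div>'

# extract_links is a hypothetical wrapper around the command from the answer.
# -n silences the default output; the p flag prints only substituted lines.
extract_links() {
  sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
}

# Prints the two <a ...>...</a> elements; the <p> line is dropped.
printf '%s\n' "$input" | extract_links
```

The line without a link produces no output because the substitution never succeeds on it, so the p flag never fires.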
    

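    Note that if there are multiple links on a line, this prints only the last one, because the leading .* is greedy. You can improve on that, but it goes beyond simple sed usage. The simplest method is to use two steps: first insert a newline after each link so that no line holds more than one, then extract the links. (A \n in the replacement text works in GNU sed but is not guaranteed elsewhere.)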

    sed -e 's!</[Aa]>!&\n!g' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
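
The difference between one pass and two passes can be sketched on a line that holds two links. The sample line and the function name split_and_extract are assumptions made for this illustration. The splitting step here uses a g flag so that every closing tag on the line gets a newline, and it writes the newline as a backslash followed by a literal line break, which is the portable spelling and should also work in sed implementations that do not understand \n in a replacement.

```shell
# One line holding two links (hypothetical sample data).
line='<p><a href="/one">first</a> and <a href="/two">second</a></p>'

# Single pass: the greedy ^.* swallows the first link, so only the last survives.
printf '%s\n' "$line" | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'

# Two passes: break the line after every closing tag, then extract per line.
# split_and_extract is a hypothetical helper name.
split_and_extract() {
  sed -e 's!</[Aa]>!&\
!g' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
}

# Prints both links, one per line.
printf '%s\n' "$line" | split_and_extract
```

Without the g flag only the first closing tag on each line would be followed by a newline, so a line with three or more links would still lose some of them.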
    

    This still doesn’t handle HTML comments, <pre>, links that are spread over several lines, etc. When parsing HTML, use an HTML parser.
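
If the pattern from the question is good enough for the job and a parser is more than it needs, another option worth trying is the -o option of grep, which prints only the matched text, one match per line. This is a sketch rather than part of the answer above: -o is an extension found in GNU and BSD grep, not something POSIX requires, and the sample data is made up. It has the same limits as the sed approach (comments, <pre>, links spread over several lines).

```shell
# Hypothetical sample: two links on one line, then a line without any.
html='<div><a href="/one">first</a> and <a href="/two">second</a></div>
<p>no link here</p>'

# -o prints each match on its own line instead of the whole matching line.
# The pattern is the one from the question; [^<>]* keeps a match inside one tag pair.
links=$(printf '%s\n' "$html" | grep -o '<a href="[^<>]*">[^<>]*</a>')

printf '%s\n' "$links"
```

Because each match is printed separately, several links on one line need no extra splitting step.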
