The Archive Base

Editorial Team
Asked: May 15, 2026

For the past few days I’ve been trying to develop a regex that uses grep to fetch all the external links from the web pages given to it.

Here is my grep command:

grep -h -o -e "\(\(mailto:\|\(\(ht\|f\)tp\(s\?\)\)\)\://\)\{1\}\(.*\?\)" "/mnt/websites_folder/folder_to_search" -r 

Now grep seems to return everything from the first external link to the end of that line, instead of just the link.

Example

If an HTML file contains something like this on a single line:

<a href="http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>

then the grep command above returns:

http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>
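To see why the match runs to the end of the line, here is a minimal reproduction, assuming GNU grep (the \| and \? operators in the question's pattern are GNU extensions to basic regex). POSIX regex matching picks the leftmost-longest match and has no lazy quantifier, and in GNU basic syntax \? means "zero or one", so \(.*\?\) is simply an optional, greedy .* that swallows the rest of the line. The temporary file and the variable names are illustrative only.

```shell
#!/bin/sh
# Write the question's sample line to a temporary file (removed on exit).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<a href="http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>
EOF

# The question's pattern, unchanged. "\(.*\?\)" is an optional greedy ".*",
# so the first match extends from "http://" to the end of the line.
pattern='\(\(mailto:\|\(\(ht\|f\)tp\(s\?\)\)\)\://\)\{1\}\(.*\?\)'

grep -h -o -e "$pattern" "$sample"
```

It prints one line, identical to the output quoted above, which is why -o alone cannot separate the two links with this pattern.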

The idea is that if an HTML file contains more than one link on the same line (whether in a, img, or other tags), the regex should fetch only the links, not the entire content of that line.

I managed to develop this on rubular.com; the regex is as follows:

("|')(\b((ht|f)tps?:\/\/)(.*?)\b)("|')

It works with the above input, but I’m not able to replicate it in grep. Can anyone help?

I can’t modify the HTML files, so please don’t suggest that. Nor can I look for each specific tag and check its attributes to get the external links, as that adds processing time and my application doesn’t require it.

Thank you
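Plain grep and egrep use POSIX regex, so the lazy .*? that rubular accepts (it tests Ruby regexes) has no direct equivalent there. One way to keep the rubular idea is GNU grep's -P option, which switches to Perl-compatible regex where .*?, \K and lookahead are available. This is a sketch assuming GNU grep built with PCRE support (BSD and macOS grep have no -P); the pattern is an adaptation of the rubular one rather than a copy: \K drops the opening quote from the printed match, and the lookahead checks the closing quote without printing it.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE support (-P); BSD/macOS grep has no -P.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<a href="http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>
EOF

# Adaptation of the rubular pattern for Perl-compatible regex:
#   ["']\K     match an opening quote, then drop it from the printed match
#   (?:...)    http://, https://, ftp://, ftps:// or mailto:
#   .*?        lazy: take as few characters as possible ...
#   (?=["'])   ... up to the next quote, which is checked but not printed
pcre="[\"']\K(?:(?:ht|f)tps?://|mailto:).*?(?=[\"'])"

grep -h -o -P "$pcre" "$sample"
```

On the sample line this should print http://www.google.com and https://yahoo.com on separate lines. For the question's folder, the same pattern can be combined with -r and -h, pointing grep at /mnt/websites_folder/folder_to_search instead of the sample file.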

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 9:40 am

    Try this:

    egrep -o "(mailto:|(ftp|https?)://)[^'\"]+" /path/to/file
    

    Outputs one link per line. It assumes every link is enclosed in single or double quotes. To exclude links to certain domains, pipe the output through egrep -v:

    egrep -o "(mailto:|(ftp|https?)://)[^'\"]+" /path/to/file | egrep -v "yahoo\.com"
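Applied the way the question runs grep (recursively over a folder, without file names), the answer's approach might look like the sketch below. extract_links is a hypothetical helper name, and grep -E is the same as egrep. The pattern requires either mailto: or a scheme followed by :// before the link body, which then runs up to the next quote; like the answer, it assumes every link is quoted.

```shell
#!/bin/sh
# Hypothetical helper: print every quoted link found in the given files or
# directories, one per line, without file names.
extract_links() {
    # -r recurse, -h omit file names, -o print only the match, -E extended regex.
    # "mailto:" or "<scheme>://" is mandatory; the link then runs to the next quote.
    grep -rhoE "(mailto:|(ftp|https?)://)[^'\"]+" "$@"
}

# Demo on a throwaway directory holding the question's sample line.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cat > "$demo/page.html" <<'EOF'
<a href="http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>
EOF

extract_links "$demo"                          # both links
extract_links "$demo" | grep -v 'yahoo\.com'   # all links except yahoo.com
```

Against the question's folder this would be extract_links /mnt/websites_folder/folder_to_search.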
    

Related Questions

From past few days I am trying to find out the location of CGContextRef
I am new to vista and not a advanced programmer. From past few days
I've been quoting this segment from Sun's document for the past few days, and
These past few days I've been working toward converting my PHP code base from
I've been fighting with this for the past few days, and am hoping that
from the past few days i have been following a lot of tutorials regarding
Over the past few days, I have been trying to find an answer to
For the past few days I've been writing classes that at first I thought
I've spent several hours over the past few days trying to get PostgreSQL to
From the past few days, I have been working on an Android code to

© 2021 The Archive Base. All Rights Reserved