Given a website, I wonder what is the best procedure, programmatically and/or using scripts,

Question

Asked: June 18, 2026 (2026-06-18T03:20:42+00:00)

Given a website, what is the best procedure, programmatically and/or using scripts, to extract all email addresses that appear in plain text in the form XXXX@YYYYY.ZZZZ on that page and on all pages underneath it, recursively or up to some fixed depth?


1 Answer

Editorial Team · Answer 1 · 2026-06-18T03:20:44+00:00

In a shell, you can achieve this with two programs piped together:

wget: will get all pages
grep: will filter and give you only the emails

An example:

wget -q -r -l 5 -O - http://somesite.com/ | grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"

wget, in quiet mode (-q), fetches all pages recursively (-r) up to a maximum depth of 5 (-l 5) from somesite.com and prints everything to stdout (-O -).

grep uses an extended regular expression (-E) and prints only the matching part of each line (-o), which here is the email addresses.
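To see what -E and -o do without touching the network, the same pattern can be run against a few sample lines. This is only a sketch: the sample text and addresses are made up for illustration, and the variable name matches is arbitrary.

```shell
# Sample input is invented; the pattern is the one used in the example command.
matches=$(printf 'Contact john.doe@example.com or sales@example.org today.\nNo address on this line.\n' |
  grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b")

# Each match is printed on its own line; the line without an address yields nothing.
printf '%s\n' "$matches"
```

Note that this pattern's character classes do not include characters such as + or _, so addresses containing them would be matched only partially or not at all.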

All emails are going to be printed to standard output and you can write them to a file by appending > somefile.txt to the command.
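The same address often appears on many pages, so the raw output will likely contain duplicates. One possible way to collapse them, sketched here as an assumption rather than as part of the tested command, is to lowercase the addresses and pass them through sort -u. The function name unique_emails is hypothetical, and lowercasing assumes you are happy to treat addresses as case-insensitive.

```shell
# unique_emails is a hypothetical helper name, not part of wget or grep.
# It reads one address per line on stdin, lowercases it, and drops duplicates.
unique_emails() {
  tr '[:upper:]' '[:lower:]' | sort -u
}

# Made-up addresses standing in for the output of the grep step:
printf 'Bob@Example.com\nbob@example.com\nalice@example.org\n' | unique_emails
```

In the full pipeline this step would go after grep, just before the redirection to somefile.txt.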

Read the man pages for more documentation on wget and grep.

This example was tested with GNU bash version 4.2.37(1)-release, GNU grep 2.12 and GNU Wget 1.13.4.
