Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7057883
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 28, 20262026-05-28T04:02:48+00:00 2026-05-28T04:02:48+00:00

I am working on a simple web crawler to get a URL, Crawl first

  • 0

I am working on a simple web crawler to get a URL, Crawl first level links on the site and extract mails from all pages using RegEx…

I know it’s kinda sloppy and it’s just the beginning, but i always get “operation Timed Out” after 2 minutes of running the script..

 private void button1_Click(object sender, System.EventArgs e)
    {

        string url = textBox1.Text;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        StreamReader sr = new StreamReader(response.GetResponseStream());
        string code = sr.ReadToEnd();
        string re = "href=\"(.*?)\"";
        MatchCollection href = Regex.Matches(code, @re, RegexOptions.Singleline);
        foreach (Match h in href)
        {

            string link = h.Groups[1].Value;
            if (!link.Contains("http://"))
            {
                HttpWebRequest request2 = (HttpWebRequest)WebRequest.Create(url + link);
                HttpWebResponse response2 = (HttpWebResponse)request2.GetResponse();
                StreamReader sr2 = new StreamReader(response.GetResponseStream());
                string innerlink = sr.ReadToEnd();


                MatchCollection m2 = Regex.Matches(code, @"([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)", RegexOptions.Singleline);


                foreach (Match m in m2)
                {
                    string email = m.Groups[1].Value;

                    if (!listBox1.Items.Contains(email))
                    {
                        listBox1.Items.Add(email);
                    }
                }
            }
        }

         sr.Close();
        }
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-28T04:02:48+00:00Added an answer on May 28, 2026 at 4:02 am

    Don’t parse Html using Regex. Use the Html Agility Pack for that.

    What is exactly the Html Agility Pack (HAP)?

    This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don’t HAVE to understand XPATH nor XSLT to use it, don’t worry…). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).

    More Information

    • Html Agility Pack on Codeplex
    • How to use HTML Agility pack
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm working on a simple web service which exports data from some data store
I'm working on a simple web crawler in python and I wan't to make
I'm working on StackQL.net , which is just a simple web site that allows
i am working on a simple web app which has a user model and
I am currently working on a simple web application through Google App engine using
I am working on a pretty simple web application (famous last words) and am
Well, simple question. I'm working with VS2008 on an ASP.NET web application which has
I have created a simple web service called TimeServerBean . It's working properly, the
I'm working on a simple web app using jQuery mobile, and it is designed
I am working on a simple web application made using Entity Framework 4.1 Code

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.