The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 628307
In Process

The Archive Base Latest Questions

Asked by Editorial Team on May 13, 2026

I’m currently using the HTML Agility Pack in C# for a web crawler. I’ve managed to avoid many issues so far (invalid URIs, such as “/extra/url/to/base.html” and “#” links), but I also need to handle PHP, JavaScript, etc. On some sites the links point to PHP pages, and when my web crawler tries to navigate to them, it fails. One example is a page with a PHP/JavaScript accordion menu of links. How would I go about navigating/parsing these links?

1 Answer

    Editorial Team added an answer on May 13, 2026 at 7:37 pm

    Let’s see if I understood your question correctly. I’m aware that this answer is probably inadequate, but to give a more specific answer I’d need more details.


    You’re trying to write a web crawler, but it cannot crawl URLs that end in .php?

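    If that’s the case, you need to take a step back and think about why. It could be because the crawler decides which URLs to crawl with a regex that matches on the form of the URL (for example, only accepting paths that end in .html) rather than on what the server actually returns.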
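    As a rough illustration of accepting links by scheme rather than by file extension, here is a minimal sketch in Python using only the standard library, not C#. `crawlable_links` is a hypothetical helper, and `html.parser` stands in for the HTML Agility Pack. It resolves relative links against the page URL, drops `#` fragments and `javascript:` pseudo-links, and keeps every http(s) URL, whether it ends in .html, .php, or nothing at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class _LinkCollector(HTMLParser):
    """Collects the raw href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawlable_links(page_url, html):
    """Return absolute http(s) URLs found in `html`, in order, without duplicates.

    Hypothetical helper: links are accepted by scheme, not by extension,
    so .php (or extension-less) pages are kept.
    """
    page, _ = urldefrag(page_url)
    parser = _LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative paths such as "/extra/url/to/base.html".
        absolute = urljoin(page, href.strip())
        # Drop any "#fragment"; it never changes which document is fetched.
        absolute, _ = urldefrag(absolute)
        # Skip javascript:, mailto:, etc. by checking the scheme.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Skip links back to the current page (e.g. a bare "#").
        if absolute == page:
            continue
        if absolute not in links:
            links.append(absolute)
    return links
```

    With a page URL of `https://example.com/docs/index.html`, an href of `accordion.php?section=2` comes back as `https://example.com/docs/accordion.php?section=2`, while `#` and `javascript:void(0)` are dropped. Links generated entirely by JavaScript at runtime will not appear in the raw HTML at all, so a parser like this cannot see them.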

    In most cases these URLs just return normal HTML, but they could also return a generated image (like a CAPTCHA) or a download of a 700 MB ISO file, and there’s no way to be certain without checking the headers of the HTTP response from that URL.

    Note: If you’re writing your own crawler from scratch, you’re going to need a good understanding of HTTP.

    The first thing your crawler is going to see when it requests a URL is the response header, which includes a Content-Type field holding a MIME type. It tells a browser or crawler how to process and open the data (is it HTML, plain text, an .exe, etc.). You’ll probably want to decide what to download based on the MIME type instead of the URL pattern. The MIME type for HTML is text/html, and you should check for it with whatever HTTP library you’re using before downloading the rest of the content at that URL.
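    A minimal sketch of that check, again in Python with only the standard library (`is_html` and `fetch_if_html` are hypothetical helper names). Rather than sending a separate HEAD request, it issues an ordinary GET, inspects the Content-Type header, and only reads the body when the type is text/html. In C#, `HttpClient.GetAsync` with `HttpCompletionOption.ResponseHeadersRead` allows a similar header-first pattern.

```python
import urllib.request
from email.message import Message


def is_html(content_type):
    """True if a Content-Type header value names an HTML document."""
    if not content_type:
        return False
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lowercases the value and drops parameters
    # such as "; charset=utf-8".
    return msg.get_content_type() == "text/html"


def fetch_if_html(url, timeout=10):
    """Return the page text if `url` serves HTML, else None.

    urlopen() returns once the status line and headers are parsed;
    the body is not read until read() is called, so non-HTML
    responses (images, ISO downloads, ...) are abandoned unread.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if not is_html(resp.headers.get("Content-Type")):
            return None
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read()
    try:
        return body.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return body.decode("utf-8", errors="replace")
```

    Checking the header rather than the extension means a URL ending in .php that serves `text/html; charset=UTF-8` is crawled, while one that serves `image/png` (a CAPTCHA, say) is skipped.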


    The JavaScript problem

    The same applies here, except that running JavaScript in the crawler/parser is pretty uncommon for simple projects and might create more problems than it solves. Why do you need JavaScript?


    A different solution

    If you’re willing to learn Python (or already know it), I suggest you look at Scrapy. It’s a web crawling framework built similarly to the Django web framework. It’s really easy to use, and a lot of common crawling problems have already been solved, so it could be a good starting point if you’re trying to learn more about the technology.
