Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 324847
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 12, 20262026-05-12T09:09:43+00:00 2026-05-12T09:09:43+00:00

Is there a way to view the generated source of a web page (the

  • 0

Is there a way to view the generated source of a web page (the code after all AJAX calls and JavaScript DOM manipulations have taken place) from a C# application without opening up a browser from the code?

Viewing the initial page using a WebRequest or WebClient object works ok, but if the page makes extensive use of JavaScript to alter the DOM on page load, then these don’t provide an accurate picture of the page.

I have tried using Selenium and Watin UI testing frameworks and they work perfectly, supplying the generated source as it appears after all JavaScript manipulations are completed. Unfortunately, they do this by opening up an actual web browser, which is very slow. I’ve implemented a selenium server which offloads this work to another machine, but there is still a substantial delay.

Is there a .Net library that will load and parse a page (like a browser) and spit out the generated code? Clearly, Google and Yahoo aren’t opening up browsers for every page they want to spider (of course they may have more resources than me…).

Is there such a library or am I out of luck unless I’m willing to dissect the source code of an open source browser?

SOLUTION

Well, thank you everyone for you’re help. I have a working solution that is about 10X faster then Selenium. Woo!

Thanks to this old article from beansoftware I was able to use the System.Windows.Forms.WebBrowser control to download the page and parse it, then give em the generated source. Even though the control is in Windows.Forms, you can still run it from Asp.Net (which is what I’m doing), just remember to add System.Window.Forms to your project references.

There are two notable things about the code. First, the WebBrowser control is called in a new thread. This is because it must run on a single threaded apartment.

Second, the GeneratedSource variable is set in two places. This is not due to an intelligent design decision 🙂 I’m still working on it and will update this answer when I’m done. wb_DocumentCompleted() is called multiple times. First when the initial HTML is downloaded, then again when the first round of JavaScript completes. Unfortunately, the site I’m scraping has 3 different loading stages. 1) Load initial HTML 2) Do first round of JavaScript DOM manipulation 3) pause for half a second then do a second round of JS DOM manipulation.

For some reason, the second round isn’t cause by the wb_DocumentCompleted() function, but it is always caught when wb.ReadyState == Complete. So why not remove it from wb_DocumentCompleted()? I’m still not sure why it isn’t caught there and that’s where the beadsoftware article recommended putting it. I’m going to keep looking into it. I just wanted to publish this code so anyone who’s interested can use it. Enjoy!

using System.Threading;
using System.Windows.Forms;

public class WebProcessor
{
    private string GeneratedSource{ get; set; }
    private string URL { get; set; }

    public string GetGeneratedHTML(string url)
    {
        URL = url;

        Thread t = new Thread(new ThreadStart(WebBrowserThread));
        t.SetApartmentState(ApartmentState.STA);
        t.Start();
        t.Join();

        return GeneratedSource;
    }

    private void WebBrowserThread()
    {
        WebBrowser wb = new WebBrowser();
        wb.Navigate(URL);

        wb.DocumentCompleted += 
            new WebBrowserDocumentCompletedEventHandler(
                wb_DocumentCompleted);

        while (wb.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();

        //Added this line, because the final HTML takes a while to show up
        GeneratedSource= wb.Document.Body.InnerHtml;

        wb.Dispose();
    }

    private void wb_DocumentCompleted(object sender, 
        WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser wb = (WebBrowser)sender;
        GeneratedSource= wb.Document.Body.InnerHtml;
    }
}
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-12T09:09:43+00:00Added an answer on May 12, 2026 at 9:09 am

    it is possibly using an instance of a browser (in you case: the ie control). you can easily use in your app and open a page. the control will then load it and process any javascript. once this is done you can access the controls dom object and get the “interpreted” code.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Is there a way to view the key/value pairs of a NSDictionary variable through
Is there any way to view the activity log for the integrate SourceSafe inside
Is there any way to view the contents of the solution user options file
Is there a way to view the register contents in each stack frame in
Is there a way to link to another View that displays search results without
Is there a way to change the default pages used to edit/create/view a Sharepoint
I use ClearCase. I have a snapshot view. Is there a way to compare
Is there way in next piece of code to only get the first record?
is there way thats i can preselect an item when the page loads or
The Question Is there a way to alias the parameters generated by MVC, in

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.