The Archive Base

Editorial Team
Asked: May 11, 2026

I’m trying to figure out the best way to do caching for a website


I’m trying to figure out the best way to do caching for a website I’m building. It relies heavily on screen scraping Wikipedia. Here is the process I currently follow:

  1. User requests a topic from Wikipedia via my site (e.g. http://www.wikipedia.org/wiki/Kevin_Bacon would be http://www.wikipediamaze.com/wiki?topic=Kevin_Bacon). NOTE: Because IIS can’t handle requests that end in a ‘.’, I’m forced to use the querystring parameter.
  2. Check whether I’ve already stored the formatted HTML in my database; if so, just display it to the user.
  3. Otherwise, perform a web request to Wikipedia.
  4. Decompress the stream if needed.
  5. Do a bunch of DOM manipulation to get rid of the stuff I don’t need (and inject stuff I do need).
  6. Store the HTML in my database for future requests.
  7. Return the HTML to the browser.
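The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than ASP.NET, and every name in it (`get_topic_html`, `fake_fetch`, `fake_transform`, the dictionary standing in for the database) is hypothetical, not taken from the actual site.

```python
from typing import Callable, Dict


def get_topic_html(
    topic: str,
    store: Dict[str, str],
    fetch: Callable[[str], str],
    transform: Callable[[str], str],
) -> str:
    """Return the formatted HTML for a topic, scraping at most once per topic."""
    cached = store.get(topic)   # step 2: already stored?
    if cached is not None:
        return cached
    raw = fetch(topic)          # steps 3-4: request (and decompress) the page
    html = transform(raw)       # step 5: DOM clean-up and injection
    store[topic] = html         # step 6: persist for future requests
    return html                 # step 7: hand the result back


# Stand-in functions; a real fetch would call Wikipedia.
calls = []


def fake_fetch(topic: str) -> str:
    calls.append(topic)
    return f"<html><body>{topic}</body></html>"


def fake_transform(raw: str) -> str:
    return raw.replace("<body>", "<body><!-- cleaned -->")


db: Dict[str, str] = {}  # a dict stands in for the database table
first = get_topic_html("Kevin_Bacon", db, fake_fetch, fake_transform)
second = get_topic_html("Kevin_Bacon", db, fake_fetch, fake_transform)
```

The second call never reaches `fake_fetch`, which is the "only once per topic" behaviour the process is aiming for.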

Because it relies on screen scraping and DOM manipulation, I am trying to keep things speedy by doing that work only once per topic instead of on every single request. Here are my questions:

  1. Is there a better way of doing caching, or are there additional things I can do to help performance?
  2. I know ASP.NET has a built-in caching mechanism, but will it work the way I need it to? I don’t want to have to retrieve the HTML (pretty heavy) from the database on every request, but I DO need to store the HTML so that every user gets the same page. I only ever want to get the data from Wikipedia one time.
  3. Is there anything I can do with compression to get it to the browser quicker, and if so, can the browser handle uncompressing and displaying the HTML? Or is this not even a consideration? The only reason I’m asking is that some of the pages Wikipedia sends me through the HttpWebRequest come through as a gzip stream.

Any and all suggestions, guidance, etc. are much appreciated.

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 11, 2026 at 8:04 pm

    You can try enabling the OutputCache for your page with VaryByParam="topic". That keeps a copy of the rendered page in memory, so repeated requests for the same topic are served without touching the database. When the page is not in memory, the server can retrieve it from your database. The beauty of OutputCache is that it can also store a gzipped version of the HTML (use VaryByContentEncoding).
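To illustrate the idea behind an output cache that varies by topic and by encoding, here is a small sketch in Python. It is not how ASP.NET implements OutputCache; `PageCache` and `load_from_db` are hypothetical names, and the encoding check is deliberately simplified.

```python
import gzip
from typing import Callable, Dict, Tuple


class PageCache:
    """In-memory cache in front of a slower store, keyed by (topic, encoding)."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db
        self._memory: Dict[Tuple[str, str], bytes] = {}

    def get(self, topic: str, accept_encoding: str = "") -> Tuple[bytes, str]:
        # Simplified negotiation: q-values such as "gzip;q=0" are ignored.
        encoding = "gzip" if "gzip" in accept_encoding.lower() else "identity"
        key = (topic, encoding)
        body = self._memory.get(key)
        if body is None:
            # Memory miss: fall back to the database, then keep the result.
            html = self._load_from_db(topic).encode("utf-8")
            body = gzip.compress(html) if encoding == "gzip" else html
            self._memory[key] = body
        return body, encoding


db_reads = []


def load_from_db(topic: str) -> str:
    # Stand-in for the real database lookup.
    db_reads.append(topic)
    return "<html><body>" + topic + "</body></html>"


cache = PageCache(load_from_db)
plain, plain_enc = cache.get("Kevin_Bacon")
zipped, zipped_enc = cache.get("Kevin_Bacon", "gzip, deflate")
zipped_again, _ = cache.get("Kevin_Bacon", "gzip")
```

Each (topic, encoding) pair hits the database once; after that, the already-compressed bytes are served straight from memory, so the page is not re-compressed on every request.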

    If it’s a problem for you to decompress the stuff you get from Wikipedia, then don’t send an Accept-Encoding header. That should force Wikipedia to send the page to you uncompressed.
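A rough sketch of both options in Python, for illustration only. Note that it sends `Accept-Encoding: identity` explicitly, which is a slightly different way of asking for an uncompressed body than omitting the header, and it still decodes gzip in case the server compresses anyway. The function names and the User-Agent string are assumptions.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Build a request that asks the server for an uncompressed body."""
    return urllib.request.Request(
        url,
        headers={
            "Accept-Encoding": "identity",
            "User-Agent": "example-scraper/0.1",  # hypothetical identifier
        },
    )


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only if the response says it is gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


req = build_request("https://en.wikipedia.org/wiki/Kevin_Bacon")
sample = decode_body(gzip.compress(b"<p>hi</p>"), "gzip")
```

With a real response, `decode_body` would be given the raw bytes and the value of the `Content-Encoding` response header.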
