I am new to Python and haven't found anything which suggests this is probably

Question

Asked: June 8, 2026

I am new to Python and haven't found anything, which suggests this is probably dead easy.

The page I am scraping is fairly simple, but it updates completely every 2 minutes. I have managed to scrape all the data. The problem is that even though the program runs every 2 minutes (I have tried both taskeng.exe and a loop in the script), the HTML it pulls from the website only seems to refresh every 12 minutes. To be clear, the website shows a timestamp each time it updates. My program pulls that timestamp (along with other data) and writes it to a CSV file. But it pulls the same data for 12 minutes, and then the new data suddenly arrives. So the output looks like:

16:30, Data1, Data2, Data3
16:30, Data1, Data2, Data3
...
16:30, Data1, Data2, Data3
16:42, Data1, Data2, Data3
16:42, Data1, Data2, Data3

whereas it should be:

16:30, Data1, Data2, Data3
16:32, Data1, Data2, Data3
16:34, Data1, Data2, Data3
16:36, Data1, Data2, Data3
16:38, Data1, Data2, Data3
16:40, Data1, Data2, Data3
16:42, Data1, Data2, Data3

I think this has to do with a cache on my side. How can I force my HTTP requests to refresh completely, or stop Python from storing the response in a cache?

I am using BeautifulSoup and Mechanize. My code for the HTTP request is below:

from mechanize import Browser
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

mech = Browser()
url = "http://myurl.com"
page = mech.open(url)
html = page.read()
soup = BeautifulSoup(html)

If it helps to post all my code, I can do that. Thanks in advance for any advice.


1 Answer

Editorial Team · Answer 1 · 2026-06-08T01:25:52+00:00


You could use a simpler tool like requests.

import requests

url = "http://myurl.com"  # the URL from the question
response = requests.get(url)
html = response.text

But if you really want to stick with mechanize you can also skip the Browser() stuff (which is probably introducing cookies into your requests). Check the mechanize docs for more details.

import mechanize

response = mechanize.urlopen("http://foo.bar.com/")
html = response.read()  # or response.readlines()
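
If the stale pages turn out to come from a cache somewhere between your machine and the site (that is an assumption, it could equally be the server itself), another thing to try is asking for an uncached copy explicitly. Below is a minimal sketch using only the standard library. The function names and the throwaway "_" query parameter are made up for illustration, and the server is free to ignore both the parameter and the headers.

```python
import time
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_fresh_request(url, now=None):
    """Build a request that asks caches not to answer with a stored copy."""
    # A changing query parameter makes each URL look new to a cache.
    stamp = str(int((time.time() if now is None else now) * 1000))
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_", stamp))  # "_" is an arbitrary, hypothetical name
    fresh_url = urlunsplit(parts._replace(query=urlencode(query)))

    # Standard HTTP request headers asking caches to revalidate.
    headers = {
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",
    }
    return urllib.request.Request(fresh_url, headers=headers)


def fetch_fresh(url, timeout=30):
    """Fetch the page body as bytes using the cache-busting request."""
    request = build_fresh_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

You would then call html = fetch_fresh("http://myurl.com") and pass html to BeautifulSoup as before. If the timestamp still only changes every 12 minutes, the delay is probably not on your side.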
