I called the following code to visit a url and tried to print the

Question

Asked: June 4, 2026 (2026-06-04T22:58:02+00:00)

I called the following code to visit a url and tried to print the content on that page:

import urllib2
f = urllib2.urlopen("https://www.reaxys.com/reaxys/secured/customset.do?performed=true&action=get_preparations&searchParam=1287039&workflowId=1338317532514&workflowStep=1&clientDateTime=2012-05-29%2015:17")
page = f.read()
print page
f.close()

I’m not sure whether the URL is reachable from everywhere, so the content on that page might not be accessible to everyone.

The page limits how long a user can stay on it. Once that time is up, a popup appears saying the user has reached the timeout.

Here’s the problem I bumped into:
When I typed the URL into a browser, everything opened just fine. But when I printed what Python read from that URL, I got the timeout page instead, the one that should only appear after the time limit has been reached.

I don’t know what’s wrong. Is it Python or the website? How can I make Python read the actual content of that page?

Thanks in advance.

1 Answer

Editorial Team · Answer 1 · 2026-06-04T22:58:04+00:00

It appears to be related to cookies being set by the website. If I visit the URL

https://www.reaxys.com/reaxys/secured/customset.do?performed=true&action=get_preparations&searchParam=1287039&workflowId=1338317532514&workflowStep=1

in my browser, I get the same timeout error. If I refresh, the site loads fine. But if I clear my cookies for the site and retry, I get the timeout again. So I suspect that the site runs some process that sets a timestamp cookie and checks it before showing the page, and that it falls back to the timeout page if for some reason the cookie can’t be set (as would be the case for a visit from a Python script that doesn’t keep cookies).

I would suggest investigating in depth which cookies are being set (start with the JavaScript on that page, which seems to handle some of the timeout logic), and then setting those cookies from the scraping process as per: http://www.testingreflections.com/node/view/5919 , http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/ , or the like.
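
As a rough sketch of the cookie idea, here is how a script could keep cookies between requests using only the standard library. It is written for Python 3 (urllib.request and http.cookiejar rather than the urllib2 in the question), and I have not tested it against this site. The function names, the User-Agent value, and the choice of two attempts are my own assumptions; the second request simply mimics the browser refresh described above. If the cookie is set by JavaScript rather than by a Set-Cookie header, or if the page needs a login, this alone won't be enough.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar): an opener that stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Placeholder browser-like User-Agent (an assumption, adjust as needed).
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar


def fetch_with_cookies(url, attempts=2):
    """Request url several times with one cookie jar; return the last body.

    The first request lets the server set its cookies; later requests send
    them back, much like refreshing the page in a browser.
    """
    opener, jar = make_cookie_opener()
    page = b""
    for _ in range(attempts):
        with opener.open(url) as response:
            page = response.read()
    return page, jar
```

To try it, call fetch_with_cookies() with the URL from the question and print the returned page. Looping over the returned jar and printing each cookie's name and value shows what the server actually set, which is a useful starting point for the investigation suggested above.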

(This is in no way intended to condone the scraping of an Elsevier site, as they may come after you and eat your young 🙂 )
