I am trying to pull a specific URL using Python


Asked: June 18, 2026


I am trying to pull a specific URL in Python using raw_html = urlopen(url).read().
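For reference, here is the call from the question as a self-contained Python 3 sketch. The question does not show its import, so the urllib.request import and the wrapper name fetch_raw are assumptions.

```python
from urllib.request import urlopen


def fetch_raw(url: str) -> bytes:
    # Plain urlopen() call, as in the question; for http(s) URLs it sends
    # urllib's default request headers. read() returns undecoded bytes.
    with urlopen(url) as resp:
        return resp.read()
```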

When I inspect raw_html, I find that the expected HTML has been replaced with a page that essentially tells me I am not allowed to crawl the site.

However, when I fetch the same URL with curl -O from the Unix command line (or by calling curl from Python), the page downloads just fine.

What is the reason for the discrepancy, and what should I use within Python to get the same HTML that the curl command returns in Unix?

Thanks in advance for any thoughts!


1 Answer

Editorial Team · Answer 1 · 2026-06-18T15:38:30+00:00

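When an HTTP client makes a request, it identifies itself to the server in the User-Agent header: urllib sends "Python-urllib/<version>" by default, while curl sends "curl/<version>". In this case, the server checks whether the client looks like a bot and, if it does, refuses access (though apparently it does not flag curl).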
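To see what urllib announces, the sketch below reads the default header from a fresh opener. It relies on the documented behaviour that OpenerDirector.addheaders holds the User-Agent added to any request that does not set one; the helper name default_user_agent is hypothetical.

```python
import urllib.request


def default_user_agent() -> str:
    # build_opener() returns an OpenerDirector; its addheaders list carries
    # the User-Agent that urlopen() sends when the Request has none.
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


# Prints "Python-urllib/" followed by the running Python's major.minor version.
print(default_user_agent())
```

A server that filters on this value can tell urllib and curl apart without looking at anything else in the request.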

You can get around this by setting the User-Agent string to spoof a browser. See this question for how to do that with urllib. However, if the site's owner does not want you to crawl it and detects that you are doing so anyway (for example, because you are requesting pages at too high a rate), you may find yourself blocked from the site, so contacting the owner might be a better idea than spoofing.
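A minimal sketch of the User-Agent approach with urllib.request, assuming Python 3. BROWSER_UA, build_request and fetch_html are hypothetical names, and the browser string is only an example value, not one any particular site is known to accept.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; substitute the value you need.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    # A User-Agent set on the Request takes precedence over urllib's
    # default "Python-urllib/<version>" value.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = BROWSER_UA, timeout: float = 10.0) -> str:
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Decode using the charset the response declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With this in place, raw_html = fetch_html(url) would stand in for urlopen(url).read(); note that it returns decoded text rather than bytes.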
