I have a problem accessing the Project Gutenberg Library… I am using Python 2.7.3.

Question

Asked: June 14, 2026 (2026-06-14T08:51:57+00:00)

I have a problem accessing the Project Gutenberg Library…
I am using Python 2.7.3.
I can import the NLTK library and work with Python, but when I try to download the raw text of a book, the request does not return it.

The text I was trying to access is Crime and Punishment. Its len(raw) should equal 1176831, but I get a len(raw) of 288 instead.
Here is the code that I used:

>>> from __future__ import division
>>> import nltk, re, pprint
>>> from urllib import urlopen
>>> url = "http://www.gutenberg.org/files/2554/2554.txt"
>>> raw = urlopen(url).read()
>>> type(raw)
<type 'str'>
>>> len(raw)
288
>>> raw
'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don\'t have permission to access /files/2554/2554.txt\non this server.</p>\n<hr>\n<address>Apache Server at www.gutenberg.org Port 80</address>\n</body></html>\n'
>>>

1 Answer

Editorial Team · Answer 1 · 2026-06-14T08:51:59+00:00

The HTTP 403 response is explained by the site's own policy. Basically the site is “for human (non-automated) users only. Any perceived use of automated tools to access our web site will result in a temporary or permanent block of your IP address or subnet.”

Your code “should work”, but the server detects that the request comes from a script rather than a browser and refuses it. The 288 characters you got back are the 403 error page, not the book. That is all I will say. 🙂
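To make this kind of failure obvious instead of silently ending up with a 288-character string, it may help to check what the server actually returned before treating it as the book. The following is only a minimal sketch: the names `fetch_text` and `looks_like_error_page` are hypothetical helpers, not part of NLTK or urllib, and the HTML check is a rough heuristic. It does not work around the site's policy; it only reports the refusal.

```python
# Sketch: detect a refused request instead of silently using the error page.
try:
    from urllib.request import urlopen      # Python 3
    from urllib.error import HTTPError
except ImportError:
    from urllib2 import urlopen, HTTPError  # Python 2 (urllib2 raises on 403)


def fetch_text(url):
    """Hypothetical helper: return the body, or raise with the HTTP status."""
    try:
        response = urlopen(url)
    except HTTPError as err:
        raise RuntimeError("server refused the request: HTTP %d" % err.code)
    return response.read()


def looks_like_error_page(raw):
    """Heuristic: an HTML document arrived where plain text was expected."""
    head = raw[:200]
    if isinstance(head, bytes):
        head = head.decode("latin-1")
    head = head.lower()
    return "<!doctype html" in head or "<html" in head
```

With the string from your session, `looks_like_error_page(raw)` would flag the response as an HTML page, which is a quick hint that the download did not succeed.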
