I'm working on a Python scraper/spider and encountered a URL that exceeds the character limit, producing the IOError in the title. I'm using httplib2, and when I attempt to retrieve the URL I receive a "File name too long" error. I prefer to keep all of my projects within the home directory since I am using Dropbox. Is there any way around this issue, or should I just set up my working directory outside of home?
The fact that the filename that’s too long starts with `'.cache/www.example.com'` explains the problem. `httplib2` optionally caches requests that you make. You’ve enabled caching, and you’ve given it `.cache` as the cache directory.

The easy solution is to put the cache directory somewhere else.
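As a rough sketch of moving the cache, the snippet below points `httplib2` at a directory under the system temp location instead of `.cache` in the project folder. The helper names and the `httplib2-cache` directory name are made up for illustration, and it assumes the temp directory sits outside the encrypted home, which you should check on your own machine.

```python
import os
import tempfile


def cache_dir_outside_home(name="httplib2-cache"):
    """Return (and create if needed) a cache directory under the system temp dir.

    The directory name is a hypothetical choice for illustration.
    """
    path = os.path.join(tempfile.gettempdir(), name)
    os.makedirs(path, exist_ok=True)
    return path


def make_http():
    # httplib2 is imported here so the helper above works without it installed.
    import httplib2

    # A string passed as the first argument to Http is used as the cache directory.
    return httplib2.Http(cache_dir_outside_home())
```

The project itself can stay in the Dropbox folder under home; only the cache files move.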
Without seeing your code, it’s impossible to tell you how to fix it. But it should be trivial. The documentation for `FileCache` shows that it takes a `dir_name` as the first parameter.

Or, alternatively, you can pass a `safe` function that lets you generate a filename from the URI, overriding the default. That would allow you to generate filenames that fit within the 144-character limit for Ubuntu encrypted fs.

Or, alternatively, you can create your own object with the same interface as `FileCache` and pass that to the `Http` object to use as a cache. For example, you could use `tempfile` to create random filenames, and store a mapping of URLs to filenames in an `anydbm` or `sqlite3` database.

A final alternative is to just turn off caching, of course.
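The two middle options could be sketched roughly as follows. `hashed_name` is a hypothetical `safe` replacement that hashes the URI so every cache filename is 64 characters long. `SqliteCache` is a hypothetical stand-in for `FileCache` that assumes the cache interface is `get`, `set` and `delete`. Note that it stores the cached bytes directly in `sqlite3` rather than mapping URLs to `tempfile`-generated filenames, which is a simplification of the idea above.

```python
import hashlib
import sqlite3


def hashed_name(uri):
    """Map any URI to a fixed-length, filesystem-safe name (64 hex characters)."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()


class SqliteCache:
    """Hypothetical cache object with get/set/delete, backed by sqlite3."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        # A missing entry is reported as None.
        return None if row is None else row[0]

    def set(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def delete(self, key):
        self.conn.execute("DELETE FROM cache WHERE key = ?", (key,))
        self.conn.commit()


def http_with_hashed_filenames(cache_dir):
    import httplib2  # imported lazily; the helpers above need only the stdlib

    return httplib2.Http(cache=httplib2.FileCache(cache_dir, safe=hashed_name))


def http_with_sqlite_cache(db_path):
    import httplib2

    return httplib2.Http(cache=SqliteCache(db_path))
```

Hashed filenames are no longer human-readable, which is the price of a fixed length. For the last option, constructing `httplib2.Http()` with no cache argument leaves caching off.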