I am using scrapy to crawl a site which seems to be appending random

Asked: May 27, 2026

I am using Scrapy to crawl a site that seems to append random values to the query string at the end of each URL. Because every link then looks like a new URL, the crawl turns into a sort of infinite loop.

How do I make Scrapy ignore the query string part of the URLs?
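To make the loop concrete, here is a minimal stdlib-only sketch (not Scrapy code) of a crawler's "already seen" check. It assumes a hypothetical site where every page links back to itself with a fresh random query value (the r parameter and the fake_links and crawl helpers are made up for illustration). Keying the seen-set on the full URL never finds a repeat, so only the max_fetches cap stops it. Keying it on the URL with the query string stripped stops after one page.

```python
import random
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url with its query string and fragment removed."""
    return urlunsplit(urlsplit(url)._replace(query="", fragment=""))


def fake_links(url):
    """Hypothetical site: each page links to itself with a random query value."""
    return [strip_query(url) + "?r=" + str(random.random())]


def crawl(start, key, max_fetches=50):
    """Breadth-first crawl that skips URLs whose key(url) was already seen.

    Returns the number of pages fetched; max_fetches caps a runaway crawl.
    """
    seen = set()
    queue = [start]
    fetched = 0
    while queue and fetched < max_fetches:
        url = queue.pop(0)
        k = key(url)
        if k in seen:
            continue  # duplicate under this key: do not fetch again
        seen.add(k)
        fetched += 1
        queue.extend(fake_links(url))
    return fetched


start = "http://url.something.com/bla.html?r=0"
print(crawl(start, key=lambda u: u))  # full URL as key: runs until the cap, 50
print(crawl(start, key=strip_query))  # query ignored: 1
```

As far as I know, Scrapy's own duplicate filter also keys on the (canonicalized) request URL including its query arguments, which is why random values defeat it the same way.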

1 Answer

Editorial Team · Answer 1 · 2026-05-27T14:57:41+00:00

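See the urlparse module (urllib.parse in Python 3). Its urlparse function splits a URL into scheme, host, path, query and so on, so you can rebuild the URL without the query string.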

Example code:

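try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

o = urlparse('http://url.something.com/bla.html?querystring=stuff')
# Rebuild the URL from scheme, host and path only, dropping the query string
url_without_query_string = o.scheme + "://" + o.netloc + o.path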

Example output:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from urlparse import urlparse
>>> o = urlparse('http://url.something.com/bla.html?querystring=stuff')
>>> url_without_query_string = o.scheme + "://" + o.netloc + o.path
>>> print url_without_query_string
http://url.something.com/bla.html
>>>
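A slightly more robust variant, sketched here assuming Python 3: urlsplit and urlunsplit rebuild the URL from its parts instead of concatenating strings. This keeps ";params" on the last path segment (urlparse splits those into a separate field, so the concatenation above drops them) and does not turn a scheme-less URL into "://...". If the site mixes the random value with parameters that matter, such as pagination, removing only the offending parameter may be safer; the parameter name rnd below is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query(url):
    """Drop the whole query string (and fragment), keeping everything else."""
    return urlunsplit(urlsplit(url)._replace(query="", fragment=""))


def drop_params(url, names):
    """Remove only the query parameters whose names are in `names`.

    `names` is supplied by the caller, e.g. {"rnd"} for a hypothetical
    random cache-busting parameter; all other parameters are kept in order.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://url.something.com/list.html?page=2&rnd=8231"
print(strip_query(url))           # http://url.something.com/list.html
print(drop_params(url, {"rnd"}))  # http://url.something.com/list.html?page=2
```

Either function could then be applied to links before Scrapy schedules them, for instance as the process_value callable of a LinkExtractor used in a CrawlSpider Rule; check the LinkExtractor documentation for your Scrapy version.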