The Archive Base

Editorial Team
Asked: May 31, 2026

Forgive me, I am a total programming noob.

I am trying to extract a record ID from a URL with the following code and I'm running into trouble. If I run it through the shell it seems to work fine (no errors), but when I run it through Scrapy the framework generates errors.

Example:
If the URL is http://domain.com/path/to/record_id=1599,
then record_link = /path/to/record_id=1599,
and record_id should therefore be 1599.

   for site in sites:

      record_link = site.select('div[@class="description"]/h4/a/@href').extract()
      record_id = record_link.strip().split('=')[1]

      item['link'] = record_link
      item['id'] = record_id
      items.append(item)

Any help is greatly appreciated.

EDIT:

Scrapy fails with something like this:

   root@web01:/home/user/spiderdir/spiderdir/spiders# sudo scrapy crawl spider
   2012-02-23 09:47:16+1100 [scrapy] INFO: Scrapy 0.13.0.2839 started (bot: spider)
   2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
   2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
   2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
   2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled item pipelines:
   2012-02-23 09:47:16+1100 [spider] INFO: Spider opened
   2012-02-23 09:47:16+1100 [spider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
   2012-02-23 09:47:16+1100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6031
   2012-02-23 09:47:16+1100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6088
   2012-02-23 09:47:19+1100 [spider] DEBUG: Crawled (200) <GET http://www.domain.com/path/to/> (referer: None)
   2012-02-23 09:47:21+1100 [spider] DEBUG: Crawled (200) <GET http://www.domain.com/path/to/record_id=2> (referer: http://www.domain.com/path/to/)
   2012-02-23 09:47:21+1100 [spider] ERROR: Spider error processing <GET http://www.domain.com/path/to/record_id=2>
   Traceback (most recent call last):
      File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 778, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/task.py", line 577, in _tick
        taskObj._oneWorkUnit()
      File "/usr/lib/python2.6/dist-packages/twisted/internet/task.py", line 458, in _oneWorkUnit
        result = self._iterator.next()
      File "/usr/lib/pymodules/python2.6/scrapy/utils/defer.py", line 57, in <genexpr>
        work = (callable(elem, *args, **named) for elem in iterable)
    --- <exception caught here> ---
      File "/usr/lib/pymodules/python2.6/scrapy/utils/defer.py", line 96, in iter_errback
        yield it.next()
      File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/offsite.py", line 24, in process_spider_output
        for x in result:
      File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/referer.py", line 14, in <genexpr>
        return (_set_referer(r) for r in result or ())
      File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/urllength.py", line 32, in <genexpr>
        return (r for r in result or () if _filter(r))
      File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/depth.py", line 56, in <genexpr>
        return (r for r in result or () if _filter(r))
      File "/usr/lib/pymodules/python2.6/scrapy/contrib/spiders/crawl.py", line 66, in _parse_response
        cb_res = callback(response, **cb_kwargs) or ()
      File "/home/nick/googledir/googledir/spiders/google_directory.py", line 36, in parse_main
        record_id = record_link.split("=")[1]
    exceptions.AttributeError: 'list' object has no attribute 'split'
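
The last line of the traceback can be reproduced without Scrapy. The sketch below is only an illustration: the list literal is an assumed stand-in for what extract() returned, with its value taken from the example URL in the question.

```python
# Assumed stand-in for the value returned by .extract(): a list of href
# strings, not a single string. The value is taken from the example URL.
record_link = ['/path/to/record_id=1599']

try:
    # Same call as in the traceback: split() is a str method, not a list method.
    record_link.split('=')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Indexing into the list first yields a string, which can be split.
record_id = record_link[0].strip().split('=')[1]

print(error_message)
print(record_id)
```

Indexing with [0] assumes the selector matched at least one link; an empty list would raise an IndexError instead.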

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 1:10 pm

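    I think what I’m after is something like this. extract() returns a list of strings rather than a single string, which is why calling split on the result raised the AttributeError, so the split has to be applied to each element of the list: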

    for site in sites:
        # extract() returns a list of href strings
        record_link = site.select('div[@class="description"]/h4/a/@href').extract()
        # split each href on '=' and keep the part after it
        record_id = [i.split('=')[1] for i in record_link]

        item['link'] = record_link
        item['id'] = record_id
        items.append(item)
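
The list comprehension in the answer raises an IndexError for any href that contains no '='. A slightly more defensive variant might look like the sketch below. The helper name extract_record_ids and the sample hrefs are hypothetical and not part of Scrapy; the list stands in for whatever extract() returns.

```python
def extract_record_ids(hrefs):
    """Return the text after the first '=' in each href.

    Hypothetical helper, not part of Scrapy. Hrefs without '=' are skipped.
    """
    record_ids = []
    for href in hrefs:
        # partition() never raises; sep is '' when '=' is absent.
        _, sep, value = href.strip().partition('=')
        if sep:
            record_ids.append(value)
    return record_ids


# Illustrative stand-in for the list returned by .extract().
hrefs = ['/path/to/record_id=1599', ' /path/to/record_id=2 ', '/path/to/about']
record_ids = extract_record_ids(hrefs)
print(record_ids)
```

If each site block is expected to hold exactly one link, the single ID can be taken as record_ids[0] after checking that the list is not empty.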
    