I am new to Scrapy and really python as well. I am trying to

Asked: June 17, 2026 (2026-06-17T21:38:38+00:00)

I am new to Scrapy, and to Python in general. I am trying to write a scraper that extracts each article's title, link and description from a web page, much like an RSS feed, to help with my thesis. I've written the following scraper, but when I run it and export the results as a .txt file, the file comes back blank. I believe I need to add an Item Loader, but I am not sure.

items.py

from scrapy.item import Item, Field

class NorthAfricaItem(Item):
    # One field per piece of data scraped for each article.
    title = Field()
    link = Field()
    desc = Field()

Spider

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from northafricatutorial.items import NorthAfricaItem

class NorthAfricaItem(BaseSpider):
   name = "northafrica"
   allowed_domains = ["http://www.north-africa.com/"]
   start_urls = [
       "http://www.north-africa.com/naj_news/news_na/index.1.html",
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//ul/li')
       items = []
       for site in sites:
           item = NorthAfricaItem()
           item['title'] = site.select('a/text()').extract()
           item['link'] = site.select('a/@href').extract()
           item['desc'] = site.select('text()').extract()
           items.append(item)
       return items

UPDATE

Thanks to Talvalin for the help; with some additional experimenting I was able to fix the problem. I had been using a stock script that I found online. Once I used the Scrapy shell, I was able to find the correct tags to get what I needed. I've ended up with:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from northafrica.items import NorthAfricaItem

class NorthAfricaSpider(BaseSpider):
   name = "northafrica"
   allowed_domains = ["http://www.north-africa.com/"]
   start_urls = [
       "http://www.north-africa.com/naj_news/news_na/index.1.html",
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//ul/li')
       items = []
       for site in sites:
           item = NorthAfricaItem()
           item['title'] = site.select('//div[@class="short_holder"]/h2/a/text()').extract()
           item['link'] = site.select('//div[@class="short_holder"]/h2/a/@href').extract()
           item['desc'] = site.select('//span[@class="summary"]/text()').extract()
           items.append(item)
       return items

If anyone sees anything here that I did wrong, let me know... but it works.
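
One detail that may be worth checking in the updated spider: Scrapy documents `allowed_domains` as a list of domain names rather than URLs, so an entry such as "http://www.north-africa.com/" may not behave as intended if the spider ever follows links. Requests built from `start_urls` are, as far as is known here, not subject to the offsite filter, which would explain why the spider still runs. The sketch below uses only the standard library to derive a domain-style value from the start URL. The helper name `domain_from_url` is hypothetical and not part of Scrapy.

```python
from urllib.parse import urlparse


def domain_from_url(url):
    """Return the bare host name of url, without scheme, path, or 'www.' prefix.

    Hypothetical helper for illustration; not a Scrapy function.
    """
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


# The start URL is taken from the spider above.
start_url = "http://www.north-africa.com/naj_news/news_na/index.1.html"

# A domain-style value that could be used in place of the URL entry.
allowed_domains = [domain_from_url(start_url)]
print(allowed_domains)
```

In practice the value would normally just be written by hand in the spider; the helper only makes explicit which part of the URL is the domain.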


1 Answer

Editorial Team · Answer 1 · 2026-06-17T21:38:39+00:00

The thing to note about this code is that it runs, but with an error. Run the spider from the command line and you will see something along the lines of:

        exceptions.TypeError: 'NorthAfricaItem' object does not support item assignment

2013-01-24 16:43:35+0000 [northafrica] INFO: Closing spider (finished)

This error occurs because you've given your spider class and your item class the same name: NorthAfricaItem.

In your spider module, the class definition rebinds the name NorthAfricaItem that was imported from items.py. When parse creates an instance of NorthAfricaItem to assign things to (like title, link and desc), it therefore gets the spider class rather than the item class. Since the spider version of NorthAfricaItem is not actually a type of Item, the item assignment fails.

To fix the issue, rename your spider class to something like NorthAfricaSpider and the problem should be resolved.
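
The name clash can be reproduced without Scrapy. The following is a minimal sketch, not the original project: `Item` and `BaseSpider` are hypothetical stand-ins for the Scrapy classes, and `run_clashing_spider` is a helper invented for the illustration. The second class definition plays the role of the spider module redefining the imported name.

```python
# Hypothetical stand-ins for scrapy.item.Item and scrapy.spider.BaseSpider,
# so the sketch runs without Scrapy installed.
class Item(dict):
    """Dict-like container, so item['title'] = ... is allowed."""


class BaseSpider:
    """Plain class with no support for item assignment."""


# Plays the role of the class imported from items.py.
class NorthAfricaItem(Item):
    pass


# Plays the role of the spider: same name, so it rebinds NorthAfricaItem.
class NorthAfricaItem(BaseSpider):
    def parse(self):
        item = NorthAfricaItem()   # resolves to this spider class, not the item
        item['title'] = 'example'  # raises TypeError
        return item


def run_clashing_spider():
    """Return the error message produced by the name clash, or None."""
    try:
        NorthAfricaItem().parse()
    except TypeError as exc:
        return str(exc)
    return None


print(run_clashing_spider())
```

If the second class is named NorthAfricaSpider instead, NorthAfricaItem stays bound to the item class and the assignment succeeds.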
