The Archive Base

Editorial Team
Asked: May 31, 2026

I have been working through the tutorial, adapting it to a project of my own. Something is going wrong, and I just can’t find the error.

When using ‘scrapy shell’, I get the response I expect. For example, for this site (NRL Ladder):

In [1]: hxs.select('//td').extract()
Out[1]: 
[u'<td>\r\n<div id="ls-nav">\r\n<ul><li><a href="http://www.nrlstats.com/"><span>Home</span></a></li>\r\n<li class="ls-nav-on"><a href="/nrl"><span>NRL</span></a></li>\r\n<li><a href="/nyc"><span>NYC</span></a></li>\r\n<li><a href="/rep"><span>Rep Matches</span></a></li>\r\n\r\n</ul></div>\r\n</td>',
 u'<td style="text-align:left" colspan="5">Round 4</td>',
 u'<td colspan="5">Updated: 26/3/2012</td>',
 u'<td style="text-align:left">1. Melbourne</td>',
 u'<td>4</td>',
 u'<td>4</td>',
 u'<td>0</td>',
 u'<td>0</td>',
 u'<td>0</td>',
 u'<td>122</td>',
 u'<td>39</td>',
 u'<td>83</td>',
 u'<td>8</td>',
 u'<td style="text-align:left">2. Canterbury-Bankstown</td>',

And on it goes.

I am really struggling to understand how to alter the tutorial project to change it to a different data type.

Is there any way to bring up a help or documentation list showing which types I should use in items when selecting ‘td’ or any other element? As I said, it works easily in the shell, but I cannot transfer it to the project files. Specifically, both the team names and the points are ‘td’ cells, but the team name is text.

Here is what I have done:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from nrl.items import NrlItem

class nrl(BaseSpider):
    name = "nrl"
    allowed_domains = ["http://live.nrlstats.com/"]
    start_urls = [
        "http://live.nrlstats.com/nrl/ladder.html",
        ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//td')
        items = []
        for site in sites:
           item = nrlItem()
           item['team'] = site.select('/text()').extract()
           item['points'] = site.select('/').extract()
           items.append(item)
        return items

1 Answer

    Editorial Team
    Added an answer on May 31, 2026 at 11:03 pm

    I didn’t quite understand your question, but here is a starting point (untested; see the comments in the code):

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    
    from nrl.items import NrlItem
    
    class nrl(BaseSpider):
        name = "nrl"
        allowed_domains = ["live.nrlstats.com"] # bare domain names only, not full URLs
        start_urls = [
            "http://live.nrlstats.com/nrl/ladder.html",
            ]
    
        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            rows = hxs.select('//table[@class="tabler"]//tr[starts-with(@class, "r")]') # select team rows
            items = []
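            for row in rows:
                item = NrlItem()  # must match the imported class name
                columns = row.select('./td/text()').extract() # select columns for the selected row
                # each key below must be declared as a Field in NrlItem
                item['team'] = columns[0]
                item['P'] = int(columns[1])
                item['W'] = int(columns[2])
                ...  # and so on for the remaining columns
                items.append(item)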
            return items
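
Because extract() always returns strings, the team name and the numeric columns come back as the same type; the numbers need an explicit conversion such as int() before they go into the item. A minimal stdlib-only sketch, assuming the ten-column layout seen in the shell output (team, then nine numbers). The helper row_to_item is hypothetical, and every field name after 'P' and 'W' is an assumed label:

```python
import re

# Hypothetical field names: 'team', 'P' and 'W' come from the answer's code;
# the remaining labels are assumptions for the other ladder columns.
FIELDS = ["team", "P", "W", "D", "L", "B", "F", "A", "Diff", "Pts"]


def row_to_item(columns):
    """Convert one row of extracted cell texts into a dict of typed values.

    `columns` is a list of strings, e.g. the result of
    row.select('./td/text()').extract() for one team row.
    """
    if len(columns) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(columns)))
    # The team cell looks like "1. Melbourne": strip the leading rank number.
    team = re.sub(r"^\s*\d+\.\s*", "", columns[0]).strip()
    item = {"team": team}
    # Every other cell is a number stored as text, so convert it explicitly.
    for name, text in zip(FIELDS[1:], columns[1:]):
        item[name] = int(text.strip())
    return item


row = ["1. Melbourne", "4", "4", "0", "0", "0", "122", "39", "83", "8"]
print(row_to_item(row))
```

Inside the spider, the resulting dict could be passed as keyword arguments, e.g. NrlItem(**row_to_item(columns)), provided NrlItem declares a Field for each of those names.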
    

    UPDATE:

    //table[@class="tabler"]//tr[starts-with(@class, "r")] is an XPath query: it selects every tr whose class attribute starts with "r" inside a table whose class is "tabler". See some XPath examples here.

    hxs.select(xpath_query) always returns a list of nodes (each itself an HtmlXPathSelector) that match the given query.

    hxs.extract() returns the string representation of the node(s); called on a list of selectors, extract() returns a list of strings.
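
To see what the row query and extract() hand back without running Scrapy, the behaviour can be approximated with the standard library's xml.etree.ElementTree. This is a sketch under assumptions: the markup is a hypothetical, simplified stand-in for the ladder table, the helper extract_team_rows is hypothetical, and because ElementTree supports only a subset of XPath (no starts-with()), that predicate is emulated with Python's str.startswith:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup shaped like the ladder table described
# in the answer (table class "tabler", team rows with classes starting "r").
HTML = """
<html><body>
<table class="tabler">
  <tr class="head"><td>Team</td><td>P</td><td>Pts</td></tr>
  <tr class="r1"><td>1. Melbourne</td><td>4</td><td>8</td></tr>
  <tr class="r2"><td>2. Canterbury-Bankstown</td><td>4</td><td>6</td></tr>
</table>
</body></html>
"""


def extract_team_rows(markup):
    """Return one list of cell strings per team row."""
    root = ET.fromstring(markup.strip())
    # Like //table[@class="tabler"]: ElementTree supports [@attr='value'].
    table = root.find(".//table[@class='tabler']")
    if table is None:
        return []
    # Like //tr[starts-with(@class, "r")]: emulated with a Python filter,
    # giving a list of element nodes (the analogue of hxs.select()).
    rows = [tr for tr in table.iter("tr") if tr.get("class", "").startswith("r")]
    # Like './td/text()' followed by extract(): plain strings, skipping
    # cells that contain no text.
    return [[td.text for td in tr.findall("td") if td.text] for tr in rows]


for columns in extract_team_rows(HTML):
    print(columns)  # e.g. ['1. Melbourne', '4', '8']
```

Every value comes back as a string, including the numbers, which is why the answer's code wraps the numeric columns in int().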

    P.S. Beware that Scrapy's selectors support XPath 1.0 but not XPath 2.0 (they are built on libxml2, so this is the same on every platform), so newer XPath functions such as ends-with() or matches() will not work.

    See also:

    • http://doc.scrapy.org/en/latest/topics/selectors.html
    • http://doc.scrapy.org/en/latest/topics/firefox.html

Related Questions

I'm starting out with wxPython and have been working my way through every tutorial
I have been working through projects involving packages that do all the web design
I have been working through the tutorial by Alex Young on using flash messages
I am working on a QR code encoding/decoding project. I have been read through
I am new to asp.net and have been working through the tutorial - up
I have been working through a tutorial ( http://glacialflame.com/category/tutorial/ ) to build an Isometric
I have been going through this tutorial on auto-populating boxes using jQuery and Ajax:
I'm working on a symfony project, and I've been through the tutorial but this
I have been going through documentation and such and have SVN working, but I
I have been working my way through Scott Guthrie's excellent post on ASP.NET MVC

© 2021 The Archive Base. All Rights Reserved