So I’m playing around with Scrapy, a set of classes for web scraping, and I wanted to throw some data into a database, but I’m having trouble importing the MySQL methods while extending the Scrapy library.
Here is my code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
import MySQLdb

class test(BaseSpider):  # if I don't extend the class, the MySQL part works, but the Scrapy functionality does not
    name = "test"
    allowed_domains = ["some-website.com"]  # I know this is probably not a real website, just using it as an example
    start_urls = [
        "http://some-website.com",
    ]
    db = MySQLdb.connect(
        host = 'localhost',
        user = 'root',
        passwd = '',
        db = 'scrap'
    )
    #cursor = db.cursor()

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for title in hxs.select('//a[@class="title"]/text()').extract():
            print title
            cursor.execute("INSERT INTO `scrap`.`shows` (id, title) VALUES (NULL , '"+title+"');")
I am still a noob to Python, so any help would be greatly appreciated.
Something is wrong with your architecture.
A spider’s job is to parse pages, extract data, and put it into an Item. It is the pipeline’s job to save the data from an Item to a database.
So, make a pipeline and add its path to ITEM_PIPELINES in settings.py. Do the database work in that pipeline.
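Here is a minimal sketch of what such a pipeline could look like. It is not your exact setup: the class name ShowsPipeline is made up, and sqlite3 from the standard library stands in for MySQLdb so the example runs without a MySQL server. With MySQLdb you would swap the connect call for MySQLdb.connect(...) and use %s instead of ? as the placeholder. It also assumes the spider hands over dict-like items with a "title" key.

```python
import sqlite3


class ShowsPipeline(object):
    """Hypothetical item pipeline that stores each scraped title in a database."""

    def __init__(self, db_path=":memory:"):
        # db_path is an assumption for this sketch; a MySQL version would
        # keep host/user/passwd/db settings here instead.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts, so the connection
        # is opened here rather than in the spider's class body.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS shows "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
        )

    def process_item(self, item, spider):
        # Parameterized query: the driver handles quoting, so a title that
        # contains quotes cannot break the SQL statement.
        self.conn.execute(
            "INSERT INTO shows (title) VALUES (?)", (item["title"],)
        )
        self.conn.commit()
        # Return the item so any later pipelines still receive it.
        return item

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.conn.close()
```

To wire it up, the spider's parse method would yield items (for example {"title": title} in recent Scrapy versions, or an Item subclass in older ones) instead of touching the database itself, and settings.py would list the pipeline under ITEM_PIPELINES, e.g. {"myproject.pipelines.ShowsPipeline": 300}, where myproject is a placeholder for your project name.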
I think you need to read the tutorial and see the examples.