I’m test-building a scraping site with Django. For some reason the following code only provides one image, where I’d like it to print every image, every link, and every price. Any help? (Also, if you guys know how to place this data into a database model so I don’t have to always scrape the site, I’m all ears, but that may be another question.) Cheers!
Here is the template file:
{% extends "base.html" %}
{% block title %}Boats{% endblock %}
{% block content %}
<img src="{{ fetch_boats }}"/>
{% endblock %}
Here is the views.py file:
#views.py
from django.shortcuts import render_to_response
from django.template.loader import get_template
from django.template import Context
from django.http import Http404, HttpResponse
from fetch_images import fetch_imagery
def fetch_it(request):
    fi = fetch_imagery()
    return render_to_response('fetch_image.html', {'fetch_boats' : fi})
Here is the fetch_images module:
#fetch_images.py
from BeautifulSoup import BeautifulSoup
import re
import urllib2
def fetch_imagery():
    response = urllib2.urlopen("http://www.boattrader.com/search-results/Type")
    html = response.read()
    #create a beautiful soup object
    soup = BeautifulSoup(html)
    #all boat images have attribute height=165
    images = soup.findAll("img",height="165")
    for image in images:
        return image['src'] #print the url of the image only
    # all links to detailed boat information have class lfloat
    links = soup.findAll("a", {"class" : "lfloat"})
    for link in links:
        return link['href']
        #print link.string
    # all prices are spans and have the class rfloat
    prices = soup.findAll("span", { "class" : "rfloat" })
    for price in prices:
        return price
        #print price.string
Lastly, if needed, the mapped URL in the urlconf is below:
from django.conf.urls.defaults import *
from mysite.views import fetch_it
urlpatterns = patterns('', ('^fetch_image/$', fetch_it))
Your fetch_imagery function needs some work: since you’re returning (instead of using yield), the first return image['src'] will terminate the function call, so the loops over links and prices never run. (I’m assuming here that all those returns are part of the same function definition, as shown by your code.)
Also, my assumption is that you will be returning a list/tuple (or defining a generator method) from fetch_imagery, in which case your template needs to wrap the img tag in a {% for image in fetch_boats %} ... {% endfor %} loop and use {{ image }} as the src. This will loop over all items (image URLs in your case) in your list and create an img tag for each one of them.
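To make the return-versus-yield point concrete, here is a minimal sketch in plain Python. It does not touch the network or BeautifulSoup: SAMPLE_IMAGES is a hypothetical stand-in for the tags that soup.findAll would return, and the three function names are made up for illustration only.

```python
# Hypothetical stand-in for the <img> tags BeautifulSoup would return;
# real tags also support the tag['src'] lookup used below.
SAMPLE_IMAGES = [{"src": "a.jpg"}, {"src": "b.jpg"}, {"src": "c.jpg"}]


def first_only(images):
    # Mirrors the loop in the question: return exits the function
    # during the first iteration, so only one value ever comes back.
    for image in images:
        return image["src"]


def as_list(images):
    # Collect every value first, then return once after the loop.
    return [image["src"] for image in images]


def as_generator(images):
    # yield hands back one value at a time without ending the function.
    for image in images:
        yield image["src"]


if __name__ == "__main__":
    print(first_only(SAMPLE_IMAGES))
    print(as_list(SAMPLE_IMAGES))
    print(list(as_generator(SAMPLE_IMAGES)))
```

Either the list or the generator version can be passed to the template as fetch_boats. Links and prices would need the same treatment, for example by returning a dict holding one list per kind of data.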