sorry for the stupid question … just started using python (but I love it).

Question

Asked: June 11, 2026

Sorry for the stupid question; I just started using Python (but I love it).

The problem:
I want to scrape data from the Center for Documentation of Violations in Syria. Currently I'm using this scraper to collect the data. The problem is that I can access only one row instead of scraping all rows from the table.
The preferred output should look like:

name status sex province area dateofdeath causeofdeath

import urllib2
from BeautifulSoup import BeautifulSoup
f = open('syriawar.tsv', 'w')
f.write("Row" + "\t" + "Data" + "\n")

for x in range(0, 249):

    syria = "file" + "\t" + str(x)
    print "fetching data ... " + syria

    url = 'http://vdc-sy.org/index.php/en/martyrs/' + str(x) + '/c29ydGJ5PWEua2lsbGVkX2RhdGV8c29ydGRpcj1ERVNDfGFwcHJvdmVkPXZpc2libGV8c2hvdz0xfGV4dHJhZGlzcGxheT0wfA=='

    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page)

    sentence = soup.findAll('tr')[3].text

    words = sentence
    Data = str(words)

    f.write(str(x) + "\t" + Data + "\n")

f.close()

1 Answer

Editorial Team · Answer 1 · 2026-06-11T12:20:53+00:00

You need another layer of iteration. First call findAll('tr') to get all the rows. Then drop the header rows and the empty rows, loop over the remaining rows, and call .text on each one to get its text. Write each row to the file from inside that inner loop.
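To make the two-level loop concrete, here is a minimal sketch of just the control flow, written for Python 3 and without any network access. `fetch_rows` is a hypothetical stand-in for downloading a page and calling findAll('tr'); the row texts it returns are made-up placeholders, and the header count of 3 is an assumption taken from the `[3]` index in the question.

```python
import io

HEADER_ROWS = 3  # assumption: taken from the [3] index used in the question


def fetch_rows(page_number):
    """Hypothetical stand-in for fetching a page and calling findAll('tr').

    Returns the text of each row; the values are made-up placeholders.
    """
    return ["header", "header", "header",
            "row A on page %d" % page_number, "",
            "row B on page %d" % page_number, ""]


def write_all_rows(out, page_numbers):
    """Outer loop over pages, inner loop over the rows of each page."""
    written = 0
    for page_number in page_numbers:
        for text in fetch_rows(page_number)[HEADER_ROWS:]:
            text = text.strip()
            if not text:  # skip empty spacer rows
                continue
            # The write happens inside the inner loop, once per data row.
            out.write("%d\t%s\n" % (page_number, text))
            written += 1
    return written


out = io.StringIO()
count = write_all_rows(out, range(2))
print(count)  # 4: two non-empty data rows on each of the two pages
```

Filtering on empty text instead of on row position is a design choice of this sketch; it keeps working if the empty rows do not strictly alternate with the data rows.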

Here is the fixed script. Note that the utf-8 codec had to be used because the page text contains Unicode characters. You should verify that this gets everything you want. The empty tags were causing Beautiful Soup some problems.

import urllib2
from bs4 import BeautifulSoup
import codecs

# codecs.open encodes the unicode text as UTF-8 when writing
f = codecs.open('syriawar.tsv', 'w', 'utf-8')
f.write("Row" + "\t" + "Data" + "\n")

for x in range(0, 249):

    syria = "file" + "\t" + str(x)
    print "fetching data ... " + syria

    url = 'http://vdc-sy.org/index.php/en/martyrs/' + str(x) + '/c29ydGJ5PWEua2lsbGVkX2RhdGV8c29ydGRpcj1ERVNDfGFwcHJvdmVkPXZpc2libGV8c2hvdz0xfGV4dHJhZGlzcGxheT0wfA=='

    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page)

    # every <tr> element on this page
    rows = soup.findAll('tr')

    # rows[3:] skips the leading rows before the data; only even-indexed
    # rows are written, which is meant to drop the alternating empty rows
    i = 0
    for row in rows[3:]:
        if i % 2 == 0:
            f.write(str(i // 2) + "\t" + row.text + "\n")
        i += 1

f.close()
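The encoding point can be checked in isolation. The sketch below is written for Python 3 and uses `io.open` with an explicit encoding, which plays the same role as `codecs.open` in the script; the file name and the non-ASCII sample string are placeholders, not data from the site.

```python
import io
import os
import tempfile


def write_tsv(path, rows):
    """Write rows (lists of text) as UTF-8 encoded, tab-separated lines."""
    with io.open(path, "w", encoding="utf-8") as f:
        for cells in rows:
            f.write(u"\t".join(cells) + u"\n")


def read_text(path):
    """Read the file back as text, decoding it as UTF-8."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# Placeholder file in a temporary directory; the second row holds a
# non-ASCII placeholder string to exercise the encoder.
path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
write_tsv(path, [[u"Row", u"Data"], [u"0", u"\u0633\u0648\u0631\u064a\u0627"]])
print(len(read_text(path).splitlines()))  # 2
```

Without an explicit encoding, the result depends on the platform default, so naming utf-8 up front is the safer habit when the page text is not plain ASCII.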

Another spiffy way to do this is to use Scrapemark. It works great for tables and lists.
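For the column layout asked for in the question (name, status, sex, and so on), each row can be split into its cells instead of being taken as one block of text. Below is a minimal Python 3 sketch that uses only the standard library's `html.parser` as a stand-in for Beautiful Soup, so it is a swapped-in technique rather than the method used in the script. The sample table, its placeholder values, and the names `TableRowParser` and `rows_to_tsv` are all hypothetical, and the real page may be structured differently.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_to_tsv(html, header_rows=0):
    """Return one tab-separated line per non-empty row after the headers."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return ["\t".join(cells)
            for cells in parser.rows[header_rows:]
            if any(cells)]  # drop rows whose cells are all empty


# Made-up sample markup with placeholder values, not data from the site.
SAMPLE = """
<table>
  <tr><th>name</th><th>status</th><th>sex</th></tr>
  <tr><td>NAME_1</td><td>STATUS_1</td><td>SEX_1</td></tr>
  <tr><td></td><td></td><td></td></tr>
  <tr><td>NAME_2</td><td>STATUS_2</td><td>SEX_2</td></tr>
</table>
"""

for line in rows_to_tsv(SAMPLE, header_rows=1):
    print(line)
```

With Beautiful Soup the same idea would be to call findAll('td') on each row and join the .text of the cells with tabs, in place of writing row.text as a single field.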
