I don't know about jQuery UI, but in general, this…

Question

0

Asked: May 13, 20262026-05-13T21:58:34+00:00 2026-05-13T21:58:34+00:00

while using beautifulsoup to parse a table in html every other row starts with

0

while using beautifulsoup to parse a table in html every other row starts with

<tr class="row_k">

instead of a tr tag without a class

Sample HTML

<tr class="row_k"> 
<td><img src="some picture url" alt="Item A"></td> 
<td><a href="some url"> Item A</a></td> 
<td>14.8k</td> 
<td><span class="drop">-555</span></td> 
<td> 
<img src="some picture url" alt="stuff" title="stuff"> 
</td> 
<td> 
<img src="some picture url" alt="Max llll"> 
</td> 
</tr> 
<tr> 
<td><img src="some picture url" alt="Item B"></td> 
<td><a href="some url"> Item B</a></td> 
<td>64.9k</td> 
<td><span class="rise">+165</span></td> 
<td> 
<img src="some picture url" alt="stuff" title="stuff"> 
</td> 
<td> 
<img src="some picture url" alt="max llll"> 
</td> 
</tr> 
<tr class="row_k"> 
<td><img src="some picture url" alt="Item C"></td> 
<td><a href="some url"> Item C</a></td> 
<td>4,000</td> 
<td><span class="rise">+666</span></td> 
<td> 
<img src="some picture url" title="stuff"> 
</td> 
<td> 
<img src="some picture url" alt="Maximum lllle">

Text I wish to extract is 14.8k, 64.9k, and 4,000

this1 = urllib2.urlopen('my url').read()
this_1 = BeautifulSoup(this1)
this_1a = StringIO.StringIO()
for row in this_1.findAll("tr", { "class" : "row_k" }):
  for col in row.findAll(re.compile('td')):
    this_1a.write(col.string if col.string else '')
Item_this1 = this_1a.getvalue()

I get the feeling that this code is poorly written, Is there a more flexible tool I can use such as an XML parser? that someone could suggest.

still open to any answers that still utilize beautifulsoup.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-13T21:58:34+00:00

I am still learning a lot but I am going to suggest you try lxml. I am going to make a stab at this and I think it will mostly get you there but there may be some niceties I am not certain about.

assuming this1 is a string

from lxml.html import fromstring
this1_tree=fromstring(this1)
all_cells=[(item[0], item[1]) for item in enumerate(this1_tree.cssselect('td'))] # I am hoping this gives you the cells with their relative position in the document)

The only thing I am not totally certain about is whether you test the key or value or text_content for each cell to find out if it has the string that you are seeking in the anchor reference or text. That is why I wanted a sample of your html. But one of those should work

the_cell_before_numbers=[]
for cell in all_cells:
    if 'Item' in cell[1].text_content():
        the_cell_before_numbers.append(cell[0])

Now that you have the cell before your can then get the value you need by getting the text content of the next cell

todays_price=all_cells[the_cell_before_number+1][1].text_content()

I am sure there is a prettier way but I think this will get you there.

I tested using your html and I got what you were looking for.

How to approach applying for a job at a company ...

What is a programmer’s life like?

How to handle personal stress caused by utterly incompetent and ...

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

while using beautifulsoup to parse a table in html every other row starts with

Leave an answerCancel reply

1 Answer

How to approach applying for a job at a company ...

What is a programmer’s life like?

How to handle personal stress caused by utterly incompetent and ...

Leave an answer
Cancel reply