here is the html <table> <tr> <td class=break>mono</td> </tr> <tr> <td>c1</td> <td>c2</td> <td>c3</td> </tr>

Question

Asked: June 16, 2026 (2026-06-16T13:37:45+00:00)

Here is the HTML:

<table>
<tr>
<td class="break">mono</td>
</tr>
<tr>
<td>c1</td>
<td>c2</td>
<td>c3</td>
</tr>
<tr>
<td>c11</td>
<td>c22</td>
<td>c33</td>
</tr>
<tr>
<td class="break">dono</td>
</tr>
<tr>
<td>d1</td>
<td>d2</td>
<td>d3</td>
</tr>
<tr>
<td>d11</td>
<td>d22</td>
<td>d33</td>
</tr>
</table>

Now I want output like this in a csv file:

mono c1 c2 c3
mono c11 c22 c33
dono d1 d2 d3
dono d11 d22 d33

But I am getting output like this:

mono
c1 c2 c3
c11 c22 c33
dono
d1 d2 d3
d11 d22 d33

Here is my code:

import codecs
from bs4 import BeautifulSoup
with codecs.open('dump.csv', "w", encoding="utf-8") as csvfile:


    f = open("input.html","r")

    soup = BeautifulSoup(f)
    t = soup.findAll('table')
    for table in t:
        rows = table.findAll('tr')
        for tr in rows:
            cols = tr.findAll('td')
            for td in cols:
                csvfile.write(str(td.find(text=True)))
                csvfile.write(",")
            csvfile.write("\n")

Please help me resolve this issue. Thanks.

Edit:

To explain in more detail: I need the text of each section row (mono, dono, etc.) to be added to every row in that section.

The rule is that until I encounter a new “break” class, the text inside the most recent one should be added to every tr below it.

1 Answer

Editorial Team · Answer 1 · 2026-06-16T13:37:47+00:00

Since your new question is effectively an entirely different question from the original, here’s an entirely different answer:

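header = ''  # stays empty until the first "break" row is seen
for table in t:
    rows = table.findAll('tr')
    for row in rows:
        cols = row.findAll('td')
        if 'break' in cols[0].get('class', []):
            # A "break" row: remember its text for the rows that follow.
            header = cols[0].text
        else:
            # A regular row: prefix it with the most recent header.
            print(header, ' '.join(col.text for col in cols))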

I’m assuming that a row will either be exactly 1 “break” column, or 1 or more regular columns. If those assumptions aren’t true, the code can be modified.
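Since the goal was a CSV file, the same idea can be sketched as a complete Python 3 script. This is only one possible arrangement, not the definitive one: the helper names table_rows and rows_to_csv, the embedded SAMPLE_HTML, the choice of the "html.parser" backend, and the decision to skip rows without any td are all assumptions made for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup

# Sample input copied from the question (assumed to be representative).
SAMPLE_HTML = """
<table>
<tr><td class="break">mono</td></tr>
<tr><td>c1</td><td>c2</td><td>c3</td></tr>
<tr><td>c11</td><td>c22</td><td>c33</td></tr>
<tr><td class="break">dono</td></tr>
<tr><td>d1</td><td>d2</td><td>d3</td></tr>
<tr><td>d11</td><td>d22</td><td>d33</td></tr>
</table>
"""


def table_rows(html):
    """Yield [header, cell, cell, ...] lists, one per non-"break" row."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table"):
        header = ""  # used until the first "break" row is seen
        for tr in table.find_all("tr"):
            cols = tr.find_all("td")
            if not cols:
                continue  # assumption: rows without cells are skipped
            if "break" in cols[0].get("class", []):
                header = cols[0].get_text(strip=True)
            else:
                yield [header] + [td.get_text(strip=True) for td in cols]


def rows_to_csv(rows):
    """Return the rows as CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Writes the same file name the question uses.
    with open("dump.csv", "w", encoding="utf-8", newline="") as csvfile:
        csvfile.write(rows_to_csv(table_rows(SAMPLE_HTML)))
```

Using the csv module instead of writing "," by hand avoids the trailing comma on each line and quotes any cell that itself contains a comma.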

Also, if the generator expression in the join function confuses you, the same thing can be rewritten as an explicit loop: print the header; then for each column, print that column; then print a newline.
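For illustration, that explicit loop might look like the sketch below (Python 3). The function name format_row and the one-row sample table are hypothetical; it builds the line as a string rather than printing piece by piece, so the result is easy to check.

```python
from bs4 import BeautifulSoup


def format_row(header, cols):
    """Build 'header col1 col2 ...' one column at a time, without join()."""
    line = header
    for col in cols:
        line += " " + col.text
    return line


# Hypothetical one-row table, parsed with the stdlib "html.parser" backend.
soup = BeautifulSoup(
    "<table><tr><td>c1</td><td>c2</td><td>c3</td></tr></table>",
    "html.parser",
)
cols = soup.find_all("td")
print(format_row("mono", cols))  # mono c1 c2 c3
```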

Since you asked for an explanation of 'break' in cols[0].get('class', []), I’ll break it down.

cols is a list of the BS4 Tag objects for every td node in the current tr node.
cols[0] is the first one.
cols[0].get('class', []) treats the Tag object as a dictionary, as described in the docs, and calls the familiar get(key, defaultvalue) method on it.
- In BS4 (unlike older versions), looking up a multi-valued attribute such as class returns a list. While BS3 would return the string 'foo bar' for <td class='foo bar'>, BS4 returns ['foo', 'bar']. Single-valued attributes such as id still come back as plain strings.
Putting it all together, cols[0].get('class', []) will be ['break'] for the <td class='break'> case, and [] for all of the other cases in your sample input.
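The breakdown above can be checked with a small experiment. This is a sketch assuming Python 3 and the "html.parser" backend; the one-row table is made up to cover the three cases (a "break" cell, a cell with no class, and a cell with two classes).

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table covering the three cases discussed above.
html = (
    "<table><tr>"
    '<td class="break">mono</td>'
    "<td>c1</td>"
    '<td class="foo bar">x</td>'
    "</tr></table>"
)
cols = BeautifulSoup(html, "html.parser").find_all("td")

# get() returns the list of classes, or the default [] when there is no class.
classes = [col.get("class", []) for col in cols]

print("break" in classes[0])  # True
print("break" in classes[1])  # False
print(len(classes[2]))        # 2
```

Because the default is an empty list, the `in` test works uniformly and never has to special-case a missing class attribute.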

As mentioned above, I’m assuming that a row will either be exactly 1 “break” column, or 1 or more regular columns. You can see where I’m making use of those assumptions in the code. But if any of those assumptions are broken, you haven’t told us enough to know what you want to do in those cases.

If you have any rows with no columns, obviously the cols[0] will raise an IndexError. But you have to decide what to do in that case. Should it do nothing? Print just the header? Change to a state where nothing gets printed until we see a header row? Whatever you decide, it should be easy to code.

If you have any rows with a header followed by normal columns, the normal columns will be ignored. If you have any headers that aren’t the first column in a row, they will be treated like normal values. If you have multiple headers in the same row, all but the first will be ignored. And so on. In each case, this may or may not be what you want. But you have to decide what you want before you can write the code.
