I'm trying to extract some fields from the output at the end of this

Asked: June 10, 2026

I'm trying to extract some fields from the output at the end of this question with the following code:

import lxml.html as LH

doc = LH.fromstring(html2)
tds = (td.text_content() for td in doc.xpath("//td[not(*)]"))

for a,b,c in zip(*[tds]*3):
    print (a,b,c)

What I expect is to extract only the fields notificationNodeName, packageName, and notificationEnabled.

This matters because I want to put the result into a database, so I need the three values grouped per record instead of what I am receiving now.

Actual code returns:

('JDBCAdapter', 'JDBCAdapter', 'Package:Notif')
('Package', 'yes', 'Package_2:Notif')
('Package_2', 'yes')

What i need:

('Package:Notif','Package', 'yes')
('Package_2:Notif','Package_2', 'yes')

The only solution I found was:

doc = LH.fromstring(html2)
tds = (td.text_content() for td in doc.xpath("//td"))

for td, val in zip(*[tds]*2):
    if td == 'notificationNodeName':
        notificationNodeName = val
    elif td == 'packageName':
        packageName = val
    elif td == 'notificationEnabled':
        notificationEnabled = val
        print (notificationNodeName,packageName,notificationEnabled)

It works, but it doesn't seem right to me. I'm sure there must be a better way to do it.

Original HTML Output:

<tbody><tr>
<td valign="top"><b>adapterTypeName</b></td>
<td>JDBCAdapter</td>
</tr>
<tr>
<td valign="top"><b>adapterTypeNameList</b></td>
<td>
<table>
<tbody><tr>
<td>JDBCAdapter</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td valign="top"><b>notificationDataList</b></td>
<td>
<table>
<tbody><tr>
<td><table bgcolor="#dddddd" border="1">
<tbody><tr>
<td valign="top"><b>notificationNodeName</b></td>
<td>package:Notif</td>
</tr>
<tr>
<td valign="top"><b>packageName</b></td>
<td>Package</td>
</tr>
<tr>
<td valign="top"><b>notificationEnabled</b></td>
<td>unsched</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td><table bgcolor="#dddddd" border="1">
<tbody><tr>
<td valign="top"><b>notificationNodeName</b></td>
<td>Package_2:notif</td>
</tr>
<tr>
<td valign="top"><b>packageName</b></td>
<td>package_2</td>
</tr>
<tr>
<td valign="top"><b>notificationEnabled</b></td>
<td>yes</td>
</tr>

... and it continues with more repetitive data that is not relevant.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T12:04:03+00:00

I would recommend using the excellent lxml and its cssselect functionality for most HTML parsing.

You can then select each field you are interested in thusly:

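from lxml import html

# Parse the saved page. Note that cssselect() needs the separate "cssselect" package installed.
root = html.parse('your/file.html').getroot()

def sibling_content(label):
    # For each <b> inside a <td> whose text contains the label,
    # return the text of the <td> that follows the label cell.
    return [b.getparent().getnext().text_content()
            for b in root.cssselect("td b:contains('{0}')".format(label))]

fields = ['notificationNodeName', 'packageName', 'notificationEnabled']

# zip(*...) regroups the per-field lists into one tuple per notification.
for item in zip(*[sibling_content(field) for field in fields]):
    print(item)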
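
One caveat with zipping separate per-field lists: if any record lacks one of the fields, the rows shift out of alignment without an error. A possible variation, sketched below under some assumptions, groups the label/value pairs per inner table instead and then loads the tuples into a database, since that was the stated goal. The names `extract_records`, `text_of`, `SAMPLE`, and the `notifications` table are hypothetical, SQLite stands in for whatever database is actually used, and the sample fragment is parsed with the standard library's ElementTree only so the snippet runs without extra packages. For the real page you would pass the element from `LH.fromstring(html2)` instead; lxml elements offer the same `iter`, `findall`, and `itertext` calls.

```python
import sqlite3
import xml.etree.ElementTree as ET

FIELDS = ("notificationNodeName", "packageName", "notificationEnabled")

# Trimmed, well-formed copy of the two records shown in the question.
SAMPLE = """
<table><tbody>
<tr><td><table bgcolor="#dddddd" border="1"><tbody>
<tr><td valign="top"><b>notificationNodeName</b></td><td>package:Notif</td></tr>
<tr><td valign="top"><b>packageName</b></td><td>Package</td></tr>
<tr><td valign="top"><b>notificationEnabled</b></td><td>unsched</td></tr>
</tbody></table></td></tr>
<tr><td><table bgcolor="#dddddd" border="1"><tbody>
<tr><td valign="top"><b>notificationNodeName</b></td><td>Package_2:notif</td></tr>
<tr><td valign="top"><b>packageName</b></td><td>package_2</td></tr>
<tr><td valign="top"><b>notificationEnabled</b></td><td>yes</td></tr>
</tbody></table></td></tr>
</tbody></table>
"""


def text_of(element):
    """Return all text inside an element, with surrounding whitespace removed."""
    return "".join(element.itertext()).strip()


def extract_records(root, fields=FIELDS):
    """Build one tuple per table that contains every requested field."""
    records = []
    for table in root.iter("table"):
        # Rows may sit directly under <table> or inside a <tbody>.
        rows = table.findall("tr") + table.findall("tbody/tr")
        pairs = {}
        for tr in rows:
            cells = tr.findall("td")
            if len(cells) == 2:  # a label cell followed by a value cell
                pairs[text_of(cells[0])] = text_of(cells[1])
        # Tables missing any requested field are skipped, not misaligned.
        if all(name in pairs for name in fields):
            records.append(tuple(pairs[name] for name in fields))
    return records


records = extract_records(ET.fromstring(SAMPLE))

# Load the tuples into an in-memory SQLite table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notifications (node_name TEXT, package_name TEXT, enabled TEXT)"
)
conn.executemany("INSERT INTO notifications VALUES (?, ?, ?)", records)
stored = conn.execute(
    "SELECT node_name, package_name, enabled FROM notifications ORDER BY rowid"
).fetchall()

for row in stored:
    print(row)
```

Because each tuple is built from a single table, a record with a missing field is dropped rather than borrowing a value from its neighbour; whether dropping or flagging such records is preferable depends on the data.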
