I am trying to extract the first and third columns of this data table

Asked: June 13, 2026

I am trying to extract the first and third columns of this data table using BeautifulSoup. From looking at the HTML, the first column has a <th> tag and the other column of interest has a <td> tag. So far, all I’ve been able to get out is a list of the column’s cells with their tags, but I just want the text.

table is already a list so I can’t use findAll(text=True). I’m not sure how to get the listing of the first column in another form.

from BeautifulSoup import BeautifulSoup
from sys import argv
import re

filename = argv[1] #get HTML file as a string
html_doc = ''.join(open(filename,'r').readlines())
soup = BeautifulSoup(html_doc)
table = soup.findAll('table')[0].tbody.th.findAll('th') #The relevant table is the first one

print table

1 Answer

Editorial Team · Answer 1 · 2026-06-13T23:53:20+00:00

You can try this code:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())

# Walk every row of the first table on the page
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents  # row header cell
    third_column = row.findAll('td')[2].contents  # third data cell
    print first_column, third_column

The code fetches the HTML from the URL, then BeautifulSoup finds the first table and every ‘tr’ inside it. For each row it selects the first column, which is the ‘th’, and the third column, which is the third ‘td’. Because .contents returns a list of a tag’s children, each value prints as a list rather than as a bare string.
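If you only want the text, here is a minimal sketch of the same row-by-row approach. It assumes the newer bs4 package and Python 3 rather than the BeautifulSoup 3 import used above, and it runs on a small made-up table (SAMPLE_HTML) instead of the real page. The helper name extract_columns is hypothetical. It follows the indexing above, taking the third ‘td’ in each row, and skips rows that lack a ‘th’ or have fewer than three ‘td’ cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real page (an assumption, not the actual data).
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>State</th><td>A</td><td>B</td><td>C</td></tr>
    <tr><th>Alabama</th><td>1.0</td><td>2.0</td><td>3.0</td></tr>
    <tr><th>Alaska</th><td>4.0</td><td>5.0</td></tr>
  </tbody>
</table>
"""


def extract_columns(html):
    """Return (row header text, third <td> text) for each complete row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # the relevant table is assumed to be the first one
    rows = []
    for tr in table.find_all("tr"):
        header = tr.find("th")
        cells = tr.find_all("td")
        # Skip rows without a header cell or with fewer than three data cells.
        if header is None or len(cells) < 3:
            continue
        # get_text() drops the tags and returns only the text.
        rows.append((header.get_text(strip=True), cells[2].get_text(strip=True)))
    return rows


pairs = extract_columns(SAMPLE_HTML)
for first, third in pairs:
    print(first, third)
```

Looping over the rows avoids the problem in the question: findAll returns a list-like result, so the text has to be pulled from each tag in it rather than from the list as a whole.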
