I’m trying to scrape a table from an AJAX page with Beautiful Soup and print it as a text table with the texttable library.
import BeautifulSoup
import urllib
import urllib2
import getpass
import cookielib
import texttable
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
...
def show_queue():
    url = 'https://www.animenfo.com/radio/nowplaying.php'
    values = {'ajax' : 'true', 'mod' : 'queue'}
    data = urllib.urlencode(values)
    f = opener.open(url, data)
    soup = BeautifulSoup.BeautifulSoup(f)
    stable = soup.find('table')
    table = texttable.Texttable()
    header = stable.findAll('th')
    header_text = []
    for th in header:
        header_append = th.find(text=True)
        header.append(header_append)
    table.header(header_text)
    rows = stable.find('tr')
    for tr in rows:
        cells = []
        cols = tr.find('td')
        for td in cols:
            cells_append = td.find(text=True)
            cells.append(cells_append)
        table.add_row(cells)
    s = table.draw
    print s
...
The URL of the page I’m scraping is in the code above; here is an example of the HTML it returns:
<table cellspacing="0" cellpadding="0">
<tbody>
<tr>
<th>Artist - Title</th>
<th>Album</th>
<th>Album Type</th>
<th>Series</th>
<th>Duration</th>
<th>Type of Play</th>
<th>
<span title="...">Time to play</span>
</th>
</tr>
<tr>
<td class="row1">
<a href="..." class="songinfo">Song 1</a>
</td>
<td class="row1">
<a href="..." class="album_link">Album 1</a>
</td>
<td class="row1">...</td>
<td class="row1">
</td>
<td class="row1" style="text-align: center">
5:43
</td>
<td class="row1" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row1" style="text-align: center">
~0:00:00
</td>
</tr>
<tr>
<td class="row2">
<a href="..." class="songinfo">Song2</a>
</td>
<td class="row2">
<a href="..." class="album_link">Album 2</a>
</td>
<td class="row2">...</td>
<td class="row2">
</td>
<td class="row2" style="text-align: center">
6:16
</td>
<td class="row2" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row2" style="text-align: center">
~0:05:43
</td>
</tr>
<tr>
<td class="row1">
<a href="..." class="songinfo">Song 3</a>
</td>
<td class="row1">
<a href="..." class="album_link">Album 3</a>
</td>
<td class="row1">...</td>
<td class="row1">
</td>
<td class="row1" style="text-align: center">
4:13
</td>
<td class="row1" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row1" style="text-align: center">
~0:11:59
</td>
</tr>
<tr>
<td class="row2">
<a href="..." class="songinfo">Song 4</a>
</td>
<td class="row2">
<a href="..." class="album_link">Album 4</a>
</td>
<td class="row2">...</td>
<td class="row2">
</td>
<td class="row2" style="text-align: center">
5:34
</td>
<td class="row2" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row2" style="text-align: center">
~0:16:12
</td>
</tr>
<tr>
<td class="row1"><a href="..." class="songinfo">Song 5</a>
</td>
<td class="row1">
<a href="..." class="album_link">Album 5</a>
</td>
<td class="row1">...</td>
<td class="row1"></td>
<td class="row1" style="text-align: center">
4:23
</td>
<td class="row1" style="padding-left: 5px; text-align: center">
S.A.M.
</td>
<td class="row1" style="text-align: center">
~0:21:46
</td>
</tr>
<tr>
<td style="height: 5px;">
</td></tr>
<tr>
<td class="row2" style="font-style: italic; text-align: center;" colspan="5">There are x songs in the queue with a total length of x:y:z.</td>
</tr>
</tbody>
</table>
Whenever I run this function, it aborts with "TypeError: find() takes no keyword arguments" on the line header_append = th.find(text=True). I’m stumped: as far as I can tell, I’m doing what the code examples show, so it should work, yet it doesn’t.
In short: what am I doing wrong, and how do I fix the code so the TypeError goes away?
Edit:
Articles and documentation that I referred to when writing the script:
The Basic Issue
The parser is behaving correctly. You are just using the same expressions to parse different types of elements.
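One likely reading of the traceback, sketched in plain Python 3 without Beautiful Soup: the loop appends each extracted string to header (the list being iterated) instead of header_text, so the loop eventually reaches those strings and calls the string method find(), which accepts no keyword arguments. FakeTag below is a hypothetical stand-in for a parsed th element, not part of any library.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed <th> element."""

    def __init__(self, text):
        self.text = text

    def find(self, text=False):
        # A tag's find() accepts keyword arguments; str.find() does not.
        return self.text


header = [FakeTag("Album"), FakeTag("Series")]
header_text = []
error = None

try:
    for th in header:
        found = th.find(text=True)
        # Bug from the question: this grows the list being iterated,
        # so the loop later visits plain strings instead of tags.
        header.append(found)
except TypeError as exc:
    error = exc

print(type(error).__name__)  # the loop ended in a TypeError
print(header_text)           # nothing was ever collected here
```

The row loop in the question looks like it has a related problem: find('tr') returns a single element rather than a list, and iterating over an element yields its children, which include whitespace strings. findAll is the call that returns every match.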
Revised code
Here is a snippet, focusing only on returning scraped lists. Once you have the lists, you can format the text table easily:
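A minimal sketch of what such a snippet could look like. It is not the original answer's code: it assumes Python 3 and the bs4 package (the successor to the BeautifulSoup module imported in the question), and scrape_queue and SAMPLE are hypothetical names. SAMPLE is a shortened version of the HTML from the question.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample HTML from the question.
SAMPLE = """
<table cellspacing="0" cellpadding="0">
<tbody>
<tr>
<th>Artist - Title</th>
<th>Album</th>
<th><span title="...">Time to play</span></th>
</tr>
<tr>
<td class="row1"><a href="..." class="songinfo">Song 1</a></td>
<td class="row1"><a href="..." class="album_link">Album 1</a></td>
<td class="row1" style="text-align: center">
~0:00:00
</td>
</tr>
<tr>
<td style="height: 5px;">
</td></tr>
<tr>
<td class="row2" colspan="5">There are x songs in the queue with a total length of x:y:z.</td>
</tr>
</tbody>
</table>
"""


def scrape_queue(html):
    """Return (headers, cells, footer) as plain lists of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")

    # find_all returns every match; find returns only the first one.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    cells, footer = [], []
    for tr in table.find_all("tr"):
        texts = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(texts) == len(headers):
            cells.append(texts)      # a regular data row
        elif any(texts):
            footer.append(texts)     # e.g. the queue summary row
        # header rows (no td) and empty spacer rows fall through

    return headers, cells, footer


headers, cells, footer = scrape_queue(SAMPLE)
print(headers)
print(cells)
print(footer)
```

Separating data rows from the summary row by column count is a design choice for this sample; a real page may need a different test, such as checking the colspan attribute.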
Output
headers, cells, and footer cells can now be fed into a texttable formatting function: