I am using BeautifulSoup to extract td (table data) tags from a table. The td’s have a class of either ‘a’, ‘u’, ‘e’, ‘available-unavailable’ or ‘unavailable-available’. (Yes, I know, quirky class names, but hey…)
Here’s an example:
<tr>
<td class="u">4</td>
<td class="unavailable-available">5</td>
<td class="a">6</td>
<td class="available-unavailable">7</td>
<td class="u">8</td>
...
I’ve been working with a line which incorporates an re.compile():
tab = [int(tag.string) for tag in soup.find('table', {'summary': tableSummary}).findAll("td", attrs={"class": re.compile(r'\Aa')})]
I need to extract all the td’s with a class name of either ‘a’ or ‘unavailable-available’. I have been trying some negative-lookahead assertions, but without much luck. I would be grateful to any regex legend who can produce the correct regex…
This matches the start of the string or whitespace, followed by “a” or “unavailable-available”, followed by whitespace or the end of the string. So it’ll match all these sorts of things
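The pattern itself is not shown above, so here is a minimal sketch of one regex that fits that description. The exact pattern, the name CLASS_PATTERN, the helper wanted() and the sample class strings are all assumptions for illustration, not the original answer's code. It uses only the re module and tests the pattern against plain class strings.

```python
import re

# Assumed pattern, reconstructed from the description: start of string or
# whitespace, then "a" or "unavailable-available", then whitespace or end.
CLASS_PATTERN = re.compile(r'(?:^|\s)(?:a|unavailable-available)(?:\s|$)')


def wanted(class_value):
    """Return True if the class string contains one of the two wanted names as a whole token."""
    return CLASS_PATTERN.search(class_value) is not None


# Hypothetical class attribute values, including one with several classes.
samples = ["a", "unavailable-available", "u", "e", "available-unavailable", "foo a bar"]
matches = [s for s in samples if wanted(s)]
print(matches)  # ['a', 'unavailable-available', 'foo a bar']
```

If this behaves as intended, the compiled pattern could be passed in place of re.compile(r'\Aa') in the findAll call from the question. Unlike the \Aa version, it should not pick up ‘available-unavailable’, because the “a” has to be followed by whitespace or the end of the string.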