I currently have the following section of HTML code from a web page:
<td class="movieclass">
<b>Cinema 1</b>
10.30 AM. + 12.45 + 3.00 + 5.15 + 7.30 + 9.45 + 12.00 MN.
<br />
<b>Cinema 2</b>
3.00 + 5.15 + 7.30 + 9.45 + (12.00 MN. THRS./FRI.)
<br />
<b>Cinema 3</b>
2.30 + 4.45 + 7.00 + 9.15 + (12.15 PM. + 11.30 PM. THRS./FRI.)
<br />
<b>Cinema 4</b>
11.30 AM. + 2.00 + 4.30 + 7.00 + 9.30 + 12.00 MN.
<br />
<b>Cinema 5</b>
10.30 AM. + 1.00 + 3.30 + 6.00 + 8.30 + 11.00 PM.
<br />
</td>
I’m trying to use jsoup to extract the showtimes for one specific cinema. If the time lines were wrapped in paragraph (p) tags, I assume I could extract them with:
Elements movieTime = doc.select("b:contains(Cinema 3) + p");
However, in the HTML above the time lines are bare text with no tags around them. Is there a way to extract the line of times for a given cinema?
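One way to avoid adding tags at all (a sketch, not the method used later in this post) is to start from the `<b>` label and walk its raw sibling nodes. The times are `TextNode`s, which CSS sibling selectors such as `+ p` never match, but `Node.nextSibling()` does return them. This assumes jsoup is on the classpath and that each label is a direct child of `td.movieclass`; the class `CinemaTimes` and method `timesFor` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class CinemaTimes {

    /**
     * Returns the bare text that follows <b>cinemaName</b> up to the next element
     * (normally the <br>), or null if no label with that exact text exists.
     */
    public static String timesFor(Document doc, String cinemaName) {
        for (Element label : doc.select("td.movieclass > b")) {
            // Exact comparison, so "Cinema 1" cannot also match a label like "Cinema 10".
            if (!label.text().equals(cinemaName)) {
                continue;
            }
            StringBuilder times = new StringBuilder();
            // Walk raw sibling nodes: the times are TextNodes, which selectors skip.
            for (Node n = label.nextSibling(); n != null; n = n.nextSibling()) {
                if (n instanceof TextNode) {
                    times.append(((TextNode) n).text()); // whitespace-normalised text
                } else {
                    break; // stop at the next element (<br> or the next <b>)
                }
            }
            return times.toString().trim();
        }
        return null;
    }

    public static void main(String[] args) {
        // The td is wrapped in a table because the HTML parser drops a stray <td>.
        String html = "<table><tr><td class=\"movieclass\">"
                + "<b>Cinema 2</b> 3.00 + 5.15 + 7.30 + 9.45 + (12.00 MN. THRS./FRI.) <br />"
                + "<b>Cinema 3</b> 2.30 + 4.45 + 7.00 + 9.15 + (12.15 PM. + 11.30 PM. THRS./FRI.) <br />"
                + "</td></tr></table>";
        Document doc = Jsoup.parse(html);
        System.out.println(timesFor(doc, "Cinema 3"));
    }
}
```

Comparing `label.text()` exactly, rather than using `:contains`, is a deliberate choice: jsoup's `:contains` is a case-insensitive substring match, so it is looser than needed when the label text is known in full.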
I decided to create a jsoup document with the HTML of the page:
Then I did some string replacement on the document:
The replacement feels a bit extreme, since it runs over the whole document when I only need tags around the one section I want to extract. Finally, the extraction:
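If the `<p>` wrapping is kept, the replacement can be confined to the single cell instead of the whole document: read that element's inner HTML, wrap each line, and set it back with `Element.html(String)`, which re-parses only that element's children. This is a minimal sketch under the same assumptions (jsoup on the classpath; `ScopedWrap`, `wrapTimeLines` and `timesFor` are hypothetical names). With the default HTML output syntax, jsoup writes `<br />` back out as `<br>`, so the replacement looks for that form.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ScopedWrap {

    /** Wraps each bare time line inside the cell in a <p>; call once per cell. */
    public static void wrapTimeLines(Element cell) {
        // Only this cell's serialized HTML is rewritten, not the whole document.
        String wrapped = cell.html()
                .replace("</b>", "</b><p>")   // open a <p> right after each label
                .replace("<br>", "</p><br>"); // close it before each line break
        cell.html(wrapped); // re-parses only this cell's children
    }

    /** After wrapping, the adjacent-sibling selector from the question matches. */
    public static String timesFor(Element cell, String cinemaName) {
        Elements lines = cell.select("b:contains(" + cinemaName + ") + p");
        return lines.isEmpty() ? null : lines.first().text();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td class=\"movieclass\">"
                + "<b>Cinema 2</b> 3.00 + 5.15 + 7.30 + 9.45 + (12.00 MN. THRS./FRI.) <br />"
                + "<b>Cinema 3</b> 2.30 + 4.45 + 7.00 + 9.15 + (12.15 PM. + 11.30 PM. THRS./FRI.) <br />"
                + "</td></tr></table>";
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // keep html() output predictable
        Element cell = doc.select("td.movieclass").first();
        wrapTimeLines(cell);
        System.out.println(timesFor(cell, "Cinema 3"));
    }
}
```

The plain string replacement is still fragile (a `<br>` with attributes, for example, would not match), so walking the text nodes directly is usually the sturdier option.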