So here’s the scenario. I have a large html file that I would like

Question

Asked: June 14, 2026 (2026-06-14T03:09:50+00:00)

So here’s the scenario. I have a large html file that I would like to scrape using JSoup. I am new to this and I have been going through some tutorials and the API references. I have the following block of html.

<p><a name="bob"></a>
<table class='schedules'>
<tr><td  align='center' colspan="5"><b>Bob the Builder</b><br>
<a href="blah blah" class='tiny'>Blah Blah Blah</a></td></tr>
<tr><td class='bk'><a href="random/randomUrl.htm">Blah</a></td><td class='bm'><a href="random/randomUrl.htm">Blah</a></td><td class='nm'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">Blah</a></td></tr>
<tr><td class='bk'><a href="random/randomUrl.htm">Blah</a></td><td class='bk'><a href="random/randomUrl.htm">Blah</a></td><td class='nm'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">Blah</a></td></tr>
<tr><td class='bk'><a href="random/randomUrl.htm">Blah</a></td><td class='bm'><a href="random/randomUrl.htm">Blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">Blah</a></td></tr>
<tr><td class='bm'><a href="random/randomUrl.htm">Blah</a></td><!--<td class='whoohaa'><a href="random/randomUrl.htm">Blah</a></td>--><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='cc'><a href="random/randomUrl.htm">blah</a></td><td class='cc'><a href="random/randomUrl.htm">Blah</a></td><td class='sk'><a href="random/randomUrl.htm">Blah</a></td></tr>
<tr></td><td class='sk'><a href="random/randomUrl.htm">Blah</a></td><td class='nm'><a href="random/randomUrl.htm">Blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">Blah</a></td></tr>
<tr><td class='sk'><a href="random/randomUrl.htm">Blah</a></td><td class='nm'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td><td class='sk'><a href="random/randomUrl.htm">blah</a></td></tr>
</table>
</p>

Now there are many more of these blocks following a similar pattern whereby (in the first line) the name attribute changes (from “bob” to something else). What I would like to do is firstly be able to select the “bob” p block and then retrieve all the html until the terminating p block in the last line.

I have attempted the following:

Elements innerStuff = doc.select("a:contains(bob) ~ *");

But it only gives me links with href attributes, which I guess is to be expected. I’m struggling to see how else I can solve this problem.

Your help in this regard is highly appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T03:09:51+00:00

A more straightforward way to select the a tag based on its name attribute is:

doc.select("a[name=bob]")
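
As a minimal sketch of why the attribute selector fits better here: `:contains(...)` matches on an element's text, while `[name=bob]` matches on the attribute value. The HTML string and the names `SelectorDemo` and `countMatches` below are made up for illustration and are not part of the original file.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical, trimmed-down sample; not the asker's real file.
    static final String HTML =
        "<p><a name=\"bob\"></a><a href=\"random/randomUrl.htm\">Blah</a></p>";

    // Runs a CSS query against the sample and reports how many elements matched.
    static int countMatches(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        Elements found = doc.select(cssQuery);
        return found.size();
    }

    public static void main(String[] args) {
        // The named anchor has no text, so only the attribute selector can find it.
        System.out.println("a[name=bob]     -> " + countMatches("a[name=bob]"));
        System.out.println("a:contains(bob) -> " + countMatches("a:contains(bob)"));
    }
}
```

In a real file, `a:contains(bob)` would only pick up links whose visible text happens to include "bob", which is probably not what you want when the marker is an empty named anchor.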

From there, you can navigate to the elements you want. For example, parent() returns the p tag containing the link. Call first() beforehand to get the first (and only) element matching the selector:

doc.select("a[name=bob]").first().parent()

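One problem, though: the parsed HTML document differs from the original HTML. In HTML, a table start tag normally closes an open p, so the parser moves the table out of the paragraph, and the stray closing p tag becomes an empty p.
Here’s the original HTML structure: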

p
    a[name=bob]
    table
        ...

Here’s what the parsed HTML looks like:

p
    a[name=bob]
table
    ...
p

So, to get from the link tag to the table element, you need to go up one level (to the paragraph) and grab its next sibling element:

doc.select("a[name=bob]").first().parent().nextElementSibling()
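
Putting the pieces together, here is a hedged sketch that looks the table up defensively. It assumes, and I have not verified this across jsoup versions, that the table may end up either inside the paragraph or right after it depending on how the document is parsed (for example, with or without a doctype), so it checks both places. The names `AnchorBlockDemo`, `tableFor`, and `SAMPLE` are hypothetical, and the sample is a shortened version of the block in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AnchorBlockDemo {
    // Shortened, hypothetical version of one block from the question.
    static final String SAMPLE =
        "<p><a name=\"bob\"></a>\n"
        + "<table class='schedules'>\n"
        + "<tr><td><b>Bob the Builder</b></td></tr>\n"
        + "</table>\n"
        + "</p>";

    // Returns the schedules table belonging to the named anchor, or null if none is found.
    static Element tableFor(Document doc, String anchorName) {
        Element anchor = doc.select("a[name=" + anchorName + "]").first();
        if (anchor == null) return null;
        Element paragraph = anchor.parent();
        if (paragraph == null) return null;

        // Case 1: the parser kept the table inside the paragraph.
        Element table = paragraph.select("table.schedules").first();
        if (table == null) {
            // Case 2: the parser closed the paragraph early, so the table follows it.
            Element next = paragraph.nextElementSibling();
            if (next != null && next.tagName().equals("table")) {
                table = next;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        Element table = tableFor(doc, "bob");
        // outerHtml() gives the markup of the table itself, including its rows.
        System.out.println(table == null ? "no table found" : table.outerHtml());
    }
}
```

Calling `tableFor(doc, "someOtherName")` would handle the other blocks in the file the same way, and the null checks avoid a NullPointerException when a name is missing.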
