I use Jsoup to fetch a website. The website has multiple divs with the same class, such as:
<div class="itemcategories">
Category: <a id="cat_result_7_newamerican" class="category" rel="newamerican" href="/search?cflt=newamerican&find_loc=willowbrook%2C+IL">American (New)</a>
</div>
<div class="itemcategories">
Categories:
<a id="cat_result_6_breakfast_brunch" class="category" rel="breakfast_brunch" href="/search?cflt=breakfast_brunch&find_loc=willowbrook%2C+IL">Breakfast & Brunch</a>,
<a id="cat_result_6_tradamerican" class="category" rel="tradamerican" href="/search?cflt=tradamerican&find_loc=willowbrook%2C+IL">American (Traditional)</a>
</div>
and so on.
If I use the following query selector:
categories = doc.select("div[class=itemcategories] > a[class=category]");
each child element that descends directly from a div class="itemcategories" parent is stored at the next index of the categories Elements object, so I have no way to determine which children belong to which parent. Is there a way to 'concatenate' all the children of each div and save each group at a separate index of the Elements object?
How about doing it in two steps?
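Here is a minimal sketch of that two-step idea, not necessarily the code originally posted: first select each div.itemcategories, then run a second select on each div for its a.category links, so every group stays tied to its parent. The class name CategoryGrouper, the method groupCategories, and the shortened sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CategoryGrouper {

    // Returns one inner list per div.itemcategories, holding the text of its category links.
    // (Hypothetical helper, named here only for illustration.)
    static List<List<String>> groupCategories(Document doc) {
        List<List<String>> grouped = new ArrayList<>();
        // Step 1: select the parent divs.
        for (Element div : doc.select("div.itemcategories")) {
            // Step 2: select the category links inside this div only.
            // Note: this matches any descendant link, not just direct children.
            Elements links = div.select("a.category");
            List<String> names = new ArrayList<>();
            for (Element link : links) {
                names.add(link.text());
            }
            grouped.add(names);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the page in the question (hrefs simplified).
        String html =
              "<div class=\"itemcategories\">Category: "
            + "<a class=\"category\" href=\"/search?cflt=newamerican\">American (New)</a></div>"
            + "<div class=\"itemcategories\">Categories: "
            + "<a class=\"category\" href=\"/search?cflt=breakfast_brunch\">Breakfast &amp; Brunch</a>, "
            + "<a class=\"category\" href=\"/search?cflt=tradamerican\">American (Traditional)</a></div>";
        Document doc = Jsoup.parse(html);
        // One inner list per div, in document order.
        System.out.println(groupCategories(doc));
    }
}
```

If you need the Element objects rather than their text, you could collect each per-div Elements result into a List<Elements> instead.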
Note the use of the .foo selector syntax instead of [class=foo].
N.B. I'm not terribly familiar with jsoup's API, so this code may not be exactly correct.