I have a serious problem.
I would like to extract the content from tags such as:
<div class="main-content">
<div class="sub-content">Sub content here</div>
Main content here </div>
The output I would expect is:
Sub content here
Main content here
I’ve tried using regex, but the results aren’t great.
Using:
Pattern.compile("<div>(\\S+)</div>");
returns all the strings before the first </div> tag.
So, could anyone help me, please?
I’d recommend avoiding regex for parsing HTML. You can easily do what you ask by using Jsoup:
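As a minimal sketch of that suggestion (assuming Jsoup is on the classpath; the class name `DivTextExample` and the helper `mainContentText` are made up for illustration), you could parse the markup from the question and read the combined text of the outer div:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not part of Jsoup.
class DivTextExample {
    // The markup from the question.
    static final String HTML =
        "<div class=\"main-content\">\n"
      + "<div class=\"sub-content\">Sub content here</div>\n"
      + "Main content here </div>";

    // Returns the whitespace-normalised text of the first div.main-content,
    // including the text of its nested elements, or "" if there is no such div.
    static String mainContentText(String html) {
        Document doc = Jsoup.parse(html);
        Element main = doc.select("div.main-content").first();
        return main == null ? "" : main.text();
    }

    public static void main(String[] args) {
        // Prints: Sub content here Main content here
        System.out.println(mainContentText(HTML));
    }
}
```

Note that `text()` joins the nested text into a single string; if you need one string per element, `ownText()` on each selected element is the usual route.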
In response to comment: if you want to put the content of the div elements into an array of Strings you can simply do:
In response to comment: if you have nested elements and you want to get the own text of each element, then you can use jQuery's multiple selector syntax. Here’s an example:
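Both suggestions can be sketched together roughly like this. It is only an illustration under a few assumptions: Jsoup is on the classpath, the class name `OwnTextExample` and the helper `ownTexts` are hypothetical, and the input is the markup from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of Jsoup.
class OwnTextExample {
    // The markup from the question.
    static final String HTML =
        "<div class=\"main-content\">\n"
      + "<div class=\"sub-content\">Sub content here</div>\n"
      + "Main content here </div>";

    // Collects the own text (excluding nested elements) of every div
    // matched by the comma-separated selector into a String array.
    static String[] ownTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements divs = doc.select("div.main-content, div.sub-content");
        List<String> texts = new ArrayList<>();
        for (Element div : divs) {
            texts.add(div.ownText());
        }
        return texts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String text : ownTexts(HTML)) {
            System.out.println(text);
        }
    }
}
```

Matched elements come back in document order, so the outer div's own text ("Main content here") is listed before the nested div's ("Sub content here"). Reverse or reorder the array if you need the order shown in the question.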
The code above will parse the following HTML:
and print the following output: