I am parsing an HTML page using XmlSlurper and HtmlCleaner, and I have the GPathResult

Asked: May 19, 2026

I am parsing an HTML page using XmlSlurper and HtmlCleaner, and I have the GPathResult from:

def page = new XmlSlurper(false,false).parseText(xml)

Now I can use GPath to access the various nodes.
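For context, here is a minimal sketch of that setup. It assumes Groovy 3 or later, where XmlSlurper lives in `groovy.xml`. On Groovy 2.x, drop the import, because `groovy.util.XmlSlurper` is imported by default. The `xml` string is a hypothetical stand-in for HtmlCleaner's output. The two booleans are the `validating` and `namespaceAware` flags. The returned GPathResult is the root `<html>` element itself, which is why paths start at `body` rather than `html`.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical stand-in for the well-formed XHTML produced by HtmlCleaner
def xml = '<html><head><title>demo</title></head><body><div><p>hello</p></div></body></html>'

// XmlSlurper(validating, namespaceAware): no DTD validation, namespaces ignored
def page = new XmlSlurper(false, false).parseText(xml)

// The parse result is the root element, so GPath starts below <html>
assert page.name() == 'html'

// Each step selects child elements by name; [0] picks the first match
def p = page.body.div.p[0]
println p.text()   // hello
```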

In the HTML I have a paragraph like this one:

<p>
 some_text1
 <br />
 some_text2
 <br />
 some_text3
 <br />
 ....
 some_textN
 <br />
</p>

The problem is that I don’t know how to parse the text in the paragraph. I need to split the text inside the paragraph, using the <br /> tag as the separator, and get a list like

[some_text1, some_text2, .... , some_textN]

Given a node like

def node = page.body.some_path.p[0]

If I use text(), I get all the text in the paragraph but without the <br /> tags, so I cannot use the split method, and I can’t find a way to get the raw HTML inside the paragraph from the node.
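The behaviour described above can be reproduced with a small sketch. It assumes Groovy 3 or later (on Groovy 2.x, drop the import) and a hypothetical sample paragraph. The sketch also shows one possible workaround, which is a different technique from the string-replacement approach. After parsing, the tree still stores each run of text between `<br />` elements as a separate string, and `NodeChild.localText()` returns exactly those direct text children of the node.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical sample mirroring the paragraph in the question
def xml = '''<html><body><p>
 some_text1
 <br />
 some_text2
 <br />
 some_text3
 <br />
</p></body></html>'''

def node = new XmlSlurper(false, false).parseText(xml).body.p[0]

// text() concatenates every text segment; the <br /> elements contribute nothing
def flat = node.text()

// localText() returns only the direct text children of <p>, one String per
// run of text between child elements, so each <br /> acts as a boundary
def parts = node.localText()*.trim().findAll { it }

println parts   // [some_text1, some_text2, some_text3]
```

Note that `localText()` only returns text sitting directly inside the `<p>`. Text wrapped in inline elements such as `<b>` or `<a>` is skipped, so richer markup would need a walk over the node's mixed children instead.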

Is there some way to parse this text?

Thanks for the help.

1 Answer

Editorial Team · Answer 1 · 2026-05-19T00:52:16+00:00

I’ve had this problem in the past with GPath and couldn’t really find a good way to go about it either.

What I ended up doing was a search/replace for <br /> in the raw markup, before parsing it, replacing it with a plain-text token that isn’t an XML element. Call it REPLACEMENT_SEPARATOR.

That way, you could call node.text().split(REPLACEMENT_SEPARATOR) and get your array.
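Here is a minimal sketch of this approach. It assumes Groovy 3 or later (on Groovy 2.x, drop the import). The sample markup and the `REPLACEMENT_SEPARATOR` value are hypothetical. The regex is written to catch `<br>`, `<br/>` and `<br />` in case the markup uses a different spelling.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical token; any string that cannot occur in the page text will do
final String SEPARATOR = 'REPLACEMENT_SEPARATOR'

// Hypothetical markup standing in for HtmlCleaner's output
def html = '<html><body><p>some_text1<br />some_text2<br/>some_text3<br /></p></body></html>'

// Replace every <br>, <br/> or <br /> in the raw markup *before* parsing,
// so the line boundaries survive as plain text inside the paragraph
def marked = html.replaceAll(/(?i)<br\s*\/?>/, SEPARATOR)

def page = new XmlSlurper(false, false).parseText(marked)
def node = page.body.p[0]

// text() now contains the token wherever a <br /> used to be
def pieces = node.text().split(SEPARATOR)    // String.split takes a regex; this token has no metacharacters
def parts = pieces*.trim().findAll { it }    // trim whitespace and drop empty pieces

println parts   // [some_text1, some_text2, some_text3]
```

Because the replacement runs over the whole page, every `<br />` in the document becomes the token. This only matters if you also read text from other nodes. Since `text()` is used, any text nested in inline elements such as `<b>` inside the paragraph is kept.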
