I want to be able to grab content from web pages, especially the tags

Question


Asked: May 11, 2026 (2026-05-11T13:29:47+00:00)


I want to be able to grab content from web pages, especially the tags and the content within them. I have tried XQuery and XPath, but they don't seem to work on malformed XHTML, and regex is just a pain.

Is there a better solution? Ideally, I would like to be able to ask for all the links and get back an array of URLs, or ask for the text of the links and get back an array of strings with the text of the links, or ask for all the bold text, etc.


1 Answer

score 0 · Answer 1 · 2026-05-11T13:29:47+00:00

Added an answer on May 11, 2026 at 1:29 pm

Run the XHTML through something like JTidy, which should give you back valid XML. XPath or XQuery should then work on the cleaned-up output.
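To illustrate what becomes possible once the markup is well formed, here is a minimal sketch of the extraction step only. It is written in Python with the standard-library `xml.etree.ElementTree` rather than with JTidy itself, and it assumes the tidy step has already run: the `TIDIED` string is a hypothetical stand-in for that output, and the helper names `link_urls`, `link_texts`, and `bold_texts` are made up for this example. Real tidy output may also carry an XHTML namespace, in which case the tag names would need to be qualified with it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the output of a tidy step (e.g. JTidy):
# well-formed markup with no XML namespace, so a plain XML parser accepts it.
TIDIED = """<html>
  <body>
    <p>See <a href="http://example.com/a">the <b>first</b> page</a> and
       <a href="http://example.com/b">the second page</a>.</p>
    <p><b>Bold text</b> outside a link.</p>
  </body>
</html>"""


def link_urls(root):
    """Return the href of every <a> element that has one."""
    return [a.get("href") for a in root.iter("a") if a.get("href")]


def link_texts(root):
    """Return the text of every <a> element, including text in nested tags."""
    return ["".join(a.itertext()).strip() for a in root.iter("a")]


def bold_texts(root):
    """Return the text of every <b> element."""
    return ["".join(b.itertext()).strip() for b in root.iter("b")]


root = ET.fromstring(TIDIED)
print(link_urls(root))   # all link URLs
print(link_texts(root))  # all link texts
print(bold_texts(root))  # all bold text
```

The same queries would fail with a parse error on the original malformed page, which is why the cleanup pass has to come first.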
