I’m just getting started with HTTPClient, and I want to take a webpage

Asked: May 15, 2026 (2026-05-15T20:37:08+00:00)

I’m just getting started with HTTPClient, and I want to take a webpage and extract the raw text from it, minus all the HTML markup.

Can HTTPClient accomplish that? If so, how? Or is there another library I should be looking at?

For example, if the page contains

<body><p>para1 test info</p><div><p>more stuff here</p></div>

I’d like it to output

para1 test info more stuff here

1 Answer

Editorial Team · Answer 1 · 2026-05-15T20:37:09+00:00

I’d suggest using HttpComponents Client (HttpClient 4) instead of the version 3 you’ve linked to.

That said, the task is independent of the HTTP client library (there are others): the client only fetches the page. What you need is a way to convert the HTML into plain text. This could be of interest: http://www.rgagnon.com/javadetails/java-0424.html
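
As a rough sketch of the HTML-to-text step (not necessarily the approach the linked page takes), one option that needs no extra library is the HTML parser bundled with the JDK's Swing classes. The class name `HtmlToText` and the method `extractText` are made up for illustration, and the HTML string is assumed to have already been fetched with whichever HTTP client you choose.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text nodes of an HTML document
// and joins them with single spaces.
final class HtmlToText {

    static String extractText(String html) {
        StringBuilder text = new StringBuilder();

        // The parser calls handleText once for each run of text it finds
        // between tags; the tags themselves are simply ignored here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the
            // markup, since the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }

        // Collapse runs of whitespace and drop the trailing separator.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Calling `HtmlToText.extractText(...)` on the sample markup from the question is intended to return `para1 test info more stuff here`. The Swing parser is lenient but dated, so for messy real-world pages a dedicated HTML parsing library may be a better fit.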
