This is a bit of a weird one. I’m using HTTPClient 4.1.2, and it

Asked: May 26, 2026

This is a bit of a weird one. I’m using HTTPClient 4.1.2, and it seems that whenever it finds a URL with something like a ‘#’ in it, it sends a full GET with the # still in the URL.

For example, trying to get the URL http://stks.co/eWt will redirect to the URL http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter. This URL is live, but the problem is that HTTPClient sends a GET request with the request URI set to /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter, which causes the server to send back a 404 Not Found page.

Looking at the GET requests sent by IE, Firefox and cURL, they all strip the #… from the end of the URI. For example, the cURL GET request URI is /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/, with the whole #… suffix removed. This is for the exact same entry URL, http://stks.co/eWt.

As a test, sending this raw URL into HTTPClient (i.e. HttpGet httpget = new HttpGet("http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter");) gives the same 404 not found result.

So the question is: are there any settings in HTTPClient that will automatically remove things like the trailing #… from URLs? Or, failing that, how would I go about removing it manually (remember that I would need to catch all redirect URLs as well)?
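A minimal sketch of the manual approach, using only java.net and assuming absolute, well-formed URLs. In such a URL the fragment starts at the first '#' (RFC 3986 does not allow an unescaped '#' anywhere else), so cutting the string there leaves the scheme, host, path and query, including any percent-encoding, exactly as they were. The class and method names (FragmentStripper, stripFragment) are hypothetical.

```java
import java.net.URI;

public class FragmentStripper {

    // Removes the fragment ("#..." suffix) from an absolute URL string.
    // In a well-formed URI the first '#' always starts the fragment, so
    // cutting there leaves scheme, authority, path and query untouched.
    public static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Same operation on a java.net.URI. It works on the raw string form,
    // so escapes such as %2F are preserved rather than decoded and re-encoded.
    public static URI stripFragment(URI uri) {
        return uri.getRawFragment() == null ? uri : URI.create(stripFragment(uri.toString()));
    }

    public static void main(String[] args) {
        String redirect = "http://news.ichinastock.com/2011/10/"
                + "jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter";
        // Prints the URL up to, but not including, the '#'.
        System.out.println(stripFragment(redirect));
    }
}
```

The result can be passed straight to new HttpGet(...). Redirects would need the same treatment for every Location target. With HttpClient 4.1, one option is probably to subclass DefaultRedirectStrategy and strip the fragment from the URI returned by getLocationURI, but treat that as an assumption to check against the 4.1 javadocs.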

1 Answer

Editorial Team · Answer 1 · 2026-05-26T14:13:31+00:00

It sounds like their web server is broken. The URI specification says that a number sign (#) terminates the path portion of the URI. If a web server considers anything after a # part of the path, it is not following the URI specification.

“The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component, serves to identify a resource within the scope of the URI’s scheme and naming authority (if any). The path is terminated by the first question mark (“?”) or number sign (“#”) character, or by the end of the URI.” (RFC 3986, section 3.3)
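You can see the split the RFC describes by asking java.net.URI for the raw path and fragment of the redirect target. The requestTarget helper below is a hypothetical illustration of what a client following the spec would put on the GET request line: the path plus an optional query, never the fragment. It is a sketch, not HttpClient's own code.

```java
import java.net.URI;

public class UriParts {

    // Builds the request target for an http(s) URI: the raw path
    // (or "/" if empty) plus "?query" when a query is present.
    // The fragment is deliberately left out; it only means something to the client.
    public static String requestTarget(URI uri) {
        String path = uri.getRawPath();
        StringBuilder target = new StringBuilder(
                (path == null || path.isEmpty()) ? "/" : path);
        if (uri.getRawQuery() != null) {
            target.append('?').append(uri.getRawQuery());
        }
        return target.toString();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://news.ichinastock.com/2011/10/"
                + "jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter");

        // The path stops at the '#'.
        System.out.println("path:     " + uri.getRawPath());
        // Everything after the '#' is the fragment, not part of the path.
        System.out.println("fragment: " + uri.getRawFragment());
        // What belongs on the request line: the path without the fragment.
        System.out.println("GET " + requestTarget(uri));
    }
}
```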

I tested a few popular web servers, and they all parse these URIs correctly, ignoring the portion after the number sign.

I don’t have any good suggestions for a workaround though. But at least now you know who to blame.
