I have Document document = Jsoup.connect(link).get(); and some times for some urls I get

Asked: June 10, 2026 (2026-06-10T20:03:46+00:00)

I have

Document document = Jsoup.connect(link).get();

and sometimes, for some URLs, I get an exception:

Exception in thread "main" java.nio.charset.UnsupportedCharsetException: X-MAC-ROMAN
    at java.nio.charset.Charset.forName(Unknown Source)
    at org.jsoup.helper.DataUtil.parseByteData(DataUtil.java:86)
    at org.jsoup.helper.HttpConnection$Response.parse(HttpConnection.java:469)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:147)

I have a catch block like this:

catch (IOException e1)

I understand the exception is because Java is Unicode and that webpage/site is not following Unicode. How do I handle this issue? Also, the connect is used for many websites, which include both Unicode and bytecode.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T20:03:48+00:00

I understand the exception is because java is unicode and that webpage/site is not following unicode.

That’s not entirely correct. You’re likely confusing the statement “Java is Unicode” with the fact that Java uses Unicode to represent strings and characters in memory. Computer memory can only store bytes, not characters, so characters need to be converted to bytes and back using a specific character encoding; Java uses a Unicode encoding (UTF-16) for its in-memory strings.
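To make the bytes-versus-characters point concrete, here is a minimal sketch using only the standard library. The class name `EncodingDemo` and the helper `encodedLength` are made up for illustration; the same one-character string turns into a different number of bytes depending on the encoding chosen.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jsoup or the question's code.
class EncodingDemo {
    // Encodes the text with the given charset and returns how many bytes were produced.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the single character "e with acute accent"

        // One byte (0xE9) in ISO-8859-1, two bytes (0xC3 0xA9) in UTF-8.
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));

        // Decoding with the same charset that was used for encoding restores the text.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

The in-memory string is the same in both cases; only the byte representation sent over the wire differs.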

This exception occurs because the Java runtime on the platform where your code runs doesn’t support a charset under this name, so Java can’t convert the bytes obtained from the web server into characters using this encoding. This charset is specific to Mac OS platforms, and you’re likely running Windows or similar.
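If you want to see what your own runtime supports, a small check along these lines may help. It is only a sketch: the class name `CharsetCheck` and the helper `isUsable` are hypothetical, while `Charset.isSupported` is the standard library call. The result for X-MAC-ROMAN depends on your Java installation, so no output is claimed for it here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper class for probing charset support in the running JVM.
class CharsetCheck {
    // Returns true when this JVM can map the given name to a charset.
    // Null and syntactically illegal names are reported as unusable instead of throwing.
    static boolean isUsable(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The name from the stack trace, plus two charsets every Java platform must provide.
        for (String name : new String[] {"X-MAC-ROMAN", "ISO-8859-1", "UTF-8"}) {
            System.out.println(name + " usable: " + isUsable(name));
        }
    }
}
```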

how to handle this issue

Contact the website admin and report it as a bug. It’s their fault that they used a platform-specific (Mac OS) encoding instead of a universal (ISO/UTF) encoding.

As to Jsoup, your best bet is to first get the page as an InputStream via URL#openStream() and then feed it to Jsoup#parse(), explicitly specifying a character encoding that is supported on your platform, such as ISO-8859-1. E.g.:

Document doc = Jsoup.parse(new URL(link).openStream(), "ISO-8859-1", link);

Note that you still risk ending up with Mojibake when non-ASCII characters are present. Also note that you shouldn’t do this for all links, but only for those which threw UnsupportedCharsetException (thus, perform the job in its catch block).
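The "only in the catch block" idea could be sketched roughly as follows. Note that UnsupportedCharsetException is an unchecked exception (a subclass of IllegalArgumentException), so a plain catch (IOException e1) will not catch it; it needs its own catch clause. The class `CharsetFallback`, the `Loader` interface, and `loadWithFallback` are hypothetical names introduced here so the pattern can be shown without a network call; the two Jsoup calls in the comment are the ones already used above.

```java
import java.io.IOException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper: try the normal loader first, use the fallback only
// when the page declares a charset this JVM does not support.
class CharsetFallback {
    // A page loader that may fail with an I/O error (illustrative type).
    interface Loader<T> {
        T load(String link) throws IOException;
    }

    static <T> T loadWithFallback(String link, Loader<T> primary, Loader<T> fallback)
            throws IOException {
        try {
            return primary.load(link);
        } catch (UnsupportedCharsetException e) {
            // Unchecked exception: catch (IOException ...) alone would miss it.
            return fallback.load(link);
        }
    }

    // Assumed wiring with Jsoup (requires the Jsoup library on the classpath):
    //
    //   Document doc = CharsetFallback.loadWithFallback(link,
    //       l -> Jsoup.connect(l).get(),
    //       l -> Jsoup.parse(new URL(l).openStream(), "ISO-8859-1", l));

    public static void main(String[] args) throws IOException {
        // Stand-in loaders so the sketch runs without network access or Jsoup.
        String result = loadWithFallback("http://example.invalid/",
                l -> { throw new UnsupportedCharsetException("X-MAC-ROMAN"); },
                l -> "parsed with fallback charset");
        System.out.println(result);
    }
}
```

Links that load fine never touch the fallback, so pages with a correctly declared encoding are still decoded correctly.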

but I am able to access that in my chrome and why not from Jsoup

That is because Chrome is being kind to you: it ignores the unknown encoding and chooses a default encoding instead, which still risks the website being displayed as Mojibake; anything beyond the ASCII range might look malformed.
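What "decoding with a default encoding" can do to non-ASCII text is easy to reproduce with the standard library. In this small sketch (the class `MojibakeDemo` and the method `decodeAsLatin1` are made-up names), bytes produced as UTF-8 are decoded as ISO-8859-1: the ASCII letters survive, while the accented character turns into two unrelated characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing how a wrong decoding charset produces Mojibake.
class MojibakeDemo {
    // Decodes the bytes as ISO-8859-1 regardless of how they were actually encoded.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, encoded as UTF-8 (the e becomes 0xC3 0xA9).
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String garbled = decodeAsLatin1(utf8);

        // The ASCII prefix is intact; the accented e became U+00C3 followed by U+00A9.
        System.out.println(garbled.startsWith("caf"));
        System.out.println(garbled.equals("caf\u00c3\u00a9"));
    }
}
```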

connect is used for many websites which include both unicode and bytecode

Please refresh your vocabulary on the meaning of the word “bytecode”. Bytecode is the compiled instruction format that the JVM executes; it has absolutely nothing to do with character encodings.