The Archive Base


Editorial Team
Asked: June 2, 2026 (2026-06-02T02:45:10+00:00)

I started with an InputStreamReader, but this buffered its input, reading more than was


I started with an InputStreamReader, but it buffered its input, reading more from the input stream than was required (as mentioned in its Javadoc). Delving into the source code (java version “1.7.0_147-icedtea”), I got to the sun.nio.cs.StreamDecoder class, which contains this comment:

// In order to handle surrogates properly we must never try to produce
// fewer than two characters at a time.  If we're only asked to return one
// character then the other is saved here to be returned later.

So I guess the question becomes “is this true, and if so why?” From my (very basic!) understanding of the 6 charsets required by the JLS, it is always possible to determine the exact number of bytes required to read a single character, so no read-ahead would be necessary.

The background is that I had a binary file containing a bunch of data with different encodings (numbers, strings, single-byte tokens etc.). The basic format was a repeating sequence of a byte marker (indicating the type of data) followed by optional data, if required for that type. The two types containing character data were null-terminated strings and strings preceded by a 2-byte length. So for null-terminated strings I thought something like this would do the trick:

String readStringWithNull(InputStream in) throws IOException {
  StringWriter sw = new StringWriter();
  InputStreamReader isr = new InputStreamReader(in, "UTF-16LE");
  for (int i; (i = isr.read()) > 0; ) {
    sw.write(i);
  }
  return sw.toString();
}

But the InputStreamReader read ahead from the buffer, so subsequent read operations on the base InputStream missed data. For my particular case I knew that all characters would be UTF-16LE BMP (sort of UCS-2LE) so I just coded around that, but I’m still interested in the general case above.

Also, I’ve seen the question “InputStreamReader buffering issue”, which is similar but does not appear to answer this specific question.

Cheers,

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 2:45 am (2026-06-02T02:45:11+00:00)

    > So I guess the question becomes “is this true, and if so why?”

    Yes, the comment is correct, though its phrasing is a bit obscure.

    A UTF-8 encoding of a single Unicode code-point consists of between 1 and 4 bytes; see the Wikipedia UTF-8 examples. But some Unicode code-points (those above U+FFFF) cannot be represented as one Java char. So the decoder potentially has to decode a multi-byte UTF-8 sequence as TWO Java char values (a surrogate pair) and hold one of them back.
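To make the surrogate case concrete, here is a minimal sketch, not taken from the original answer. It encodes one code-point outside the Basic Multilingual Plane as UTF-8 and reads it back one char at a time through an InputStreamReader. The class name `SurrogateDemo` and the choice of U+1F600 are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and the sample code-point are illustrative only.
final class SurrogateDemo {
    // U+1F600 lies above U+FFFF, so it does not fit in a single Java char.
    static final int CODE_POINT = 0x1F600;

    /**
     * Encodes one code-point as UTF-8 and decodes it with single-char reads.
     * Returns { UTF-8 byte count, first read(), second read(), third read() }.
     */
    static int[] decodeCharByChar(int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int first = reader.read();   // high surrogate
            int second = reader.read();  // low surrogate, held back by the decoder after the first call
            int third = reader.read();   // end of stream
            return new int[] { utf8.length, first, second, third };
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = decodeCharByChar(CODE_POINT);
        System.out.printf("UTF-8 bytes: %d, chars: U+%04X U+%04X, then %d%n", r[0], r[1], r[2], r[3]);
    }
}
```

The single 4-byte sequence comes back as two separate read() results, which is why the decoder has to keep the second char somewhere between calls.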

    > From my (very basic!) understanding of the 6 charsets required by the JLS, it is always possible to determine the exact number of bytes required to read a single character, so no read-ahead would be necessary.

    It is a bit more complicated than this for variable-length encodings. The decoder reads ahead just enough bytes to form one Unicode code-point. This will be between 1 and 4 bytes for UTF-8, and by examining the bytes it knows when to stop. Then it decodes the bytes as 1 or 2 UTF-16 code-units (i.e. Java char values), delivers the first one, and saves the second one.
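As a rough illustration of "by examining the bytes it knows when to stop", the sketch below derives the length of a UTF-8 sequence from its first byte and the number of Java chars that sequence decodes to. It is a simplified stand-alone helper, not the JDK decoder's code: the class and method names are assumptions, and continuation bytes are not validated.

```java
// Hypothetical helper; names are illustrative and this is not how sun.nio.cs is structured.
final class Utf8Length {
    /**
     * Returns the total length in bytes of the UTF-8 sequence that starts with
     * the given lead byte, or -1 if the byte cannot start a well-formed sequence.
     */
    static int sequenceLength(int leadByte) {
        int b = leadByte & 0xFF;
        if (b < 0x80) return 1;                // 0xxxxxxx: ASCII
        if (b >= 0xC2 && b <= 0xDF) return 2;  // 110xxxxx (0xC0 and 0xC1 would be overlong)
        if (b >= 0xE0 && b <= 0xEF) return 3;  // 1110xxxx
        if (b >= 0xF0 && b <= 0xF4) return 4;  // 11110xxx, limited to U+10FFFF
        return -1;                             // continuation byte or invalid lead byte
    }

    /** A 4-byte sequence encodes a code-point above U+FFFF, which needs two Java chars. */
    static int javaCharCount(int sequenceLength) {
        return sequenceLength == 4 ? 2 : 1;
    }

    public static void main(String[] args) {
        int[] leadBytes = { 0x41, 0xC3, 0xE2, 0xF0 };
        for (int lead : leadBytes) {
            int len = sequenceLength(lead);
            System.out.printf("lead 0x%02X: %d byte(s), %d char(s)%n", lead, len, javaCharCount(len));
        }
    }
}
```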

    So you are potentially reading ahead in terms of bytes, but not in terms of code-points. And that is fine because the user’s keyboard (for example) is generating code-points.


    > Also, it should be possible to create an unbuffered reader which performs exactly as the standard one, but only pulls a single code-point at a time from the underlying stream, and so could be used in my example above.

    Yes, it should be possible to do this. However, such a reader would need to make up to 4 separate system calls to read a single code-point, which is very inefficient.

    > In fact, wouldn’t this appear to be a preferred implementation, as I can always buffer the stream myself if required.

    No, it is not the preferred implementation. Yes, you could (in theory) buffer the stream yourself below the decoder. However, most programs aren’t written to build the stack like this:

    BufferedReader > InputStreamReader > BufferedInputStream > raw InputStream
    

    instead they just do this:

    BufferedReader > InputStreamReader > raw InputStream
    

    which would make your approach perform really slowly. (And you try explaining to the average Joe programmer why he should put an extra explicit buffering layer into the stack.)

    > The standard InputStreamReader from OpenJDK7 appears to immediately read and buffer up to 8k from the base stream.

    If they didn’t do something like this, performance would be terrible … see above. Besides, this is documented behavior – the javadoc says:

    “Each invocation of one of an InputStreamReader’s read() methods may cause one or more bytes to be read from the underlying byte-input stream. To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.”
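The read-ahead described in that Javadoc passage can be observed directly. The following is a small sketch under stated assumptions: the class name `ReadAheadDemo` and the sample data are made up, and the exact number of bytes consumed depends on the JDK implementation, since the Javadoc only says that more bytes "may" be read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only observes behavior and makes no claim about JDK internals.
final class ReadAheadDemo {
    /** Reads ONE char through an InputStreamReader and reports how many bytes the base stream has left. */
    static int bytesLeftAfterOneChar() {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.UTF_16LE); // 8 chars, 16 bytes
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16LE);
            reader.read();         // one char needs only 2 of the 16 bytes
            return in.available(); // without read-ahead this would be 14
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes left in base stream: " + bytesLeftAfterOneChar());
    }
}
```

If the printed value is below 14, the reader has pulled more bytes than the single char required, and any later read on the base stream will miss them. That is the effect described in the question.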

    The bottom line is that your use-case (where you want absolutely no low-level read-ahead on a Reader stack) is highly unusual, and is not supported by the Java SE standard class library. If you really need this, feel free to implement your own version of InputStreamReader that doesn’t read ahead. But it strikes me as a bit odd that you would really need this.
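For the specific case in the question (null-terminated UTF-16LE strings embedded in a binary stream), a full Reader replacement may not be needed. Below is a minimal sketch, assuming the terminator is a single 0x0000 code unit; the class and method names are hypothetical. It reads code units straight from the InputStream, two bytes at a time, so nothing past the terminator is consumed. Surrogate pairs need no special handling, because each half is just appended as a char.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes a 0x0000 code unit terminates each string.
final class Utf16LeStrings {
    /** Reads UTF-16LE code units up to and including the null terminator, and nothing beyond it. */
    static String readNullTerminated(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int lo = in.read();
            if (lo < 0) throw new EOFException("stream ended before the null terminator");
            int hi = in.read();
            if (hi < 0) throw new EOFException("stream ended in the middle of a code unit");
            int unit = (hi << 8) | lo;   // little-endian: low byte comes first
            if (unit == 0) return sb.toString();
            sb.append((char) unit);
        }
    }

    public static void main(String[] args) throws IOException {
        // "hi", a terminator, then one unrelated byte that must stay in the stream.
        byte[] text = "hi".getBytes(StandardCharsets.UTF_16LE);
        byte[] data = new byte[text.length + 3];
        System.arraycopy(text, 0, data, 0, text.length);
        data[data.length - 1] = 0x2A;
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(readNullTerminated(in) + " / next byte: " + in.read());
    }
}
```

Because every in.read() on a raw stream may turn into a separate system call, it is probably worth wrapping the raw stream once in a BufferedInputStream and doing all reads, binary and text, through that single wrapper. The buffering then sits below the parsing code and no bytes are lost.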

© 2021 The Archive Base. All Rights Reserved