Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 717095
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 14, 20262026-05-14T05:22:23+00:00 2026-05-14T05:22:23+00:00

I am reading data from a file that has, unfortunately, two types of character

  • 0

I am reading data from a file that has, unfortunately, two types of character encoding.

There is a header and a body. The header is always in ASCII and defines the character set that the body is encoded in.

The header is not fixed length and must be run through a parser to determine its content/length.

The file may also be quite large so I need to avoid bring the entire content into memory.

So I started off with a single InputStream. I wrap it initially with an InputStreamReader with ASCII and decode the header and extract the character set for the body. All good.

Then I create a new InputStreamReader with the correct character set, drop it over the same InputStream and start trying to read the body.

Unfortunately it appears, javadoc confirms this, that InputStreamReader may choose to read-ahead for effeciency purposes. So the reading of the header chews some/all of the body.

Does anyone have any suggestions for working round this issue? Would creating a CharsetDecoder manually and feeding in one byte at a time but a good idea (possibly wrapped in a custom Reader implementation?)

Thanks in advance.

EDIT: My final solution was to write a InputStreamReader that has no buffering to ensure I can parse the header without chewing part of the body. Although this is not terribly efficient I wrap the raw InputStream with a BufferedInputStream so it won’t be an issue.

// An InputStreamReader that only consumes as many bytes as is necessary
// It does not do any read-ahead.
public class InputStreamReaderUnbuffered extends Reader
{
    private final CharsetDecoder charsetDecoder;
    private final InputStream inputStream;
    private final ByteBuffer byteBuffer = ByteBuffer.allocate( 1 );

    public InputStreamReaderUnbuffered( InputStream inputStream, Charset charset )
    {
        this.inputStream = inputStream;
        charsetDecoder = charset.newDecoder();
    }

    @Override
    public int read() throws IOException
    {
        boolean middleOfReading = false;

        while ( true )
        {
            int b = inputStream.read();

            if ( b == -1 )
            {
                if ( middleOfReading )
                    throw new IOException( "Unexpected end of stream, byte truncated" );

                return -1;
            }

            byteBuffer.clear();
            byteBuffer.put( (byte)b );
            byteBuffer.flip();

            CharBuffer charBuffer = charsetDecoder.decode( byteBuffer );

            // although this is theoretically possible this would violate the unbuffered nature
            // of this class so we throw an exception
            if ( charBuffer.length() > 1 )
                throw new IOException( "Decoded multiple characters from one byte!" );

            if ( charBuffer.length() == 1 )
                return charBuffer.get();

            middleOfReading = true;
        }
    }

    public int read( char[] cbuf, int off, int len ) throws IOException
    {
        for ( int i = 0; i < len; i++ )
        {
            int ch = read();

            if ( ch == -1 )
                return i == 0 ? -1 : i;

            cbuf[ i ] = (char)ch;
        }

        return len;
    }

    public void close() throws IOException
    {
        inputStream.close();
    }
}
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-14T05:22:24+00:00Added an answer on May 14, 2026 at 5:22 am

    Why don’t you use 2 InputStreams? One for reading the header and another for the body.

    The second InputStream should skip the header bytes.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am reading in data from a file that has three columns. For example
When reading data from the Input file I noticed that the ¥ symbom was
I'm having some trouble reading data from a file into a vector of Orders.
I am reading data from a special device that is producing around 20 MB/s
I'm reading data from a table( from a MySQL Database) with Hibernate SQL Query.
I'm reading data from a device which measures distance. My sample rate is high
I'm reading XML data from the HttpServletRequest in my servlets doPost() and passing the
Right now, I'm particularly interested in reading the data from MP3 files (ID3 tags?),
We use a data acquisition card to take readings from a device that increases
How much data should be in a table so that reading is optimal? Assuming

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.