The Archive Base

Editorial Team
Asked: May 21, 2026


I have XML files (encoded in UTF-8) that have two issues:

  • Some of them (not all) contain a byte order mark (EF BB BF).

  • Some of them (not all) contain null characters (00), distributed over the whole file.

Both issues prevent me from parsing the XML with a SAX parser. My current approach is to read the file into a String, use a regex to strip these characters, and write the String back to a file, which works fine for small files.
However, my files are quite large (hundreds of megabytes). Reading a file into a String and creating a result String of the same size every time I call replaceAll() quickly leads to a Java heap space error.

Increasing the heap size is definitely not a long-term solution. I need to stream the file and strip all of these characters on the fly.

Any suggestions on what an efficient solution should look like?

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on May 21, 2026 at 7:47 pm

    I would subclass FilterInputStream to filter out the undesired bytes at runtime.

    The task should be rather easy: a byte order mark should only appear at the start of the file (so you only need to check there), and NUL bytes can be filtered out with a simple == comparison (no need for regex-like features).

    This will most likely also increase performance as you don’t need to write out the full corrected file to disk before re-reading it.
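A minimal sketch of such a filter, under stated assumptions: the class name NulAndBomFilterInputStream is made up for illustration, the input is assumed to be UTF-8 (where a 00 byte never occurs inside a multi-byte sequence, so dropping it at the byte level is safe), and only the read methods are filtered (skip() and mark/reset are left as inherited).

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical class name; drops a leading UTF-8 BOM and all NUL bytes while streaming.
class NulAndBomFilterInputStream extends FilterInputStream {

    NulAndBomFilterInputStream(InputStream in) throws IOException {
        super(skipBom(in));
    }

    // Peek at the first three bytes and push them back unless they are EF BB BF.
    private static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return pin;
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = in.read();
        } while (b == 0); // skip NUL bytes; -1 (end of stream) passes through
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = in.read(buf, off, len);
            if (n <= 0) {
                return n; // -1 at end of stream
            }
            // Compact the chunk in place, keeping only non-NUL bytes.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (b != 0) {
                    buf[off + kept++] = b;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was NUL bytes: read again instead of returning 0.
        }
    }
}
```

One way to use it would be to wrap a BufferedInputStream around a FileInputStream, wrap that in this filter, and pass the result to SAXParser.parse(InputStream, DefaultHandler), so the parser only ever sees the cleaned bytes and nothing is held in memory beyond the read buffer.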
