The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: June 5, 2026

I have flat files that I can load just fine in .NET in UTF-16,


I have flat files that I can load just fine in .NET as UTF-16, even though they are technically UCS-2LE (w/o BOM). I understand this works because UCS-2 is an older standard that UTF-16 supersedes.

However, what I’m interested in is being able to determine whether a file actually is UCS-2. I know that this means I’d be guessing. I have tried the .NET ports of chardet, the IMultiLanguage2 interop, and some open-source code by Novell to tease out a determination of UCS-2 over UTF-16, and I haven’t had any success. I haven’t found any technique that can tell the difference between UCS-2LE w/o BOM and invalid/overlong UTF-8.

Should I be inspecting them byte by byte and trying to decide whether the encoding is variable or fixed length? Maybe look for missing code points? The issue is that these text files have no special code points; they contain only the bog-standard Western character set. But TextPad saves them as UCS-2LE w/o BOM, which complicates downstream file operations in our software, which wants them to be fully compliant UTF-16. (Just force-loading the files works, but it doesn’t meet the software’s requirements.)


1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 5:53 am

    The Wikipedia article on UTF-16, http://en.wikipedia.org/wiki/UTF-16, discusses the Basic Multilingual Plane (BMP). Every code point in the BMP is encoded identically in UTF-16 and UCS-2. If TextPad is only encoding BMP characters, you can treat the document as either UTF-16 or UCS-2.
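As a rough illustration of that point, the sketch below checks whether a BOM-less little-endian file stays inside the BMP by looking for 16-bit code units in the surrogate range 0xD800 to 0xDFFF, which UTF-16 uses only for characters outside the BMP. It is written in Python purely to show the logic (the same loop ports directly to .NET), and the helper names `code_units_le` and `is_bmp_only` are made up for this example.

```python
def code_units_le(data: bytes) -> list[int]:
    """Split raw bytes into 16-bit little-endian code units."""
    if len(data) % 2:
        raise ValueError("odd byte count: not a whole number of 16-bit units")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_bmp_only(data: bytes) -> bool:
    """True when no code unit falls in the surrogate range 0xD800-0xDFFF."""
    return not any(0xD800 <= unit <= 0xDFFF for unit in code_units_le(data))


# Plain Western text: the bytes are the same whether you call them UCS-2LE or UTF-16LE.
western = "Plain Western text: café".encode("utf-16-le")

# U+1F600 lies outside the BMP, so UTF-16 has to spend two code units on it.
supplementary = "\U0001F600".encode("utf-16-le")

print(is_bmp_only(western))
print(is_bmp_only(supplementary))
```

If `is_bmp_only` returns true for a file, nothing in the bytes distinguishes UCS-2 from UTF-16, so labelling it UTF-16LE is safe.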

    The problem arises only when code points outside the BMP are encoded. UCS-2 cannot represent code points outside the BMP (http://en.wikipedia.org/wiki/Universal_Character_Set), whereas UTF-16 encodes each of them as a surrogate pair: two 16-bit code units taken from the range 0xD800 to 0xDFFF.
    This would lead one to assume that a file containing such code units can be treated as UTF-16. That could be problematic if the program creating the file was producing UCS-2 improperly and using values in that range for ancillary reasons.
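One way to make that distinction concrete is to classify a file by how its surrogate code units are arranged. The following is a minimal sketch under the assumption that the input is little-endian with no BOM; `classify_utf16le` and its return strings are hypothetical names chosen for this example, not part of any library.

```python
def classify_utf16le(data: bytes) -> str:
    """Classify BOM-less little-endian 16-bit text by its surrogate usage."""
    if len(data) % 2:
        return "not 16-bit aligned"
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    saw_pair = False
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate is only legal when a low surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                saw_pair = True
                i += 2
                continue
            return "ill-formed UTF-16 (unpaired surrogate)"
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate that was not consumed as the second half of a pair.
            return "ill-formed UTF-16 (unpaired surrogate)"
        i += 1
    if saw_pair:
        return "UTF-16 with supplementary characters"
    return "BMP only (valid as UCS-2 and UTF-16)"


print(classify_utf16le("abc".encode("utf-16-le")))
print(classify_utf16le("\U0001F600".encode("utf-16-le")))
print(classify_utf16le(b"\x00\xd8a\x00"))
```

Only the third outcome, an unpaired surrogate, points to a producer that was using the surrogate range for something other than UTF-16.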

    Most libraries and programs that read UTF encodings let you specify what to do when an encoding error occurs, on a per-character basis (raise an exception, replace with a placeholder, or simply ignore it). If an improper UCS-2 file is run through one of these as UTF-16, the decoder will report errors wherever the data is not well-formed UTF-16. Understanding what the author of the file was trying to do outside the BMP would be the only way to handle those characters appropriately.
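The three error policies mentioned above can be seen with Python's built-in `utf-16-le` codec, used here only as a stand-in for whatever decoder your platform provides. The sample bytes and the wrapper `decode_with_policy` are invented for this sketch; the `errors` values `strict`, `replace`, and `ignore` are the codec's standard handlers.

```python
# 'A', then an unpaired high surrogate (0xD800), then 'B', all little-endian.
improper_ucs2 = b"A\x00\x00\xd8B\x00"


def decode_with_policy(data: bytes, errors: str) -> str:
    """Decode as UTF-16LE using the given error policy."""
    return data.decode("utf-16-le", errors=errors)


# "strict" raises on the first ill-formed sequence.
try:
    decode_with_policy(improper_ucs2, "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# "replace" substitutes U+FFFD for the bad unit; "ignore" drops it silently.
print("replace:", ascii(decode_with_policy(improper_ucs2, "replace")))
print("ignore:", ascii(decode_with_policy(improper_ucs2, "ignore")))
```

Running a strict decode first is a cheap way to find out whether a file needs any special handling at all.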
