My proprietary text encoding uses all 256 byte values, with the lower 128 being mostly the same as ASCII (the important parts, i.e. control characters, spaces, and newlines, are exactly the same). I want to read a file in this encoding as bytes in C# .NET and still be able to read it line by line and run regex searches on it. What is the best way to do this in C# .NET?
I realize that if my encoding only used the first 128 byte values this would be simple. I just don’t want the higher characters to get accidentally converted to Unicode values.
It sounds like you just want to implement your own subclass of Encoding. It’s reasonably straightforward to do this, and then you can pass it to the StreamReader constructor (or to a method that accepts an Encoding, such as File.ReadLines; note that File.OpenText always assumes UTF-8). If you look at the code I wrote (many years ago) to handle EBCDIC, you should be able to use that as a reasonable starting point.
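As a rough illustration of what such a subclass might look like, here is a minimal sketch of a table-driven single-byte encoding. This is not the EBCDIC code mentioned above; TableEncoding, TableEncodingDemo and the placeholder table are hypothetical names made up for this example. The upper 128 bytes are mapped into the Unicode Private Use Area only as a stand-in, so you would substitute whatever characters your encoding actually defines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical single-byte encoding: each byte maps to exactly one char
// through a 256-entry lookup table supplied by the caller.
public sealed class TableEncoding : Encoding
{
    private readonly char[] _toChar;
    private readonly Dictionary<char, byte> _toByte = new Dictionary<char, byte>();

    public TableEncoding(char[] table)
    {
        if (table == null || table.Length != 256)
            throw new ArgumentException("Table must have exactly 256 entries.", nameof(table));
        _toChar = (char[])table.Clone();
        for (int i = 0; i < 256; i++)
            _toByte[_toChar[i]] = (byte)i; // reverse lookup used when encoding
    }

    // One byte per char in both directions, so all counts are trivial.
    public override int GetByteCount(char[] chars, int index, int count) => count;
    public override int GetCharCount(byte[] bytes, int index, int count) => count;
    public override int GetMaxByteCount(int charCount) => charCount;
    public override int GetMaxCharCount(int byteCount) => byteCount;

    public override int GetBytes(char[] chars, int charIndex, int charCount,
                                 byte[] bytes, int byteIndex)
    {
        for (int i = 0; i < charCount; i++)
        {
            // Chars that are not in the table are written as '?'.
            bytes[byteIndex + i] =
                _toByte.TryGetValue(chars[charIndex + i], out byte b) ? b : (byte)'?';
        }
        return charCount;
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount,
                                 char[] chars, int charIndex)
    {
        for (int i = 0; i < byteCount; i++)
            chars[charIndex + i] = _toChar[bytes[byteIndex + i]];
        return byteCount;
    }
}

public static class TableEncodingDemo
{
    // Placeholder table (an assumption): bytes 0..127 keep their ASCII meaning,
    // bytes 128..255 map to the private-use characters U+E080..U+E0FF.
    public static char[] BuildPlaceholderTable()
    {
        var table = new char[256];
        for (int i = 0; i < 128; i++) table[i] = (char)i;
        for (int i = 128; i < 256; i++) table[i] = (char)(0xE000 + i);
        return table;
    }

    // Reads the data line by line with the custom encoding.
    public static List<string> ReadLines(byte[] data)
    {
        var encoding = new TableEncoding(BuildPlaceholderTable());
        var lines = new List<string>();
        // BOM detection is switched off so leading bytes such as 0xFF 0xFE
        // are not mistaken for a UTF-16 byte order mark.
        using (var reader = new StreamReader(new MemoryStream(data), encoding, false))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    // Regex search for byte 0x80 following 'A', written as the mapped char.
    public static bool ContainsAThenByte80(string line) =>
        Regex.IsMatch(line, "A\uE080");
}
```

With this placeholder table, a regex that needs to match one of the high bytes is written in terms of the character that byte maps to. Because the table is one-to-one, calling GetBytes on a decoded string gives back the original bytes.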
The overlap with ASCII seems pretty much irrelevant to this, by the way.
Any time you convert any binary data into text, you’re converting to Unicode values. That’s how text in .NET is defined.
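To make that concrete, here is a small sketch using the built-in ISO-8859-1 encoding purely as a stand-in (it is not your encoding, and the class name RoundTripDemo is made up for this example). Every byte still becomes a char, i.e. a UTF-16 code unit, once it is in a string; what the encoding controls is which char each byte becomes, and a one-to-one mapping lets you get the original bytes back.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Decodes the bytes to a string and encodes them back, using ISO-8859-1,
    // which maps each byte value to the Unicode code point with the same number.
    public static byte[] RoundTrip(byte[] input, out string text)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        text = latin1.GetString(input);  // bytes -> Unicode chars
        return latin1.GetBytes(text);    // Unicode chars -> the same bytes
    }
}
```

For example, the byte 0xE9 comes out as the char U+00E9 in the string and is written back as 0xE9, so nothing is lost even though the intermediate text is Unicode.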