I tried to phrase this as a generic question but realized I don’t know enough, so here is the problem I’m having.
Here is a snippet from a console application:
public void Run()
{
    // Default to the console; redirecting with ">" then decides the file's bytes.
    Run(Console.Out);
}

public void Run(TextWriter writer)
{
    DataTable customers = _quickBooksAdapter.GetTableData("Customer");
    customers.WriteXml(writer);
}
Then I run it from the console and use “>” to put it in a file.
c:\> QuickBooksETL extract US > qb_us.xml
If I try to load the result as I normally would:
var x = XDocument.Load("qb_us.xml");
I get the error:
Invalid character in the given encoding. Line 8, position 26.
So I tried to determine what .NET “thinks” it is using:
string path = @"\\ad1\accounting$\Xml\qb_us.xml";
StreamReader sr = new StreamReader(path);
sr.CurrentEncoding.Dump();
Result:
System.Text.UTF8Encoding
BodyName utf-8
EncodingName Unicode (UTF-8)
HeaderName utf-8
WebName utf-8
WindowsCodePage 1200
IsBrowserDisplay True
IsBrowserSave True
IsMailNewsDisplay True
IsMailNewsSave True
IsSingleByte False
EncoderFallback EncoderReplacementFallback
System.Text.EncoderReplacementFallback
DefaultString �
MaxCharCount 1
DecoderFallback DecoderReplacementFallback
System.Text.DecoderReplacementFallback
DefaultString �
MaxCharCount 1
IsReadOnly True
CodePage 65001
Finally, I find by guessing that it works if I just explicitly say it’s ASCII:
string path = @"\\ad1\accounting$\Xml\qb_us.xml";
StreamReader sr = new StreamReader(path, Encoding.ASCII);
var x = XDocument.Load(sr);
Any thoughts on where I am going wrong would be greatly appreciated. I admit I have never taken the “deep dive” on character encodings, but I’m willing to put in the effort to get this right.
The simple answer is not to get the console involved. Write directly to the file from your code, for example by passing the file name to WriteXml, or create the TextWriter or Stream yourself and pass that in.
Note that by reading it as ASCII, you’ll basically be getting question marks for any non-ASCII character in the original data. IIRC, that’s the default behaviour of an encoding when it encounters binary data it can’t handle.
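To illustrate the point about question marks, here is a minimal sketch of what decoding non-ASCII bytes as ASCII does. The class and method names are hypothetical, and it assumes the bytes are UTF-8 purely for demonstration; the redirected console file may well be in a different code page.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class AsciiDecodeDemo
{
    // Encodes the text as UTF-8 (an assumption for this demo), then decodes
    // those bytes as ASCII, which is roughly what
    // new StreamReader(path, Encoding.ASCII) does to a non-ASCII file.
    public static string DecodeUtf8BytesAsAscii(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        // Bytes above 0x7F are not valid ASCII, so the decoder's
        // replacement fallback substitutes them.
        return Encoding.ASCII.GetString(utf8Bytes);
    }
}
```

Calling AsciiDecodeDemo.DecodeUtf8BytesAsAscii("Café") should give back the plain ASCII letters unchanged, with the two bytes of "é" replaced by question marks. So the ASCII workaround loads without an exception, but it silently loses data.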
Using a Stream, it should default to writing out in UTF-8, and the XML declaration and the data within the file should match.
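The suggestion above could be sketched roughly as follows. The class name, method names, and the path parameter are hypothetical and not from the original program; the sketch also assumes the DataTable has its TableName set, which WriteXml requires.

```csharp
using System.Data;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class ExtractSketch
{
    // Writes the table straight to a file through a Stream,
    // so the console's output encoding never gets involved.
    public static void WriteTable(DataTable table, string path)
    {
        using (Stream output = File.Create(path))
        {
            table.WriteXml(output);
        }
    }

    // Writes the table and loads it back, to check that the file parses.
    public static XDocument WriteAndReload(DataTable table, string path)
    {
        WriteTable(table, path);
        return XDocument.Load(path);
    }
}
```

In the original Run method this would mean passing something like a file path argument instead of relying on "> qb_us.xml" at the command line. XDocument.Load on the resulting file should then work without specifying an encoding.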