I’m working with an SSIS package that takes data from SQL Server and creates text files to ship to a vendor. Currently the files are encoded as ANSI 1252, and the Unicode checkbox is not checked on the Flat File Connection Manager.
The package failed when it encountered this symbol: ♥
This led me to believe that the step would fail whenever it attempted to write out any non-ASCII character. However, it successfully handles “ş” by converting it to a standard “s”. For our purposes this behavior is great, and if it did something similar with the heart symbol, there would be no issue. I’m trying to avoid sending a Unicode file, as the file is already very large and doubling its size is not preferable.
What I’m looking for is the range of Unicode characters that SSIS will not automatically convert for me. I would then add a REPLACE to the original SQL statement to clear out those characters, such as the ♥.
We started with REPLACE(NAME, SUBSTRING(NAME, PATINDEX('%[^ -ÿ]%', NAME COLLATE Latin1_General_BIN2), 1), ''), but this also removes the “ş”, which we are trying to avoid since SSIS handles the “ş” just fine.
Thanks for reading this question!
You’re getting Windows’s “best-fit fallback” encoding. Exactly which characters it converts is not officially documented, and the behaviour differs depending on the locale. Many of the replacements are inappropriate in many cases, and there can even be security problems. It is almost always best avoided.
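Since the best-fit table can't be relied on, one option is to look for every character that code page 1252 cannot represent directly, rather than guessing which ones Windows will paper over. The following is a minimal Python sketch of that check, not something SSIS does itself. The function name `chars_outside_cp1252` and the sample string are made up for illustration. Note that Python's `cp1252` codec is strict and performs no best-fit mapping, so it reports both “ş” and “♥”.

```python
def chars_outside_cp1252(text):
    """Return the distinct characters in `text` that strict cp1252 cannot
    encode, in the order they first appear."""
    found = []
    for ch in text:
        try:
            ch.encode("cp1252")  # strict: raises instead of substituting
        except UnicodeEncodeError:
            if ch not in found:
                found.append(ch)
    return found


if __name__ == "__main__":
    # Hypothetical sample value: "I ♥ Ayşe’s café"
    sample = "I \u2665 Ay\u015fe\u2019s caf\u00e9"
    for ch in chars_outside_cp1252(sample):
        print(f"U+{ord(ch):04X} {ch!r} is not in cp1252")
```

Anything this reports is a character whose fate in the ANSI file depends on the undocumented fallback, so those are the ones to handle explicitly before export.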
UTF-16LE (what Microsoft tools call “Unicode”) may be twice the size of ASCII, but why not another UTF, most obviously UTF-8?
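To put rough numbers on the size concern, here is a small Python sketch that compares the encoded length of the same text in UTF-8 and UTF-16LE. The helper name `encoded_sizes` and the sample string are illustrative assumptions, and the actual ratio for your file depends on how much of the data is non-ASCII.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le")):
    """Return a dict mapping each encoding name to the number of bytes
    `text` occupies when encoded with it."""
    return {name: len(text.encode(name)) for name in encodings}


if __name__ == "__main__":
    # Hypothetical sample value: "Ayşe ♥ café, order 12345"
    sample = "Ay\u015fe \u2665 caf\u00e9, order 12345"
    for name, size in encoded_sizes(sample).items():
        print(f"{name}: {size} bytes")
```

For mostly-ASCII data, UTF-8 stays close to one byte per character and only spends extra bytes on characters such as “ş” and “♥”, whereas UTF-16LE uses at least two bytes for every character.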