I have a process that reads data using the Windows-1252 code page (input from a SQL Server 2008 varchar field). I then write this data to a flat text file, which is picked up by an IBM mainframe system that uses the EBCDIC 37 code page and converts the file to its own character set. However, some characters in the extended ASCII range (character codes 128–255) are not converted correctly by the mainframe. I think this is because certain characters in the Windows character set do not exist in the EBCDIC character set.
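One way to reproduce the problem locally is to do the conversion yourself with strict error handling. The sketch below assumes that Python's built-in `cp1252` and `cp037` codecs match the tables the mainframe uses. The function name `windows1252_to_ebcdic37` is hypothetical. A plain apostrophe converts to EBCDIC 0x7D, while the Windows right single quote (0x92) has no EBCDIC 37 code point and raises an error instead of being silently mangled.

```python
def windows1252_to_ebcdic37(raw: bytes) -> bytes:
    """Decode Windows-1252 bytes and re-encode them as EBCDIC code page 37.

    Raises UnicodeEncodeError when a character has no EBCDIC 37 code point.
    """
    text = raw.decode("cp1252")   # e.g. the bytes of a SQL Server varchar value
    return text.encode("cp037")   # strict mode: unmappable characters raise


if __name__ == "__main__":
    # A plain apostrophe converts (EBCDIC 0x7D).
    print(windows1252_to_ebcdic37(b"it's").hex())  # 89a37da2
    # The Windows right single quote (byte 0x92, U+2019) does not.
    try:
        windows1252_to_ebcdic37(b"it\x92s")
    except UnicodeEncodeError as err:
        print("no EBCDIC 37 code for", repr(err.object[err.start]))
```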
Is there a general way to determine which characters I need to filter out, such as the left single quote, right single quote, left double quote, right double quote, bullet, en dash, and em dash (Windows codes 145–151, respectively), to name a few? If so, is there an algorithm I can use to find the closest EBCDIC equivalent (for example, a plain single quote in place of either a left or a right single quote)?
I wanted a general solution rather than one specific to EBCDIC 37, and I didn’t want to compare two code charts by eye. So I wrote a short VB.NET program that finds every character that exists in one code page but not in the other.
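The VB.NET program itself isn't shown. The following is a rough Python equivalent of the same idea, not the original code. It decodes each of the 256 byte values in the source code page and tries to encode the resulting character in the target code page, collecting the failures. The function name `unmappable` is hypothetical, and the result depends on Python's codec tables. .NET may treat the five unassigned Windows-1252 bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) differently, but this sketch simply skips them.

```python
from __future__ import annotations

import unicodedata


def unmappable(src: str, dst: str) -> list[tuple[int, str]]:
    """Return (byte, character) pairs that are defined in the single-byte
    code page `src` but cannot be encoded in code page `dst`."""
    missing = []
    for b in range(256):
        try:
            ch = bytes([b]).decode(src)
        except UnicodeDecodeError:
            continue  # byte is unassigned in the source code page
        try:
            ch.encode(dst)
        except UnicodeEncodeError:
            missing.append((b, ch))
    return missing


if __name__ == "__main__":
    # Report decimal code, hex code, Unicode code point and name.
    for b, ch in unmappable("cp1252", "cp037"):
        print(f"{b:3d} 0x{b:02X} U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
```

Because only codec names are passed in, the same function can compare any pair of single-byte code pages that Python supports, such as `cp1252` against `cp500`.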
For Windows-1252 to EBCDIC 37, there are 27 characters that do not map. For each of them I chose what I thought was the best equivalent.
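The exact replacements chosen above aren't listed, so the following is only a hypothetical way to automate part of that choice. It is a swapped-in heuristic, not the author's method. The sketch assumes a small hand-picked table for punctuation (the `SUBSTITUTES` entries are illustrative choices). If a character is not in the table and the target cannot encode it, the sketch falls back to Unicode NFKD compatibility decomposition with combining marks removed. For example, "…" becomes "...", "™" becomes "TM", and "Š" becomes "S". Anything still unencodable becomes "?".

```python
import unicodedata

# Hypothetical hand-picked substitutes for Windows-1252 characters that
# EBCDIC 37 lacks and that the NFKD fallback below would not rescue.
SUBSTITUTES = {
    "\u2018": "'", "\u2019": "'",    # left/right single quote
    "\u201a": ",",                   # single low-9 quote
    "\u201c": '"', "\u201d": '"',    # left/right double quote
    "\u201e": '"',                   # double low-9 quote
    "\u2039": "<", "\u203a": ">",    # single angle quotes
    "\u2022": "*",                   # bullet
    "\u2013": "-", "\u2014": "-",    # en dash, em dash
    "\u02dc": "~", "\u02c6": "^",    # small tilde, modifier circumflex
    "\u0152": "OE", "\u0153": "oe",  # ligatures
    "\u0192": "f",                   # florin sign
}


def closest(ch: str, dst: str = "cp037") -> str:
    """Return `ch` if `dst` can encode it, otherwise a best-guess equivalent."""
    if ch in SUBSTITUTES:
        return SUBSTITUTES[ch]
    try:
        ch.encode(dst)
        return ch
    except UnicodeEncodeError:
        pass
    # Fallback: compatibility-decompose and drop combining marks,
    # keeping the result only if the target code page can encode it.
    base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                   if not unicodedata.combining(c))
    try:
        if base:
            base.encode(dst)
            return base
    except UnicodeEncodeError:
        pass
    return "?"  # no reasonable equivalent found


def convert(text: str, dst: str = "cp037") -> bytes:
    """Encode `text` in `dst`, substituting unmappable characters first."""
    return "".join(closest(c, dst) for c in text).encode(dst)
```

With this table, 16 of the 27 characters come from `SUBSTITUTES` and 7 are handled by the NFKD fallback (…, Š, Ž, ™, š, ž, Ÿ). The remaining four (€, †, ‡, ‰) become "?", so those may still need a manual decision.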