I get files in different formats coming from different systems that I need to import into our database. Part of the import process is to check the line length to make sure the format is correct.
We seem to be having issues with files coming from UNIX systems where one character is added. I suspect this is due to the carriage return being encoded differently on UNIX and Windows platforms.
Is there a way to detect on which file system a file was created, other than checking the last character on the line? Or maybe a way of reading the files as text and not binary which I suspect is the issue?
Thanks Guys !
Unix systems use \n line endings, while Windows uses \r\n and classic Mac OS (before OS X) used \r. You cannot detect the file system, and it would not matter anyway: I can use \n on Windows if my editor supports it, for example. The line ending is just the convention on each OS, not a requirement.
The proper way, assuming you don’t have a function that tokenizes correctly no matter which line ending the file uses, is to search for a \n or a \r, end the current line there, and then strip any further \r or \n characters from the remaining data before you begin the next line.
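As a rough illustration of that approach, here is a minimal Python sketch. The language and the function name split_lines_collapsing are my own choices for the example, not something from your setup, and it assumes the file content has already been read into a string:

```python
import re

def split_lines_collapsing(data: str) -> list[str]:
    # Treat any run of CR and/or LF characters as ONE line break.
    # Side effect: blank lines disappear, because consecutive breaks collapse.
    parts = re.split(r"[\r\n]+", data)
    # A leading or trailing break leaves an empty string at the edge; drop it.
    return [part for part in parts if part]

# Mixed Windows, Unix and old Mac endings in one string:
print(split_lines_collapsing("a\r\nb\nc\rd\r\n"))  # ['a', 'b', 'c', 'd']
```

With the line endings stripped this way, the length check sees the same line length regardless of where the file came from.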
However, this will cause issues if you have blank lines and need to keep them. In this case you have to look at linebreaks more carefully:
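One possible way to do that, again as a hedged Python sketch (split_lines_keeping_blanks is a made-up name for illustration): treat \r\n as a single break, and treat a lone \r or a lone \n as a break as well, so that two breaks in a row produce an empty line instead of being merged.

```python
import re

def split_lines_keeping_blanks(data: str) -> list[str]:
    # Try CR LF first so a Windows break counts once, then a lone CR or LF.
    lines = re.split(r"\r\n|\r|\n", data)
    # A break at the very end yields one empty final item; drop only that one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

# The blank line between "a" and "b" survives:
print(split_lines_keeping_blanks("a\r\n\r\nb\nc\r"))  # ['a', '', 'b', 'c']
```

Python's built-in str.splitlines() behaves similarly, but it also splits on a few other separator characters, so the explicit pattern may be safer if your data can contain those.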