I have the following readfile() Java function to read .htm files:
private String readfile(String inputDoc) throws IOException {
    FileInputStream fis = null;
    InputStreamReader isr = null;
    String text = null;
    // open input stream to file
    fis = new FileInputStream(inputDoc);
    isr = new InputStreamReader(fis, "UTF-8");
    StringBuffer buffer = new StringBuffer();
    int c;
    while ((c = isr.read()) != -1) {
        buffer.append((char) c);
    }
    text = buffer.toString();
    isr.close();
    return text;
}
Here is an example snippet of the input document:
<?xml version="1.0" encoding="utf-8"?><html>
<head>
For some reason, the string returned from readfile() is <?xml version="1.0" encoding="utf-8"?><html>\r\r\n<head>
but I expect it to be <?xml version="1.0" encoding="utf-8"?><html>\r\n<head>
since the newline sequence on Windows is \r\n.
I ran the above function in IntelliJ IDEA on Windows 7 (the IDEA default encoding is set to UTF-8).
Does anyone know why readfile(String inputDoc) returns this odd \r\r\n sequence for a newline?
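You get this because that is exactly what the input file contains: InputStreamReader does not translate line endings, so readfile() returns the characters unchanged. Open the input file in a hex editor to verify; a \r\r\n line ending shows up there as the bytes 0D 0D 0A.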
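If no hex editor is at hand, the same check can be sketched in Java. This is only a minimal sketch, not part of the original code: the class name NewlineInspector and the helpers showLineEndings() and toHex() are hypothetical names chosen for illustration, and the file is assumed to be UTF-8 and small enough to read into memory at once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for inspecting the line endings stored in a file.
public class NewlineInspector {

    // Decodes the bytes as UTF-8 and makes CR and LF visible as the
    // two-character literals \r and \n.
    static String showLineEndings(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return text.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Formats the raw bytes as space-separated lowercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java NewlineInspector <file>");
            return;
        }
        // Read the file exactly as stored on disk, with no newline translation.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(showLineEndings(bytes));
        System.out.println(toHex(bytes));
    }
}
```

If the hex output contains 0d 0d 0a between <html> and <head>, the extra \r is already stored in the file and was not introduced by readfile().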