I have an .htm file. Can I read it as a UTF-8 file without modifying the file itself?
The file is saved as Unicode (I am not sure which encoding). I want to read it as UTF-8, otherwise the output shows boxes instead of characters. This has to be done in Java.
FileReader loInput = new FileReader(loFile);
BufferedReader loBufferReader = new BufferedReader(loInput);
String loLine; // String that holds current loFile loLine
int loCount = 0; // Line number of loCount
loLine = loBufferReader.readLine();
loCount++;
while (loLine != null) {
    loContent = loContent.concat(loLine);
    loLine = loBufferReader.readLine();
    loCount++;
}
loBufferReader.close();
This is what I tried.
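One likely cause of the boxes is that FileReader decodes with the JVM's default charset, which is not necessarily UTF-8. Below is a minimal sketch of the same read loop with an explicit charset. It assumes the file really is UTF-8; if it was saved as UTF-16, StandardCharsets.UTF_16 would be needed instead. The class and method names (Utf8FileReader, readUtf8) are hypothetical and not part of the original code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class Utf8FileReader {
    // Hypothetical helper: reads the whole file as UTF-8 and returns the
    // lines joined without separators, like the concat loop in the question.
    static String readUtf8(File file) throws IOException {
        StringBuilder content = new StringBuilder();
        // InputStreamReader takes an explicit charset, unlike FileReader(File).
        // Malformed byte sequences are replaced with U+FFFD rather than failing.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line);
            }
        }
        return content.toString();
    }
}
```

The file on disk is not modified; only the decoding step changes. A call such as Utf8FileReader.readUtf8(loFile) would stand in for the FileReader loop.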
EDIT: I have to get the data from the HTML file and convert it into a DOM object for further processing.
I am using
SAXBuilder loSaxBuilder=new SAXBuilder();
Reader loStringReader=new StringReader(loContent);
Document loDoc=loSaxBuilder.build(loStringReader);
XPath loXpath = XPath.newInstance("/Div");
Element loElement = (Element) loXpath.selectSingleNode(loDoc);
to convert it to a DOM object.
Solved using:
private static boolean isFiveBytesSequence(byte b)
{
    // Java bytes are signed: -8..-5 corresponds to 0xF8..0xFB
    return -8 <= b && b <= -5;
}
and by calling this
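For readers puzzled by the negative bounds: Java bytes are signed, so -8 through -5 are the byte values 0xF8 through 0xFB, whose bit pattern is 111110xx. That pattern was the lead byte of a five-byte sequence in the original UTF-8 design and is not valid in UTF-8 as restricted by RFC 3629. The sketch below restates the same check with a bit mask; the class name LeadByteCheck and the method isFiveByteLeadMasked are hypothetical, added only for comparison.

```java
class LeadByteCheck {
    // The check from the post, using signed byte comparisons.
    static boolean isFiveBytesSequence(byte b) {
        return -8 <= b && b <= -5;
    }

    // Hypothetical equivalent: keep the top six bits and compare with 111110xx.
    // The byte is sign-extended to int, and the mask 0xFC drops the high bits.
    static boolean isFiveByteLeadMasked(byte b) {
        return (b & 0xFC) == 0xF8;
    }
}
```

Both methods should agree on all 256 possible byte values; the masked form just makes the bit pattern visible.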