Possible Duplicate:
Parsing an UTF-8 Encodded XML file
I am parsing a UTF-8 encoded XML file that contains some Arabic characters. Everything else works properly, except that the Arabic characters are not displayed; instead, weird characters like the following appear:
ÙØ±ÙÙ
Here is the link to the XML file I am parsing: "http://212.12.165.44:7201/UniNews121.xml"
The code is below:
public String getXmlFromUrl(String url) {
    try {
        return new AsyncTask<String, Void, String>() {
            @Override
            protected String doInBackground(String... params) {
                //String xml = null;
                try {
                    DefaultHttpClient httpClient = new DefaultHttpClient();
                    httpClient.getParams().setParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET, "UTF-8");
                    HttpGet httpPost = new HttpGet(params[0]);
                    HttpResponse httpResponse = httpClient.execute(httpPost);
                    HttpEntity httpEntity = httpResponse.getEntity();
                    xml = new String(EntityUtils.toString(httpEntity).getBytes(), "UTF-8");
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // just to remove the BOM element
                xml = xml.substring(3);
                // Here I am printing the XML, and the Arabic chars are malformed
                Log.i("DEMO", xml);
                return xml;
            }
        }.execute(url).get();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (ExecutionException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return xml;
}
Kindly note that no errors occur and everything else works properly; only the Arabic characters are malformed.
I would appreciate your help, but please be specific in your answers.
This:
xml = new String(EntityUtils.toString(httpEntity).getBytes(),"UTF-8");
doesn't do what you want.
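EntityUtils.toString() decodes the response with the charset declared by the entity and, when none is declared, falls back to a default charset (ISO-8859-1 in HttpClient 4.x). You then call getBytes(), which uses the platform encoding because no encoding is specified, and finally new String(), which tries to read that byte[] as a UTF-8 string. By that point the text has already been decoded with the wrong charset, so the round trip cannot repair it. You simply need to call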
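To illustrate the mechanism described above, here is a minimal sketch that uses only the JDK, not HttpClient. The class name CharsetDemo, the helper names, and the sample Arabic word are assumptions made for illustration; they do not come from the question. The sketch decodes UTF-8 bytes as ISO-8859-1 to reproduce the garbling, shows that a getBytes()/new String() round trip leaves the garbled text unchanged, and then decodes the raw bytes as UTF-8 directly. With HttpClient, the equivalent is presumably to pass the charset to the two-argument EntityUtils.toString(entity, defaultCharset) overload, or to decode the bytes from EntityUtils.toByteArray(entity) yourself.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decodes raw response bytes as UTF-8 and drops a leading BOM if present.
    // After a correct decode the BOM is the single char U+FEFF, not three chars.
    static String decodeUtf8(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Reproduces the bug: UTF-8 bytes interpreted as ISO-8859-1.
    static String decodeWrongly(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample text (four Arabic letters), written with escapes
        // so the source file encoding does not matter.
        String original = "\u0641\u0631\u064A\u0642";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        String garbled = decodeWrongly(body);
        // Re-encoding and re-decoding with the same charset is a no-op,
        // so it cannot undo the earlier wrong decode.
        String roundTrip = new String(
                garbled.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        System.out.println(roundTrip.equals(garbled));          // true
        System.out.println(decodeUtf8(body).equals(original));  // true
    }
}
```

Note that once the text is decoded correctly, stripping the BOM with substring(3) would remove real characters, which is why the sketch checks for U+FEFF instead.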