I'm having a problem with a JSON string containing German umlauts, which I acquire with the Apache HttpClient.
Mapping the JSON string only works if it does not contain any German umlauts; otherwise I get “JsonMappingException: Can not deserialize instance of […] out of START_ARRAY”.
The Apache HttpClient is configured with “Accept-Charset” set to UTF-8 (HTTP.UTF_8), but I always get e.g. “\u00fc” instead of “ü”. When I manually replace e.g. “\u00fc” with “ü”, the mapping works perfectly.
How can I get a UTF-8 encoded JSON response from the Apache HttpClient?
Or is the server output the problem?
HttpParams params = new BasicHttpParams();
params.setParameter(HttpProtocolParams.USE_EXPECT_CONTINUE, false);
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);
httpclient = new DefaultHttpClient(params);
HttpGet httpGetContentLoad = new HttpGet(url);
httpGetContentLoad.setHeader("Accept-Charset", "utf-8");
httpGetContentLoad.setParams(params);
response = httpclient.execute(httpGetContentLoad);
entity = response.getEntity();
String loadedContent = null;
if (entity != null)
{
    // UTF-8 is only the fallback: a charset declared in the response's Content-Type takes precedence
    loadedContent = EntityUtils.toString(entity, HTTP.UTF_8);
    entity.consumeContent();
}
if (HttpStatus.SC_OK != response.getStatusLine().getStatusCode())
{
    throw new Exception("Loading content failed");
}
closeConnection();
return loadedContent;
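A small JDK-only sketch of why the charset settings in the code above cannot turn “\u00fc” into “ü”: the escape arrives as six plain ASCII characters, and ASCII bytes decode to the same text under UTF-8 and ISO-8859-1 alike. Only raw non-ASCII bytes, such as the two UTF-8 bytes of a literal “ü”, depend on the charset. The class name EscapeIsAscii is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EscapeIsAscii {
    // True if the bytes decode to the same text as UTF-8 and as ISO-8859-1
    public static boolean decodesIdentically(byte[] bytes) {
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        return asUtf8.equals(asLatin1);
    }

    public static void main(String[] args) {
        // What the server sends: six ASCII characters (backslash, u, 0, 0, f, c)
        byte[] escaped = "\\u00fc".getBytes(StandardCharsets.US_ASCII);
        // A literal u-umlaut encoded as UTF-8: the two bytes 0xC3 0xBC
        byte[] literal = "\u00fc".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodesIdentically(escaped)); // true: the charset makes no difference
        System.out.println(decodesIdentically(literal)); // false: the charset matters
    }
}
```

If this holds for the real response, the escaping is done by the server's JSON serializer. “\u00fc” is still valid JSON for “ü”, so a conforming JSON parser should decode it on its own.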
And the JSON is mapped here:
String jsonMetaData = loadGetRequestContent(getLatestEditionUrl(newspaperEdition));
Newspaper loadedNewspaper = mapper.readValue(jsonMetaData, Newspaper.class);
loadedNewspaper.setEdition(newspaperEdition);
Update 1:
jsonMetaData is a String containing the fetched JSON.
Update 2:
This is the code I use to transform the JSON output to fit my needs:
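public static String convertJsonLatestEditionMeta(String jsonCode)
{
    // ["Title",{  ->  {"edition":"an-a1",  (note: the edition value is hardcoded)
    // Java has no [:blank:] bracket syntax; \p{Blank} matches spaces and tabs
    jsonCode = jsonCode.replaceFirst("\\[\"[A-Za-z0-9\\p{Blank}-]+\",\\{", "{\"edition\":\"an-a1\",");
    // "pages":{  ->  "pages":[
    jsonCode = jsonCode.replaceFirst("\"pages\":\\{", "\"pages\":\\[");
    // }}}]  ->  }]}  closes the pages array and the outer object
    jsonCode = Helper.replaceLast(jsonCode, "}}}]", "}]}");
    // Drop the numeric page keys: "1":{"  ->  {"
    jsonCode = jsonCode.replaceAll("\"[\\d]*\"\\:\\{\"", "\\{\"");
    return jsonCode;
}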
Update 3:
JSON conversion example:
JSON before conversion:
["Newspaper title",
{
"date":"20130103",
"pages":
{
"1":{"ressort":"ressorttitle1","pdfpfad":"pathToPdf1","number":1,"size":281506},
"2":{"ressort":"ressorttitle2","pdfpfad":"pathToPdf2","number":2,"size":281533},
[...]
}
}
]
JSON after conversion:
{
"edition":"Newspaper title",
"date":"20130103",
"pages":
[
{"ressort":"ressorttitle1","pdfpfad":"pathToPdf1","number":1,"size":281506},
{"ressort":"ressorttitle2","pdfpfad":"pathToPdf2","number":2,"size":281533},
[...]
]
}
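A self-contained sketch that runs the Update 2 steps on a compact version of the Update 3 sample. It differs from the original in two labeled assumptions: replaceLast is a hypothetical stand-in for the unshown Helper.replaceLast (a literal, non-regex replacement), and the first step captures the title with a group instead of hardcoding "an-a1" and restricting it to ASCII letters, so a title with umlauts also matches. Like the original regexes, it expects JSON without whitespace between tokens.

```java
public class EditionConverter {
    // Hypothetical stand-in for Helper.replaceLast:
    // replaces the last literal occurrence of target, if any.
    public static String replaceLast(String text, String target, String replacement) {
        int i = text.lastIndexOf(target);
        if (i < 0) {
            return text;
        }
        return text.substring(0, i) + replacement + text.substring(i + target.length());
    }

    public static String convert(String jsonCode) {
        // ["Title",{  ->  {"edition":"Title",  (title captured, any characters except a quote)
        jsonCode = jsonCode.replaceFirst("\\[\"([^\"]*)\",\\{", "{\"edition\":\"$1\",");
        // "pages":{  ->  "pages":[
        jsonCode = jsonCode.replaceFirst("\"pages\":\\{", "\"pages\":\\[");
        // }}}]  ->  }]}  closes the pages array and the outer object
        jsonCode = replaceLast(jsonCode, "}}}]", "}]}");
        // Drop the numeric page keys: "1":{"  ->  {"
        jsonCode = jsonCode.replaceAll("\"\\d*\":\\{\"", "{\"");
        return jsonCode;
    }
}
```

For instance, convert applied to ["Newspaper title",{"date":"20130103","pages":{"1":{"number":1}}}] yields {"edition":"Newspaper title","date":"20130103","pages":[{"number":1}]}.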
Solution:
I started using GSON as @Boris suggested, and the problem with the umlauts is gone! Furthermore, GSON really seems to be faster than Jackson.
A workaround would be to replace the escape sequences with the characters manually, following this table:
Character   Unicode escape
Ä, ä        \u00c4, \u00e4
Ö, ö        \u00d6, \u00f6
Ü, ü        \u00dc, \u00fc
ß           \u00df
€           \u20ac
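The table-driven workaround might look like the following sketch. It assumes the escapes arrive in lowercase hex exactly as listed (an uppercase form such as “\u00FC” would be missed), and the class name UmlautTable is hypothetical. The replacement characters are written as Java source escapes, so the file compiles the same under any source encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UmlautTable {
    // Key: the six-character escape as it appears in the JSON text.
    // Value: the character it stands for (written as a Java source escape).
    private static final Map<String, String> TABLE = new LinkedHashMap<>();

    static {
        TABLE.put("\\u00c4", "\u00c4"); // A umlaut
        TABLE.put("\\u00e4", "\u00e4"); // a umlaut
        TABLE.put("\\u00d6", "\u00d6"); // O umlaut
        TABLE.put("\\u00f6", "\u00f6"); // o umlaut
        TABLE.put("\\u00dc", "\u00dc"); // U umlaut
        TABLE.put("\\u00fc", "\u00fc"); // u umlaut
        TABLE.put("\\u00df", "\u00df"); // sharp s
        TABLE.put("\\u20ac", "\u20ac"); // euro sign
    }

    // Replaces every escape listed in the table with its character; other text is untouched.
    public static String replaceEscapes(String json) {
        for (Map.Entry<String, String> e : TABLE.entrySet()) {
            json = json.replace(e.getKey(), e.getValue());
        }
        return json;
    }
}
```

For example, replaceEscapes turns the text M\u00fcnchen into München, while escapes not in the table are left as they are.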
Try parsing like that:
There is no reason to go through a String; Jackson parses InputStreams directly. Also, Jackson will automatically detect the encoding if you use my proposed approach.
EDIT: By the way, consider using the GSON JSON parsing library. It is even faster than Jackson and easier to use. However, Jackson recently started parsing XML, too, which is a virtue.
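On the automatic encoding detection: a JSON text starts with two ASCII characters, so RFC 4627 (section 3) notes that the encoding can be inferred from the pattern of zero bytes in the first four bytes. The sketch below shows that idea only; it is not Jackson's actual implementation, it ignores byte-order marks, and the class name JsonEncodingSniffer is hypothetical.

```java
public class JsonEncodingSniffer {
    // Guesses the encoding of a JSON byte stream from which of its first
    // four bytes are zero (RFC 4627, section 3). Falls back to UTF-8.
    public static String detect(byte[] b) {
        if (b.length >= 4) {
            boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
            if (z0 && z1 && z2 && !z3) {
                return "UTF-32BE"; // 00 00 00 xx
            }
            if (!z0 && z1 && z2 && z3) {
                return "UTF-32LE"; // xx 00 00 00
            }
            if (z0 && !z1 && z2 && !z3) {
                return "UTF-16BE"; // 00 xx 00 xx
            }
            if (!z0 && z1 && !z2 && z3) {
                return "UTF-16LE"; // xx 00 xx 00
            }
        }
        return "UTF-8"; // xx xx xx xx, or too short to tell
    }
}
```

With Jackson, the practical equivalent is to skip the String and hand the stream to the mapper, e.g. mapper.readValue(entity.getContent(), Newspaper.class). RFC 8259, the current JSON specification, requires JSON exchanged between systems to be UTF-8 anyway.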
EDIT2: After all the details you have added, I suppose the problem is with the server implementation of the services: the umlauts should not be Unicode-escaped in the JSON, since UTF-8 is its native encoding. Why don't you do the manual replacement of e.g. "\u00fc" with "ü" via a regex instead?
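Following the regex suggestion, here is a sketch using java.util.regex that decodes backslash-u escapes in one pass. As a precaution (my assumption, not part of the suggestion), it leaves ASCII escapes such as the one for a double quote untouched, because decoding those could break the JSON syntax. It does not recognize an escaped backslash that happens to be followed by “u00fc”, so treat it as a workaround rather than a replacement for a JSON parser. RegexUnescape is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUnescape {
    // A literal backslash, the letter u, and four hex digits (captured in group 1)
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    public static String unescapeNonAscii(String json) {
        Matcher m = ESCAPE.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            // Keep ASCII escapes (quotes, control characters) as they are;
            // decode everything else, including each half of a surrogate pair
            String replacement = code < 0x80 ? m.group() : String.valueOf((char) code);
            // quoteReplacement makes appendReplacement insert the text literally
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because Java strings are UTF-16, decoding the two halves of a surrogate pair separately still yields the correct character once they sit next to each other.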