I’m using libcurl to fetch some HTML pages.
The HTML pages contain some character references like: סלקום
When I read this using libxml2 I’m getting: ׳₪׳¨׳˜׳ ׳¨
Is it the ISO-8859-1 encoding?
If so, how do I convert it to UTF-8 to get the correct word?
Thanks
EDIT: I got the solution, MSalters was right, libxml2 does use UTF-8.
I added this to eclipse.ini
-Dfile.encoding=utf-8
and finally I got Hebrew characters on my Eclipse console.
Thanks
Have you seen the libxml2 page on i18n? It explains how libxml2 solves these problems.
You will get a ס from libxml2. However, you said that you get something like ׳₪׳¨׳˜׳ ׳¨. Why do you think that you got that? You get an xmlChar*. How did you convert that pointer into the string above? Did you perhaps use a debugger? Does that debugger know how to render an xmlChar*? My bet is that the xmlChar* is correct, but you used a debugger that cannot render the Unicode in an xmlChar*.
To answer your last question, an xmlChar* is already UTF-8 and needs no further conversion.
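One way to check this without trusting a console or debugger is to look at the raw bytes. The sketch below is only an illustration: utf8_count and hex_dump are hypothetical helper names, not part of libxml2, and the check is simplified (it does not reject overlong forms or surrogates). It relies on the fact that libxml2's xmlChar is a typedef for unsigned char, so the pointer you get back can be passed in directly. Hebrew letters in UTF-8 are two bytes each, with a lead byte of D7; garbled output like yours is consistent with such bytes being displayed in a single-byte Hebrew code page (Windows-1255 is my guess), not with the data itself being wrong.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of code points in s if it looks
 * like well-formed UTF-8, or -1 otherwise. Simplified: lead bytes and
 * continuation bytes are checked, overlong forms and surrogates are not. */
long utf8_count(const unsigned char *s)
{
    long n = 0;
    while (*s) {
        int extra;                      /* continuation bytes expected */
        if (*s < 0x80)                extra = 0;
        else if ((*s & 0xE0) == 0xC0) extra = 1;
        else if ((*s & 0xF0) == 0xE0) extra = 2;
        else if ((*s & 0xF8) == 0xF0) extra = 3;
        else return -1;                 /* stray continuation or invalid lead */
        s++;
        while (extra-- > 0) {
            /* A terminating NUL also fails this test, so we never overrun. */
            if ((*s & 0xC0) != 0x80) return -1;
            s++;
        }
        n++;
    }
    return n;
}

/* Hypothetical helper: writes the bytes of s into out as space-separated
 * uppercase hex, stopping early if out is too small. Returns out. */
char *hex_dump(const unsigned char *s, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0) return out;
    out[0] = '\0';
    for (; *s && used + 4 <= out_size; s++) {
        used += (size_t)snprintf(out + used, out_size - used,
                                 used ? " %02X" : "%02X", *s);
    }
    return out;
}
```

Call hex_dump on the xmlChar* you get from libxml2 and print the result. If you see pairs starting with D7 and utf8_count returns the number of letters you expect, the string is fine and the problem is in whatever displays it.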