I’m using libcurl to fetch some HTML pages. The HTML pages contain some character


Editorial Team

Asked: May 17, 2026


I’m using libcurl to fetch some HTML pages.

The HTML pages contain some character references like: סלקום

When I read this using libxml2 I’m getting: ׳₪׳¨׳˜׳ ׳¨

Is it the ISO-8859-1 encoding?

If so, how do I convert it to UTF-8 to get the correct word?

Thanks

EDIT: I found the solution. MSalters was right: libxml2 does use UTF-8.

I added this to eclipse.ini:

-Dfile.encoding=utf-8

and finally I got Hebrew characters on my Eclipse console.
Thanks


1 Answer

Editorial Team · Answer 1 · 2026-05-17T17:56:24+00:00

Have you seen the libxml2 page on i18n? It explains how libxml2 solves these problems.

You will get a ס from libxml2. However, you said that you get something like ׳₪׳¨׳˜׳ ׳¨. Why do you think you got that? libxml2 gives you an xmlChar*. How did you convert that pointer into the string above? Did you perhaps use a debugger? Does that debugger know how to render an xmlChar*? My bet is that the xmlChar* is correct, but you used a debugger that cannot render the Unicode in an xmlChar*.

To answer your last question, an xmlChar* is already UTF-8 and needs no further conversion.
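One way to check this without trusting a debugger or console is to look at the raw bytes. The sketch below is only an illustration, not libxml2 code: `utf8_encode` and `hex_dump` are hypothetical helper names. It shows which bytes a character reference for ס (U+05E1) should turn into once it is stored as UTF-8. Since libxml2 defines xmlChar as unsigned char, a buffer returned by libxml2 could be passed to `hex_dump` the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * 'out' must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 for an invalid code point. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: write the bytes of a NUL-terminated buffer into
 * 'dst' as space-separated uppercase hex, truncating if 'dst' is too small. */
static void hex_dump(const unsigned char *s, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(s != NULL && dst != NULL);
    if (dst_size == 0)
        return;
    dst[0] = '\0';

    /* Each step needs at most 3 characters plus the terminating NUL. */
    for (; *s != '\0' && used + 4 <= dst_size; s++) {
        used += (size_t)snprintf(dst + used, dst_size - used,
                                 used ? " %02X" : "%02X", (unsigned)*s);
    }
}
```

Encoding 0x05E1 with `utf8_encode` yields the two bytes D7 A1. If a hex dump of the text returned by libxml2 shows those bytes for ס, the data is correct UTF-8 and the garbling happens only when a console or debugger decodes the bytes with a single-byte code page.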
