I ran into some trouble while creating a C extension for Ruby, and it got me thinking: how does Ruby (1.9.1) handle strings (and all the encoding stuff) internally?
If I have a string like "o" and I pass it to a C function (as a VALUE), I can deal with it pretty easily using the RSTRING_PTR() and RSTRING_LEN() macros. However, if I make the string "ö" (a German umlaut character), RSTRING_LEN() gives me 2.
I’m a bit stumped by the contents of RSTRING_PTR() in that case: the two bytes are 0xA4 and 0xC3. What encoding is this? I tried using "ö".force_encoding( ... ) with different encodings before passing the string to the C function, but that does not affect the contents of RSTRING_PTR() at all.
What I need is a way to have the string represented as a WCHAR* encoded in UTF-16 (in the case of "ö", that would be 0x00F6) in my C-function, but that’s kinda hard to do if you do not know what encoding you’re coming from…
Thanks in advance for any help.
String internals in Ruby 1.9 depend on the __ENCODING__ constant and the Encoding.default_internal setting. In your case it looks like UTF-8 (default), but ö is actually c3 b6 in UTF-8, and c3 a4 is ä.
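
To check this from the Ruby side before digging into the C code, here is a minimal sketch. It assumes the string really is UTF-8, and it writes ö as "\u00F6" so the result does not depend on the source file's encoding. The variable names are only for illustration. It shows that force_encoding merely relabels the string (which is why RSTRING_PTR never changes), whereas encode actually converts the bytes:

```ruby
# "\u00F6" is ö; the \u escape produces a UTF-8 string.
s = "\u00F6"

# These are the bytes RSTRING_PTR would point at for a UTF-8 string.
utf8_bytes = s.bytes.to_a

# force_encoding only changes the encoding label; the bytes are untouched.
relabeled_bytes = s.dup.force_encoding("ISO-8859-1").bytes.to_a

# encode transcodes the string, here to little-endian UTF-16.
utf16_bytes = s.encode("UTF-16LE").bytes.to_a

puts utf8_bytes.map { |b| "%02x" % b }.join(" ")      # c3 b6
puts relabeled_bytes.map { |b| "%02x" % b }.join(" ") # c3 b6
puts utf16_bytes.map { |b| "%02x" % b }.join(" ")     # f6 00
```

The last line is the code unit 0x00F6 stored low byte first, which is probably what a Windows WCHAR* expects. So one option is to call encode("UTF-16LE") in Ruby before handing the string to the C function. Inside the extension, rb_str_export_to_enc() together with rb_enc_find() from ruby/encoding.h looks like the equivalent, though I have not verified that against 1.9.1.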