I went as far as searching the C sources, but I can’t find this function, and I really don’t want to write one myself because it surely must already exist.
To elaborate: Unicode code points are written as U+########, and that number is easy to get. What I need is the form in which the character is written to a file (for example). In UTF-8, a code point is split into groups of bits: an ASCII character fits in the low 7 bits of a single byte, while larger code points are spread over several bytes, with each continuation byte carrying 6 bits. Emacs certainly knows how to do this, but I can’t find a way to get the UTF-8 encoding of a string as a sequence of 8-bit bytes.
Functions such as get-byte or multibyte-char-to-unibyte work only with characters that can be represented in no more than 8 bits. I need the same thing get-byte does, but for multibyte characters, so that instead of an integer in 0..255 I’d receive either a vector of integers in 0..255 or a single long integer in 0..2^32-1.
EDIT
In case anyone needs this later:
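(defun haxe-string-to-x-string (s)
  "Return S with every UTF-8 byte written as a \\xNN escape."
  (with-output-to-string
    (let (current)
      (dotimes (i (length s))
        ;; `multibyte-char-to-unibyte' returns -1 for characters that
        ;; do not fit in a single byte, so encode those as UTF-8 first.
        (if (> 0 (multibyte-char-to-unibyte (aref s i)))
            (progn
              (setq current (encode-coding-string
                             (char-to-string (aref s i)) 'utf-8))
              (dotimes (j (length current))
                (princ (format "\\x%02x" (aref current j)))))
          (princ (format "\\x%02x" (aref s i))))))))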
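A shorter variant is possible if the whole string is encoded in one go instead of character by character. This is only a sketch: the name my-string-to-x-string is made up for illustration, and it assumes the input holds ordinary text rather than raw eight-bit bytes.

```elisp
;; Hypothetical helper name, not part of Emacs or of the original post.
(defun my-string-to-x-string (s)
  "Return S as a string of \\xNN escapes, one per UTF-8 byte."
  ;; `encode-coding-string' yields a unibyte string, so each element
  ;; that `mapconcat' visits is an integer in the range 0..255.
  (mapconcat (lambda (byte) (format "\\x%02x" byte))
             (encode-coding-string s 'utf-8)
             ""))

;; "\u20ac" is the euro sign, U+20AC.
(my-string-to-x-string "a\u20ac")
;; => "\\x61\\xe2\\x82\\xac"
```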
encode-coding-string might be what you’re looking for. It returns a string, but you can access the individual bytes with aref, or, if you don’t mind using cl functions, concatenate is your friend.
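As a rough illustration of the calls mentioned here, the sketch below encodes a single character and reads its bytes back three ways. It assumes a current Emacs where the cl function is spelled cl-concatenate and comes from cl-lib; the euro sign is just an example input.

```elisp
(require 'cl-lib)

;; "\u20ac" is the euro sign, U+20AC; its UTF-8 encoding is E2 82 AC.
(let ((bytes (encode-coding-string "\u20ac" 'utf-8)))
  (list (length bytes)                   ; number of bytes
        (aref bytes 0)                   ; first byte as an integer
        (cl-concatenate 'list bytes)     ; all bytes as a list
        (cl-concatenate 'vector bytes))) ; all bytes as a vector
;; => (3 226 (226 130 172) [226 130 172])
```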