I went as far as searching C sources, but I can’t find this function,

Question


Asked: June 6, 2026 (2026-06-06T09:14:33+00:00)


I went as far as searching C sources, but I can’t find this function, and I really don’t want to write one myself because it absolutely must be there.

To elaborate: Unicode code points are written as U+XXXX, and that number is easy to get. What I need is the form in which the character is written to a file (for example). In UTF-8, a code point up to U+007F is stored as a single byte holding its 7 bits; a larger code point is split across a lead byte and one to three continuation bytes, each continuation byte carrying 6 bits of the code point. Emacs certainly knows how to do this, but I can’t find a way to get a UTF-8 encoded string from it as a sequence of 8-bit bytes.

Functions such as get-byte or multibyte-char-to-unibyte work only with characters that can be represented using no more than 8 bits. I need the same thing that get-byte does, but for multibyte characters, so that instead of an integer 0..255 I’d receive either a vector of integers 0..255 or a single long integer 0..2^32.

EDIT

Just in case anyone needs this later:

(defun haxe-string-to-x-string (s)
  "Return the UTF-8 bytes of S as a string of \\xNN escapes."
  ;; Encode the whole string at once; the result is a unibyte string
  ;; whose elements are the individual UTF-8 bytes (integers 0..255).
  ;; This way every non-ASCII character, including those below U+0100,
  ;; is written out as its UTF-8 byte sequence.
  (mapconcat (lambda (byte) (format "\\x%02x" byte))
             (encode-coding-string s 'utf-8)
             ""))


1 Answer

Editorial Team · Answer 1 · 2026-06-06T09:14:34+00:00

encode-coding-string might be what you’re looking for:

*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8)
"e\304\245o\305\235an\304\235o \304\211iu\304\265a\305\255de"

It returns a string, but you can access the individual bytes with aref:

ELISP> (aref (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8) 1)
196
ELISP> (format "%o" 196)
"304"

or, if you don’t mind using cl functions, concatenate is your friend (in current Emacs versions it is provided by cl-lib under the name cl-concatenate):

ELISP> (concatenate 'list (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8))
(101 196 165 111 197 157 97 110 196 157 111 32 196 137 105 117 196 181 97 197 173 100 101)
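The question asked for either a vector of byte values or a single integer. A minimal sketch of both follows, using only built-ins (no cl). The names my-utf8-bytes and my-utf8-as-integer are hypothetical helpers made up for this illustration, not existing Emacs functions.

```elisp
;; Hypothetical helper: return the UTF-8 encoding of STRING as a
;; vector of integers, one element per byte (each in 0..255).
(defun my-utf8-bytes (string)
  ;; encode-coding-string yields a unibyte string; vconcat copies
  ;; its elements (the raw bytes) into a vector.
  (vconcat (encode-coding-string string 'utf-8)))

;; Hypothetical helper: pack the UTF-8 bytes of STRING into one
;; integer, first byte in the most significant position.
;; Assumption: STRING is short enough (e.g. a single character, at
;; most 4 bytes) for the result to fit in an Emacs integer.
(defun my-utf8-as-integer (string)
  (let ((n 0))
    (mapc (lambda (byte) (setq n (logior (ash n 8) byte)))
          (my-utf8-bytes string))
    n))

(my-utf8-bytes "ĥ")       ; => [196 165]
(my-utf8-as-integer "ĥ")  ; => 50341, i.e. #xC4A5
```

If a list is enough, (append (encode-coding-string "ĥ" 'utf-8) nil) gives the same bytes as the concatenate call above without loading any cl functions.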
