I’m doing some changes in Linux locale files /usr/share/i18n/locales (like pt_BR ), and it’s


Asked: May 20, 2026 (2026-05-20T23:41:04+00:00)


I’m making some changes to Linux locale files in /usr/share/i18n/locales (like pt_BR), and format strings (like %d-%m-%Y %H:%M) must be specified in Unicode notation, where each (in this case, ASCII) character is represented as <U00xx>.

So a text like this:

LC_TIME
d_t_fmt "%a %d %b %Y %T %Z"
d_fmt   "%d-%m-%Y"
t_fmt   "%T"

Must be:

LC_TIME
d_t_fmt "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
d_fmt   "<U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>"
t_fmt   "<U0025><U0054>"

Thus I need a command-line script (be it bash, Python, Perl, or something else) that would take an input like %d-%m-%Y and convert it to <U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>.

All characters in the input string would be ASCII chars (from 0x20 to 0x7F), so this is actually a fancier “char-to-hex-string” conversion.

Could anyone please help me? My skills in bash scripting are very limited, and even worse in Python.

Bonus for elegant, explained solutions.

Thanks!

(by the way, this would be the “reverse” script for my previous question)


1 Answer

Editorial Team · Answer 1 · 2026-05-20T23:41:04+00:00

Every char with file input

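If you want to convert every character of a file to its <Uxxxx> representation, this one-liner is enough. When read hits a line break it returns an empty string, so the loop passes line breaks through unchanged instead of printing <U0000> for them:

while IFS= read -r -n1 c; do if [[ -n $c ]]; then printf '<U%04X>' "'$c"; else echo; fi; done < ./infile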

Every char on STDIN

If you want a Unix-style filter that converts its input on STDIN to the <Uxxxx> notation, define this function:

uni(){ c=$(cat); for((i=0;i<${#c};i++)); do printf "<U%04X>" "'${c:i:1}"; done; }

Proof of Concept

$ echo "abc" | uni
<U0061><U0062><U0063>
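The question asks for a tool that takes the string as an argument, such as %d-%m-%Y, rather than on STDIN. A minimal sketch of that variant is below. The function name to_uni is a hypothetical choice, and the sketch assumes plain ASCII input, as the question states. It sticks to POSIX sh features, so it should behave the same under bash and under a plain /bin/sh.

```shell
# to_uni (hypothetical name): print every character of its arguments as <Uxxxx>.
# Assumes plain ASCII input, as stated in the question.
# The body is wrapped in ( ) so the helper variables stay in a subshell.
to_uni() (
    rest=$*
    while [ -n "$rest" ]; do
        tail=${rest#?}            # everything after the first character
        first=${rest%"$tail"}     # the first character on its own
        # A leading single quote makes printf use the character's numeric code.
        printf '<U%04X>' "'$first"
        rest=$tail
    done
    echo
)

to_uni '%d-%m-%Y'
# prints <U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>
```

Quote the argument with single quotes on the command line so the shell leaves the % and space characters alone.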

Only chars between double-quotes

#!/bin/bash

flag=0
while IFS= read -r -n1 c; do
    if [[ "$c" == '"' ]]; then
        ((flag^=1))
        printf "%c" "$c"
    elif [[ "$c" == $'\0' ]]; then
        echo
    elif ((flag)); then
        printf "<U%04X>" "'$c"
    else
        printf "%c" "$c"
    fi
done < /path/to/infile

Proof of Concept

$ cat ./unime
LC_TIME
d_t_fmt "%a %d %b %Y %T %Z"
d_fmt   "%d-%m-%Y"
t_fmt   "%T"
abday "Dom";"Seg";/
here is a string with "multiline
quotes";/

$ ./uni.sh
LC_TIME
d_t_fmt "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
d_fmt   "<U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>"
t_fmt   "<U0025><U0054>"
abday "<U0044><U006F><U006D>";"<U0053><U0065><U0067>";/
here is a string with "<U006D><U0075><U006C><U0074><U0069><U006C><U0069><U006E><U0065>
<U0071><U0075><U006F><U0074><U0065><U0073>";/
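If a filter is more convenient than a script with a hardcoded input path, the same flag-toggling idea can be wrapped in a function that reads STDIN. This is only a sketch under a few assumptions: the name uni_quoted is hypothetical, the input is ASCII, and every double quote toggles the state (escaped quotes are not handled). It works line by line with POSIX sh features instead of read -n1, so it is a variation on the script above rather than a copy of it.

```shell
# uni_quoted (hypothetical name): copy STDIN to STDOUT, converting only the
# characters that sit between double quotes. Assumes ASCII input.
uni_quoted() (
    in_quotes=0
    # "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        while [ -n "$line" ]; do
            tail=${line#?}           # everything after the first character
            c=${line%"$tail"}        # the first character on its own
            line=$tail
            if [ "$c" = '"' ]; then
                in_quotes=$((1 - in_quotes))   # toggle on every double quote
                printf '%s' "$c"
            elif [ "$in_quotes" -eq 1 ]; then
                printf '<U%04X>' "'$c"
            else
                printf '%s' "$c"
            fi
        done
        echo    # line breaks pass through unchanged, even inside quotes
    done
)

printf '%s\n' 'd_fmt   "%d-%m-%Y"' | uni_quoted
# prints d_fmt   "<U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>"
```

Because the quote state is kept across lines, a quoted string that spans two lines is converted on both of them, as in the multiline example above.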

Explanation

Pretty simple, really:

while IFS= read -r -n1 c;: Iterate over the input one character at a time (via -n1) and store the char in the variable c. IFS= stops the read builtin from stripping whitespace, so a space is read as a space, and -r stops it from interpreting backslash escape sequences.
if [[ "$c" == '"' ]];: If the current char is a double-quote
((flag^=1)): Invert the value of flag from 0->1 or 1->0
elif [[ "$c" == $'\0' ]];: In bash, $'\0' expands to the empty string, so this test is true when c is empty. That happens when read hits the newline delimiter, so echo re-emits the line break.
elif ((flag)): If flag is 1 (we are inside double-quotes), then perform the Unicode conversion
printf "<U%04X>" "'$c": The magic that does the Unicode conversion. The single-quote before the $c is mandatory: it tells printf to use the numeric code of the character that follows, which %04X then prints as four zero-padded uppercase hex digits.
else printf "%c" "$c": Print out the character unchanged, with no conversion performed
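The printf step is easiest to see in isolation. The short sketch below only demonstrates the leading-quote behaviour; char_code is a hypothetical helper name that does not appear in the scripts above.

```shell
# char_code (hypothetical helper): print the code of one character as 4 hex digits.
char_code() {
    # The leading single quote tells printf to use the numeric value of the
    # character that follows instead of parsing the argument as a number.
    printf '%04X\n' "'$1"
}

char_code 'A'              # prints 0041
char_code '%'              # prints 0025
printf '<U%04X>\n' "'-"    # prints <U002D>
```

Without the leading quote, printf would try to parse the character itself as a number and complain about an invalid number.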
