Im doing some changes in Linux locale files /usr/share/i18n/locales (like pt_BR), to change the

Asked: May 20, 2026

I'm making some changes to the Linux locale files in /usr/share/i18n/locales (like pt_BR) to change the default format of dates, times, numbers, etc. But since Unicode characters are written as strings in the <U9999> format, the text is very hard to read.

Here is a snippet of it:

LC_TIME
abday   "<U0044><U006F><U006D>";"<U0053><U0065><U0067>";/
    "<U0054><U0065><U0072>";"<U0051><U0075><U0061>";/
    "<U0051><U0075><U0069>";"<U0053><U0065><U0078>";/
    "<U0053><U00E1><U0062>"

So, how can I write a simple script (bash, Python, Perl, whatever) to convert this text, replacing the <Uxxxx> codes with the characters they stand for? (Yes, they are all characters below 255, i.e. Latin-1, and most are even below 127, i.e. plain ASCII.)

If several answers are received, I'll accept the most elegant and/or the most thoroughly explained one (e.g. explaining the options and flags used in the commands).

As an example, the above text would be converted to:

LC_TIME
abday   "Dom";"Seg";/
    "Ter";"Qua";/
    "Qui";"Sex";/
    "Sáb"

Bonus points for another script that does the opposite: convert every character of a given string to <Uxxxx> format.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-20T23:31:05+00:00

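Using Fields [requires gawk]

#!/bin/bash

# Split each line on "<U0..." and ">" so every code becomes a bare hex field,
# then turn each all-hex field back into the character it encodes.
gawk -F'<U0+|>' '{
    for (i = 1; i <= NF; i++)
        if ($i ~ "^[0-9A-F]+$")
            $i = sprintf("%c", strtonum("0x" $i))   # strtonum() is gawk-only
}1' OFS="" /path/to/infile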

Explanation

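-F'<U0+|>': This is the trick that keeps the script short. We tell awk that the field separator is the regex <U0+ (a "<U" followed by one or more zeros) or a lone >. awk then strips these delimiters for us while splitting, so we don't have to remove them manually with gsub() before the strtonum() conversion. This relies on every code having at least one 0 right after the U, as all the codes in the sample do; a code such as <U2019> is left undecoded and, on a line that gets rebuilt, loses its closing >.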
for(i=1;i<=NF;i++): iterate over each field.
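if($i ~ "^[0-9A-F]+$"): check whether the current field consists only of hex digits. Because of the -F setting above, something like <U006F> is seen as 6F at this point. (Caveat: literal uppercase hex text sitting directly between two codes, such as 12 in <U0041>12<U0042>, would be converted too.)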
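$i=sprintf("%c", strtonum("0x"$i)): replace the hex code with the character it encodes. We prefix the field with "0x" so that strtonum() (a gawk extension) treats it as hexadecimal. In a UTF-8 locale, gawk prints values above 127, such as E1, as the corresponding UTF-8 character (á); in the C locale it prints the single raw byte instead.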
}1: 1 is a pattern that is always true and has no action, so awk applies its default action and prints every line, changed or not.
OFS="": set the Output Field Separator to the empty string. When a field is assigned, awk rebuilds the line by joining the fields with OFS, and the default (a single space) would leave a space wherever a <U0+ or > used to be.
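To see what the -F setting actually hands to the loop, you can print every field wrapped in brackets. This sketch runs on a short fragment of the sample line and uses only POSIX awk features, so it should behave the same in gawk, mawk or BWK awk:

```shell
#!/bin/bash
# Print each field awk sees when splitting on <U0+ or >, wrapped in [ ].
echo '"<U0044><U006F>"' |
    awk -F'<U0+|>' '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
# prints ["][44][][6F]["]
```

The empty field comes from the > that closes one code sitting right next to the <U00 that opens the next one. It fails the ^[0-9A-F]+$ test, so it is left alone and simply disappears when the line is rejoined with an empty OFS.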

Using match() [requires gawk]

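#!/bin/bash

gawk '{
    line = $0
    out = ""
    # Consume the line left to right so decoded text is never rescanned
    while (match(line, /<U[0-9A-F]+>/)) {
        # The hex digits sit between the leading "<U" and the trailing ">"
        hex = substr(line, RSTART + 2, RLENGTH - 3)
        out = out substr(line, 1, RSTART - 1) sprintf("%c", strtonum("0x" hex))
        line = substr(line, RSTART + RLENGTH)
    }
    print out line
}' /path/to/infile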
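The question's bonus request (the opposite direction) isn't covered above. A minimal pure-bash sketch, assuming bash running in a UTF-8 locale (needed so that ${s:i:1} yields whole characters and printf reports Unicode code points for non-ASCII input); to_ucodes is a hypothetical helper name:

```shell
#!/bin/bash
# to_ucodes (hypothetical helper): print every character of $1 as <Uxxxx>.
# Assumes a UTF-8 locale; in the C locale, non-ASCII input is handled byte by byte.
to_ucodes() {
    local s=$1 out="" code i
    for (( i = 0; i < ${#s}; i++ )); do
        # A leading quote makes printf use the character's numeric code point
        printf -v code '<U%04X>' "'${s:i:1}"
        out+=$code
    done
    printf '%s\n' "$out"
}

to_ucodes "Dom"   # prints <U0044><U006F><U006D>
```

In a UTF-8 locale, to_ucodes "Sáb" should give <U0053><U00E1><U0062>, matching the last entry of the sample abday line.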
