I’m making some changes to Linux locale files in /usr/share/i18n/locales (like pt_BR), and format strings (like %d-%m-%Y %H:%M) must be specified in Unicode notation, where each character (in this case, ASCII) is represented as <U00xx>.
So a text like this:
LC_TIME
d_t_fmt "%a %d %b %Y %T %Z"
d_fmt "%d-%m-%Y"
t_fmt "%T"
Must be:
LC_TIME
d_t_fmt "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
d_fmt "<U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>"
t_fmt "<U0025><U0054>"
Thus I need a command-line script (be it bash, Python, Perl, or something else) that would take an input like %d-%m-%Y and convert it to <U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>.
All characters in the input string would be ASCII chars (from 0x20 to 0x7F), so this is actually a fancier “char-to-hex-string” conversion.
Could anyone please help me? My skills in bash scripting are very limited, and even worse in Python.
Bonus for elegant, explained solutions.
Thanks!
(By the way, this would be the “reverse” script for my previous question.)
Every char with file input
If you want to convert every character of a file to its Unicode representation, then it would be this simple one-liner:
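A minimal sketch of such a loop, assuming bash. The function name file_to_unicode is made up for illustration and is not from the original answer; it reads the file one character at a time and prints each character's code point in <UXXXX> form.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): print every character of the
# file given as $1 in <UXXXX> notation, keeping line breaks as they are.
file_to_unicode() {
    local c
    # IFS= and -r stop read from trimming whitespace or interpreting
    # backslashes; -n1 hands back one character per iteration.
    while IFS= read -r -n1 c; do
        if [[ -z "$c" ]]; then
            echo                      # an empty read marks a line break
        else
            printf '<U%04X>' "'$c"    # leading quote: numeric code of the char
        fi
    done < "$1"
}
```

For a file containing the single line %d-%m-%Y, calling file_to_unicode on it prints <U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>.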
Every char on STDIN
If you want to make a Unix-like tool that converts input on STDIN to Unicode-like output, then use this:
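One way such a filter could look, assuming bash. This sketch reads whole lines and indexes into them rather than reading one character at a time, which is a different technique from the read -n1 loop explained further down; the name to_unicode is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter (name is an assumption): copy STDIN to STDOUT with
# every character rewritten as <UXXXX>, one output line per input line.
to_unicode() {
    local line i
    # The "|| [[ -n $line ]]" part also handles a last line that has no
    # trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        for ((i = 0; i < ${#line}; i++)); do
            # ${line:i:1} is the i-th character; the leading single quote
            # makes printf use its numeric code.
            printf '<U%04X>' "'${line:i:1}"
        done
        echo
    done
}
```

Used in a pipeline, echo '%T' | to_unicode prints <U0025><U0054>.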
Proof of Concept
Only chars between double-quotes
Proof of Concept
Explanation
Pretty simple, really:
while IFS= read -r -n1 c; : Iterate over the input one character at a time (via -n1) and store the char in the variable c. The IFS= and -r flags are there so that the read builtin doesn’t try to do word splitting or interpret escape sequences, respectively.
if [[ "$c" == '"' ]]; : If the current char is a double-quote
((flag^=1)) : Invert the value of flag from 0->1 or 1->0
elif [[ "$c" == $'\0' ]]; : If the current char is a NUL, then echo a newline (in bash, $'\0' compares equal to the empty string, which is what read -n1 stores when it reaches the end of a line)
elif ((flag)) : If flag is 1, then perform unicode transliteration
printf "<U%04X>" "'$c" : The magic that does the unicode transliteration. Note that the single-quote before the $c is mandatory, as it tells printf to use the numeric code of the character that follows rather than to parse the argument as a number.
else printf "%c" "$c" : Print out the character with no unicode transliteration performed
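Put together, the pieces described above could look roughly like the following. This is a reconstruction from the explanation, not the original script: it assumes bash, the name quoted_to_unicode is hypothetical, it assumes the double-quote itself is echoed unchanged (the explanation does not say), and it writes the toggle as flag=$((flag ^ 1)) instead of ((flag^=1)), which has the same effect.

```shell
#!/usr/bin/env bash
# Hypothetical filter (name is an assumption): transliterate only the text
# between double quotes to <UXXXX>; everything else passes through as is.
quoted_to_unicode() {
    local c flag=0
    while IFS= read -r -n1 c; do
        if [[ "$c" == '"' ]]; then
            flag=$((flag ^ 1))        # toggle: entering or leaving a string
            printf '%s' "$c"          # assumption: keep the quote in the output
        elif [[ -z "$c" ]]; then
            echo                      # an empty read marks a line break
        elif ((flag)); then
            printf '<U%04X>' "'$c"    # inside quotes: emit the code point
        else
            printf '%s' "$c"          # outside quotes: copy unchanged
        fi
    done
}
```

Fed the line d_fmt "%d-%m-%Y" on STDIN, this prints d_fmt "<U0025><U0064><U002D><U0025><U006D><U002D><U0025><U0059>", which matches the target form shown in the question.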