I often need to write simple BASH scripts on my computer for manipulating files. BASH seems to have difficulty working with UTF-8 content.
- Are there any versions of BASH which are fully UTF-8 compatible?
- Is there a replacement for BASH, which uses a similar or identical syntax, but is UTF-8 compatible?
I take it your problem is that the usual sed/awk/grep etc. don’t support Unicode, so Stack Overflow solutions usually don’t work for you?
bash itself is very limited without external programs.
To do what you want, you will probably have to write it in a more capable programming language than bash.
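UTF-8 is a variable-width encoding (1 to 4 bytes per character), which makes it awkward to process byte by byte. A common approach is to decode it into fixed-width units first (e.g. convert to UTF-32, where every code point is 4 bytes; UTF-16 uses 2 bytes for most characters but needs 4 for those outside the Basic Multilingual Plane), process the characters, and then convert back to UTF-8 for storage.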
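As a rough sketch of the decode-to-fixed-width idea, assuming `iconv` is installed and the input is valid UTF-8, you can count characters by converting to UTF-32 and dividing the byte count by 4. The function name `utf8_char_count` is made up for this example; it is not a standard command.

```bash
#!/usr/bin/env bash
# Sketch only: count the code points in a UTF-8 string by decoding it
# to fixed-width UTF-32. utf8_char_count is a hypothetical helper name.
utf8_char_count() {
    local bytes
    # UTF-32BE uses exactly 4 bytes per code point and writes no byte-order mark,
    # so the byte count of the converted text is 4 x the number of code points.
    bytes=$(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | wc -c)
    echo $(( bytes / 4 ))
}

# "héllo" written with octal escapes so the script does not depend on the
# encoding of the source file: é is the two bytes 0xC3 0xA9 in UTF-8.
word=$(printf 'h\303\251llo')
utf8_char_count "$word"    # 6 bytes of UTF-8, 5 code points
```

Note this counts code points, not what a user sees as one character (combining accents are counted separately), and it does not depend on the current locale because the conversion is done explicitly by `iconv`.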