Are there any scripts, libraries, or programs written in Python, or using Bash tools (e.g. awk, perl, sed), that can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diàn nǎo)?
I have found the following examples, but they require PHP or C#:
- [PHP] Convert numbered to accentuated Pinyin?
- [C#] Any libraries to convert number Pinyin to Pinyin with tone markings?
I have also found various online tools, but they cannot handle a large number of conversions.
I’ve got some Python 3 code that does this, and it’s small enough to just put directly in the answer here.
This handles ü, u:, and v, all of which I’ve encountered. Minor modifications will be needed for Python 2 compatibility.
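For illustration, here is a minimal Python 3 sketch of this kind of conversion. It is not necessarily the exact code referred to above. It assumes the usual tone placement rules (a or e takes the mark if present, otherwise the o in "ou", otherwise the last vowel) and treats tone 5 as neutral, so the digit is simply dropped. The names `TONE_MARKS`, `SYLLABLE`, `mark_syllable`, and `convert` are my own choices for the example.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}

# A syllable: letters (ü and the ":" of "u:" allowed), then one tone digit 1-5
# that is not followed by another digit.
SYLLABLE = re.compile(r"([A-Za-züÜ:]+)([1-5])(?![0-9])")


def mark_syllable(match):
    """Return one numbered syllable rewritten with a tone mark."""
    letters, tone = match.group(1), int(match.group(2))
    # Normalise the common ASCII spellings of ü.
    letters = letters.replace("u:", "ü").replace("v", "ü")
    letters = letters.replace("U:", "Ü").replace("V", "Ü")
    if tone == 5:
        return letters  # neutral tone: drop the digit, add no mark

    lower = letters.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowels:
            return match.group(0)  # no vowel found: leave the text untouched
        idx = vowels[-1]

    marked = TONE_MARKS[lower[idx]][tone - 1]
    if letters[idx].isupper():
        marked = marked.upper()
    return letters[:idx] + marked + letters[idx + 1:]


def convert(text):
    """Convert every numbered pinyin syllable found in text."""
    return SYLLABLE.sub(mark_syllable, text)


if __name__ == "__main__":
    print(convert("dian4 nao3"))  # diàn nǎo
```

Because `convert` works on arbitrary strings, a large batch can be handled by calling it on each line of a file or of standard input.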