I like using Python because of its easy-to-learn syntax. However, I recently learned that it has no support for UTF-8 in the area of CSVs. As I often use CSVs, this seems like a serious problem for me. Is there another scripting language with a simple syntax that I can learn for when I need to manage really large UTF-8 CSV files?
I’d recommend using gawk. E.g. a one-liner is enough to print the first two (semicolon-separated) columns of FILE.CSV. To work properly with UTF-8, you should run it under a UTF-8 locale, because gawk’s length() counts bytes in the C locale and characters in a UTF-8 one:

LC_ALL=C awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 30
LC_ALL=en_US.utf8 awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 21
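
The locale name en_US.utf8 is not guaranteed to be installed on every system (names such as en_US.UTF-8 or C.UTF-8 are also common), so it can help to check what is available before relying on it. The following is only a sketch: it assumes a POSIX-style shell with the locale, grep and head utilities, and the variable names (utf8_locale, word, bytes, chars) are made up for illustration. Some awk implementations are not multibyte-aware and may report the byte count in both cases.

```shell
# Pick the first installed locale whose name mentions UTF-8 (may be empty).
utf8_locale="$(locale -a 2>/dev/null | grep -i -E 'utf-?8' | head -n 1 || true)"

word='árvíztűrőtükörkúrópék'

# Baseline: the same call as in the answer, forced into the C locale.
bytes="$(LC_ALL=C awk -v s="$word" 'BEGIN {print length(s)}')"
echo "C locale: $bytes"

# Same call under the detected UTF-8 locale, if one was found.
if [ -n "$utf8_locale" ]; then
    chars="$(LC_ALL="$utf8_locale" awk -v s="$word" 'BEGIN {print length(s)}')"
    echo "$utf8_locale: $chars"
else
    echo "no UTF-8 locale found; install or generate one first"
fi
```

If the second number is smaller than the first, awk is counting characters rather than bytes under that locale, which is what you want for UTF-8 CSV data.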
(Or you can set LC_ALL globally if you’re using UTF-8 all the time and you’re on *nix, e.g. by putting export LC_ALL=en_US.utf8 in .bashrc.)

awk is an old, but really powerful and fast tool.

HTH
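
One possible form of the column-printing one-liner mentioned above, as a minimal sketch. It assumes semicolon-separated fields and no quoted fields that themselves contain a semicolon (plain field splitting does not understand CSV quoting). The sample rows and the names csv_file and first_two are invented here so the example can run on its own.

```shell
# Hypothetical sample data; in practice this would be your own FILE.CSV.
csv_file="$(mktemp)"
printf '%s\n' 'név;város;kor' 'Árpád;Győr;41' 'Zsófia;Pécs;29' > "$csv_file"

# -F';' sets the input field separator; $1 and $2 are the first two columns.
# OFS=';' makes the output semicolon-separated as well.
first_two="$(awk -F';' -v OFS=';' '{print $1, $2}' "$csv_file")"
printf '%s\n' "$first_two"

# Clean up the temporary sample file.
rm -f "$csv_file"
```

For real data, replace "$csv_file" with the path to your file. Splitting on ';' should not depend on the locale, since ';' is a single ASCII byte that never occurs inside a multibyte UTF-8 sequence; the UTF-8 locale matters once you use character-based functions such as length() or substr() on the fields.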