I am generating CSV files. Occasionally the data source passes along characters with accents and the like, which I would like to strip out. Is there a reasonably straightforward way to detect and strip out UTF-8 characters?
If you’re sure you’re getting UTF-8 as input, use iconv to convert the values to the encoding you’re using in your output. Detecting UTF-8 characters isn’t fail-safe, because the same byte values are also valid ISO-8859-1 characters (or valid in any 8-bit encoding, really).
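To illustrate why detection is unreliable, here is a rough sketch in Python (not iconv itself; the language and the helper name `convert_utf8_to` are my own assumptions for illustration). It shows the same bytes being read without error under two encodings, and then a plain conversion from known UTF-8 input to a chosen output encoding:

```python
# The UTF-8 encoding of "é" is two bytes, and both bytes are also
# valid ISO-8859-1 characters, so nothing flags the input as "wrong".
utf8_bytes = "é".encode("utf-8")            # b'\xc3\xa9'
misread = utf8_bytes.decode("iso-8859-1")   # decodes fine, but to two characters


def convert_utf8_to(raw: bytes, target: str = "iso-8859-1") -> bytes:
    """Hypothetical helper: re-encode known UTF-8 input to the output encoding.

    Characters the target encoding cannot represent become '?'.
    """
    return raw.decode("utf-8").encode(target, errors="replace")


print(repr(misread))
print(convert_utf8_to(utf8_bytes))
```

The point is that the conversion only works because the input encoding is known in advance, not guessed from the bytes.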
If you just want to use the regular ASCII set of values (byte values 0–127), you can let iconv convert to the ‘ascii’ encoding and transliterate:
will result in
being returned.
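With iconv, transliteration is normally requested by appending `//TRANSLIT` to the target encoding name. If iconv isn't available, a similar effect can be approximated with Unicode decomposition. The following is only a sketch in Python under that assumption; it is a different technique from iconv's transliteration, and the function names are hypothetical:

```python
import unicodedata


def strip_accents_to_ascii(text: str) -> str:
    """Approximate ASCII transliteration (not iconv's algorithm)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" drops the combining marks and
    # any other character that has no ASCII form.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_ascii_csv_field(raw: bytes) -> str:
    """Decode a field assumed to be UTF-8, then reduce it to ASCII."""
    return strip_accents_to_ascii(raw.decode("utf-8"))


print(strip_accents_to_ascii("Motörhead café"))
```

One caveat with this approach: letters that have no decomposition, such as "ø", are dropped entirely rather than replaced with a lookalike, so iconv's transliteration may give nicer results where it is available.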