I have a large text file that contains a few unicode characters that make


Asked: June 15, 2026 (2026-06-15T05:31:41+00:00)


I have a large text file that contains a few Unicode characters that make LaTeX crash. How can I find the non-ASCII characters in a file using sed or similar tools in Bash on Linux?


1 Answer

Editorial Team · Answer 1 · 2026-06-15T05:31:42+00:00

Try:

nonascii() { LANG=C grep --color=always '[^ -~]\+'; }

Which can be used like:

printf 'ŨTF8\n' | nonascii

Within [], ^ means “not”, so [^ -~] matches any character that is not between space and ~. Apart from also matching control characters, this matches the non-ASCII characters, and is a more portable though slightly less accurate alternative to [^\x00-\x7f]. The \+ means one or more, so that a multibyte character is colored as a whole rather than byte by byte, which would corrupt the multibyte sequence.
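To find where the offending characters sit in a file rather than in a pipe, the same bracket expression can be combined with grep's line-number option. This is only a minimal sketch: the helper name nonascii_lines is hypothetical, and it assumes a grep that supports -a (as GNU grep does) so that the file is never reported as binary.

```shell
# Hypothetical helper: print "line-number:line" for every line of the given
# file that contains a byte outside the printable ASCII range (space to ~).
# Tabs and other control characters are reported as well.
nonascii_lines() {
    # LC_ALL=C makes grep work byte by byte, so the range " -~" is plain ASCII.
    # -a treats the input as text; -n prefixes each match with its line number.
    LC_ALL=C grep -a -n '[^ -~]' -- "$1"
}
```

It could be called as nonascii_lines paper.tex, where paper.tex stands in for the file that makes LaTeX crash. The exit status is that of grep, so it is non-zero when the file is clean.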
