I have a text file containing Arabic characters and some other characters (punctuation marks,

Asked: June 10, 2026 (2026-06-10T23:54:36+00:00)

I have a text file containing Arabic characters and some other characters (punctuation marks, numbers, English characters, … ).
How can I tell sed to remove all the characters in the file except the Arabic ones? In short, we typically tell sed to remove or replace some specific characters and print the others, but now I am looking for a way to tell sed to print only my desired characters and remove everything else.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T23:54:37+00:00

With GNU sed, you should be able to specify characters by their hex code. You can use those in a character class:

sed 's/[\x00-\x7F]//g' # hex notation
sed 's/[\o000-\o177]//g' # octal notation

You should also be able to achieve the same effect with the tr command:

tr -d '\000-\177'

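Both methods assume that your input file is UTF-8 encoded. Every byte of a multi-byte UTF-8 character has its highest bit set, so you can simply strip everything that is a standard ASCII (7-bit) character. Note that this keeps all non-ASCII characters, not only Arabic ones, and that the tr version also deletes newlines, since they are ASCII too.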
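As a quick check of the ASCII-stripping approach, here is a small sketch that runs the tr variant on a made-up sample line. The variable names and the sample text are illustrative assumptions, and LC_ALL=C is set only so that tr works on raw bytes whatever the surrounding locale is.

```shell
# Illustrative sample: the Arabic word lives in its own variable so the
# expected result is easy to compare against (both values are made up).
arabic='سلام'
sample="Hello, ${arabic} 123!"

# Delete every byte in the ASCII range (octal 000-177). LC_ALL=C makes tr
# treat the input as raw bytes, so the UTF-8 bytes of the Arabic letters,
# which all have their high bit set, pass through untouched.
stripped=$(printf '%s\n' "$sample" | LC_ALL=C tr -d '\000-\177')

printf '%s\n' "$stripped"
```

The spaces and the trailing newline disappear as well, because they are ASCII characters too.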

To remove everything except a well-defined set of characters, use a negated character class:

sed 's/[^characters you want to keep]//g'

Using a pattern like [^…]\+ might improve performance of the regex, since each run of unwanted characters is then removed in a single substitution.
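A minimal sketch of the negated-class approach, assuming the input contains only ASCII plus the letters you list. The keep variable and the sample line are hypothetical; the letters are listed one by one so that the example does not depend on how the locale orders character ranges.

```shell
# Hypothetical set of characters to keep: four Arabic letters, listed explicitly.
keep='سلام'

# Made-up sample line; its Arabic part happens to consist of exactly those letters.
sample="Hello, ${keep} 123!"

# Remove every character that is NOT in the keep set.
# sed works line by line, so the newline at the end of the line is left alone.
kept=$(printf '%s\n' "$sample" | sed "s/[^${keep}]//g")

printf '%s\n' "$kept"
```

In a UTF-8 locale a range such as ء-ي could probably replace the explicit list, but how ranges are ordered depends on the locale, so it is worth testing on real data first.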
