I’m trying to remove diacritic characters from a pangram in Polish. I’m using code

Question


Asked: May 16, 2026 (2026-05-16T14:54:43+00:00)


I’m trying to remove diacritic characters from a pangram in Polish. I’m using code from Michael Kaplan’s blog (http://www.siao2.com/2007/05/14/2629747.aspx), but with no success.

Consider the following pangram: “Pchnąć w tę łódź jeża lub ośm skrzyń fig.” Everything works fine except for the letter “ł”: I still get “ł”. I guess the problem is that “ł” is represented as a single Unicode character, so there is no following NonSpacingMark to remove.

Do you have any idea how I can fix it (without relying on custom mapping in some dictionary – I’m looking for some kind of unicode conversion)?


1 Answer

Editorial Team · Answer 1 · 2026-05-16T14:54:43+00:00

The approach taken in the article is to remove Mark, Nonspacing characters. Since, as you correctly point out, “ł” is not composed of two characters (one of which is Mark, Nonspacing), the behavior you see is expected.
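To make the behavior concrete, here is a minimal sketch of the same idea in Python. It uses the standard `unicodedata` module rather than the code from the article, and the function name `strip_nonspacing_marks` is only an illustration. The text is decomposed (NFD) and every character in category Mn (Mark, Nonspacing) is dropped.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Decompose text (NFD) and drop every Mark, Nonspacing (Mn) character."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


pangram = "Pchnąć w tę łódź jeża lub ośm skrzyń fig."
print(strip_nonspacing_marks(pangram))
# Pchnac w te łodz jeza lub osm skrzyn fig.

# "ą" has a canonical decomposition (a + combining ogonek); "ł" has none,
# so there is no combining mark for the filter to remove.
print(unicodedata.decomposition("ą"))        # 0061 0328
print(repr(unicodedata.decomposition("ł")))  # ''
```

The empty decomposition for “ł” is why it survives: in Unicode the stroke is part of the letter itself, not a separate combining mark.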

I don’t think that the structure of Unicode allows you to accomplish a fully automated remapping (the author of the article you reference reaches the same conclusion).

If you’re just interested in Polish characters, at least the mapping is small and well-defined (see e.g. the bottom of http://www.biega.com/special-char.html). For the general case, I do not think an automated solution exists for characters that are not composed of a standard character plus a Mark, Nonspacing character.
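For Polish specifically, a rough sketch of that combined approach in Python might look like the following. The table name `POLISH_NO_DECOMPOSITION` and the function name are made up for illustration. The table needs only “ł” and “Ł”, since every other Polish diacritic letter decomposes into a base letter plus a combining mark.

```python
import unicodedata

# Hypothetical mapping: the Polish letters that have no canonical decomposition.
POLISH_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})


def strip_polish_diacritics(text: str) -> str:
    """Drop combining marks after NFD, then map the letters NFD cannot split."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return without_marks.translate(POLISH_NO_DECOMPOSITION)


print(strip_polish_diacritics("Pchnąć w tę łódź jeża lub ośm skrzyń fig."))
# Pchnac w te lodz jeza lub osm skrzyn fig.
```

This still relies on a small hand-written mapping, which is what the question hoped to avoid. Other languages have their own letters of this kind (for example “ø” or “đ”), so the table would have to grow for the general case.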
