I’m developing a Portuguese software, so many of my entities have names like ‘maça’

Asked: May 20, 2026

I’m developing Portuguese software, so many of my entities have names like ‘maça’ or ‘lição’, and I want to use the entity name as a resource key. So I want to keep every character except letters such as ‘ç’, ‘ã’, ‘õ’, …

Is there an optimal solution using regex? My current regex (as suggested in "Remove characters using Regex") is:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

Just to emphasize: I’m only concerned with Latin characters.

1 Answer

Editorial Team · Answer 1 · 2026-05-20T16:46:50+00:00

A simple option is to white-list the accepted characters:

string clean = Regex.Replace(messy, @"[^a-zA-Z0-9!@#]+", "");
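As a rough illustration of the white-list approach, here is a minimal sketch. The class and method names (WhitelistDemo, KeepWhitelisted) are hypothetical and only there to make the example runnable; the pattern is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitelistDemo
{
    // Hypothetical helper: removes everything except ASCII letters,
    // ASCII digits, and the symbols ! @ #.
    public static string KeepWhitelisted(string messy)
    {
        return Regex.Replace(messy, @"[^a-zA-Z0-9!@#]+", "");
    }

    public static void Main()
    {
        // "lição #1" written with escapes so the source encoding does not matter.
        Console.WriteLine(KeepWhitelisted("li\u00E7\u00E3o #1")); // lio#1
    }
}
```

Note that the space is dropped too, because it is not in the white-list.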

If you want to remove all non-ASCII letters but keep all other characters, you can use character class subtraction:

string clean = Regex.Replace(messy, @"[\p{L}-[a-zA-Z]]+", "");

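It can also be written as the more standard but more complicated [^\P{L}a-zA-Z]+, which reads "match all characters that are neither non-letters nor ASCII letters" and so ends up with the non-ASCII letters we’re looking for. A similar variant is [^\Wa-zA-Z]+, but since \w also covers digits and the underscore, that one removes those as well.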
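Just some context for \W: it stands for "not a word character". In .NET, \w is Unicode-aware by default, so it matches letters such as ç and ã as well as digits and the underscore; it is limited to a-z, A-Z, 0-9 and _ only when RegexOptions.ECMAScript is used.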
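To check that the subtraction form and the negated-class form behave the same, a small sketch like the following can help. The class and method names are hypothetical, and the input is assumed to use precomposed characters (a decomposed "c" plus combining cedilla would be handled differently, since the combining mark is not a letter).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonAsciiLetterDemo
{
    // Character class subtraction: all letters minus ASCII letters.
    public static string WithSubtraction(string messy)
        => Regex.Replace(messy, @"[\p{L}-[a-zA-Z]]+", "");

    // Equivalent negated class: not (non-letter or ASCII letter).
    public static string WithNegatedClass(string messy)
        => Regex.Replace(messy, @"[^\P{L}a-zA-Z]+", "");

    public static void Main()
    {
        // "lição_1 maçã" with precomposed ç and ã.
        string input = "li\u00E7\u00E3o_1 ma\u00E7\u00E3";
        Console.WriteLine(WithSubtraction(input));  // lio_1 ma
        Console.WriteLine(WithNegatedClass(input)); // lio_1 ma
    }
}
```

Both patterns leave digits, the underscore, and whitespace untouched and strip only the non-ASCII letters.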

You may also consider the following approach more useful: How do I remove diacritics (accents) from a string in .NET?
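For reference, one common way to do that is to decompose the string and drop the combining marks. This is only a sketch of the general technique, not necessarily what the linked question recommends; the class and method names (DiacriticsDemo, RemoveDiacritics) are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper: turns "ç" into "c", "ã" into "a", and so on.
    public static string RemoveDiacritics(string text)
    {
        // FormD splits each accented letter into a base letter plus combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // "lição" with precomposed ç and ã.
        Console.WriteLine(RemoveDiacritics("li\u00E7\u00E3o").ToUpperInvariant()); // LICAO
    }
}
```

With this approach the resource key stays readable (LICAO rather than LIO), and the original [\W_]+ cleanup can still be applied afterwards.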
