I need to sanitize article titles when (creative) users try to attract attention with

Question

Editorial Team

Asked: May 14, 2026 (2026-05-14T01:39:57+00:00)

I need to sanitize article titles when (creative) users try to “attract attention” with some non-alphanum repetition.

Examples:

Buy my product !!!!!!!!!!!!!!!!!!!!!!!!
Buy my product !? !? !? !? !? !?
Buy my product !!!!!!!!!.......!!!!!!!!
Buy my product <-----------

An acceptable solution would be to reduce any repetition of non-alphanumeric characters to 2 occurrences.

So I would get:

Buy my product !!
Buy my product !? !?
Buy my product !!..!!
Buy my product <--

This solution did not work that well:

preg_replace('/(\W{2,})(?=\1+)/', '', $title)

Any idea how to do it in PHP with regex?

Other, better solutions are also welcome (I cannot strip all the non-alphanumeric characters, as they can be meaningful).

Edit: the objective is only to avoid the most common issues. Other creative cases will be sanitized manually or with another regex.


1 Answer

Editorial Team · Answer 1 · 2026-05-14T01:39:57+00:00

That’s really an inefficient problem to solve with a regex, especially if the repeated expression can be arbitrarily long. In practice, it should be enough to cap the length of the repeated expression at something like 3 to 5 characters, which makes the problem a lot easier.

Something like

$title = preg_replace('/(\W{1,5})(?=\1+)/', '', $title);

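should work as a starting point, although with \1+ in the lookahead it leaves only one copy of each repeated sequence.

To keep two copies, as in your examples, require at least two further copies in the lookahead. Some preliminary testing shows that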

$title = 'Buy my product !!!!!!!!!!!!!!!!!!!!!!!! Buy my product !? !? !? !? !? !? Buy my product !!!!!!!!!.......!!!!!!!! Buy my product <-----------';

$title = preg_replace('/(\W{1,5})(?=\1{2,})/', '', $title);

echo $title;

will output

Buy my product !! Buy my product !? !? Buy my product !!..!! Buy my product <--

This appears to pass all your test cases.
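
To check the titles one at a time rather than as one long string, a minimal sketch along these lines may help. The function name collapse_repeats and the $maxLen parameter are illustrative assumptions, not part of the original answer. The pattern itself is the one tested above.

```php
<?php
// Hypothetical helper: wraps the pattern from the answer above.
function collapse_repeats(string $title, int $maxLen = 5): string
{
    // Remove a run of 1..$maxLen non-word characters whenever it is
    // followed by at least two more copies of itself, so two copies remain.
    $pattern = '/(\W{1,' . $maxLen . '})(?=\1{2,})/';
    $result = preg_replace($pattern, '', $title);

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $title;
}

$titles = [
    'Buy my product !!!!!!!!!!!!!!!!!!!!!!!!',
    'Buy my product !? !? !? !? !? !?',
    'Buy my product !!!!!!!!!.......!!!!!!!!',
    'Buy my product <-----------',
];

foreach ($titles as $title) {
    echo collapse_repeats($title), PHP_EOL;
}
```

Raising $maxLen lets longer repeated sequences be caught, at the cost of more backtracking per position.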

Re: Gordon

Your string:

¸·´`·¸·´`·¸·´`·¸ Human ·-> creativity << is endless !¡!¡! ☻☺

doesn’t repeat anything more than two times, apart from the first part. It seems to require:

$title = preg_replace('/(\W{1,9})(?=\1{2,})/', '', $title);

before it simplifies to

¸·´`·¸·´`·¸ Human ·-> creativity << is endless !¡!¡! ☻☺

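(The limit has to be 9 rather than 5 because, without the u modifier, preg_replace works on bytes rather than on UTF-8 characters, and the repeated five-character sequence is nine bytes long.)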
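
The byte count can be checked with a short sketch, assuming the script file and the string are UTF-8 encoded. The variable names are illustrative only.

```php
<?php
// The five-character sequence that repeats at the start of the string.
$unit = '¸·´`·';

// strlen() counts bytes, not characters.
$bytes = strlen($unit);

// With the u modifier, "." matches one whole UTF-8 character,
// so the match count is the character count.
$chars = preg_match_all('/./u', $unit);

echo $bytes, ' bytes, ', $chars, ' characters', PHP_EOL;
```

Four of the five characters take two bytes each in UTF-8, which is why a byte-oriented \W{1,5} is too short to capture the whole sequence.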

You can also adjust it to keep only a single copy of each repeated sequence:

$title = preg_replace('/(\W{1,9})(?=\1+)/', '', $title);

in which case it becomes:

¸·´`·¸ Human ·-> creativity < is endless !¡! ☻☺

If your point is that it’s possible to create lots of “ASCII art” even when nothing repeats more than twice, well, that’s outside the scope of this question. To keep ASCII art to a minimum, I would recommend simply using something like:

preg_replace('/(\W{5})\W+/', '$1', $title);

(In other words, just cap the number of non-alphanumeric characters that can appear in a row. Note that this would need to be adjusted for languages with non-Latin alphabets, such as Russian.)
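
One possible adjustment for non-Latin alphabets, sketched under the assumption that titles are valid UTF-8: add the u modifier and replace \W with an explicit "not a letter, not a digit" class. The function name and the [^\p{L}\p{N}] class are my substitutions rather than part of the answer above. Unlike \W, this class also treats the underscore as non-alphanumeric.

```php
<?php
// Hypothetical UTF-8 variant of the repetition filter.
function collapse_repeats_utf8(string $title, int $maxLen = 5): string
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // The u modifier makes PCRE read the subject as UTF-8 characters,
    // so Cyrillic letters are no longer seen as runs of non-word bytes.
    $pattern = '/([^\p{L}\p{N}]{1,' . $maxLen . '})(?=\1{2,})/u';
    $result = preg_replace($pattern, '', $title);

    // preg_replace() returns null on error (for example invalid UTF-8);
    // in that case return the title unchanged.
    return $result ?? $title;
}

echo collapse_repeats_utf8('Купи мой продукт !!!!!!!!'), PHP_EOL;
```

The same character class and modifier could be swapped into the capping pattern if that approach is preferred.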
