I am writing some code that will analyze a user's name. So far I have written the code to detect whether the name contains any bad words, punctuation, symbols, numbers or repeating characters, but I have got myself into a pickle over detecting unusual capitalization.
So far I have put in place a very simple way of doing this: if the name has more than two capital letters, we reject it. (I used two because some people have double-barreled names, e.g. Anne-Marie.)
// Count capitals by comparing the length with and without them.
$capitals = strlen($name) - strlen(preg_replace('/[A-Z]/', '', $name));
if ($capitals > 2) {
    $hasError = true;
}
Although this gives us a half-baked solution to the unusual capitalization issue, the problem is that it is just that: half-baked! It still allows many possible patterns of upper- and lowercase letters in a name, as long as they use no more than two capitals.
So I have added an extra check to the if statement for consecutive upper-case characters, making the code now this:
$capitals = strlen($name) - strlen(preg_replace('/[A-Z]/', '', $name));
// Reject two or more capitals in a row, or more than two capitals overall.
if (preg_match('/[A-Z]{2,}/', $name) || $capitals > 2) {
    $hasError = true;
}
Now this seems to have solved 70% of the issue. Users can no longer use patterns like these: XXxxx, XxXxX, xxxXX, xxx-XXxx. But there are still problems I need to address: as long as they use no more than two capitals and don’t group them, they can still create their “cool” looking names. So if a user were to input a name styled like JeSse, it would be accepted.
So my question is: how would I go about the last step of this issue? I need to only allow users to have their first names in the formats Jesse, Jesse-James, Jesse James.
How can I make sure only the first letter of each part of their name is capitalized, even if their name is double-barreled?
You see, in Python I would use .find() and just detect the first letter of the first word and make sure it’s upper-case, count on until we meet a space or hyphen, then make sure the next letter after the space or hyphen is also upper-case. But I have no clue how to do this with regex in PHP.
Would regex be the right way to do it? If it is, how would I go about this? Or does PHP have a secret .find() function I can use in a similar way to Python’s? And if it does, would it be more appropriate to go down that route?
Sorry I went into so much detail; there seem to be too many numpties putting up questions like “I have a regex issue, I need to detect patterns” and then expecting back an answer that will be useful to them. I wanted to provide enough information for it to be useful to people landing on this page in the future.
Many thanks for all future replies.
Jamie
P.S. Just out of interest, does anyone also know how to use non-English characters in PHP? Would I need to create a string of the characters I wish to detect, or does PHP have a ‘code’ for each character, like HTML entities?
You can probably do that all in one regex:
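As a rough sketch of what such a single regex could look like (the exact pattern and the helper name isValidName are assumptions for illustration, and it only handles ASCII letters):

```php
<?php
// Hypothetical helper: true when $name is one capitalized word, optionally
// followed by a single separator (hyphen, dot or whitespace) and a second
// capitalized word.
function isValidName(string $name): bool
{
    // ^ and \z anchor the whole string. [A-Z][a-z]+ is one capital followed
    // by one or more lowercase letters. The optional group is the second part.
    return preg_match('/^[A-Z][a-z]+(?:[-.\s][A-Z][a-z]+)?\z/', $name) === 1;
}

var_dump(isValidName('Jesse'));       // bool(true)
var_dump(isValidName('Jesse-James')); // bool(true)
var_dump(isValidName('JeSse'));       // bool(false)
```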
Which would allow for Jesse or Jesse-James or Jesse.James or Jesse James only. (Remove the dot and \s (space) if you don’t want those.)
If you want to allow the second part to start with lowercase, or ensure at least two lowercase letters follow each other, use {2,} in place of +.
For unicodeness use \p{Lu} for uppercase letters and \p{Ll} for lowercase ones.
You might wish to add another optional group to allow Jesse-J.-James, for example, thus having a single-letter abbreviation. Though that might need to be repeated at the start and middle part.
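To make those variants a bit more concrete, here is a sketch of how they might be written. The exact expressions and the variable names are assumptions. The /u modifier is there so that the \p{..} classes work on UTF-8 input.

```php
<?php
// Assumed variants of the single-regex check; names are illustrative only.

// At least two lowercase letters after each capital: {2,} in place of +.
$strict = '/^[A-Z][a-z]{2,}(?:[-.\s][A-Z][a-z]{2,})?\z/';

// Unicode-aware: \p{Lu} is any uppercase letter, \p{Ll} any lowercase one.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
$unicode = '/^\p{Lu}\p{Ll}+(?:[-.\s]\p{Lu}\p{Ll}+)?\z/u';

// Optional single-letter abbreviation in the middle, e.g. Jesse-J.-James.
$abbrev = '/^\p{Lu}\p{Ll}+(?:[-\s]\p{Lu}\.)?(?:[-.\s]\p{Lu}\p{Ll}+)?\z/u';

var_dump(preg_match($strict, 'Jo'));              // int(0)
var_dump(preg_match($unicode, 'Émile-Zoé'));      // int(1)
var_dump(preg_match($abbrev, 'Jesse-J.-James'));  // int(1)
```

Note that the $abbrev sketch only covers an abbreviation in the middle position, and it also accepts a name that ends on the abbreviation, such as Jesse-J.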