Suppose I have the string “HET1200 text string” and I need to change it to “HET1200 Text String”. The encoding is UTF-8.
How can I do that? Currently, I use `mb_convert_case($string, MB_CASE_TITLE, "UTF-8");`, but that changes “HET1200” to “Het1200”.
I could specify exceptions, but the list would never be exhaustive. I would rather have all uppercase words remain uppercase.
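For reference, a minimal reproduction of the behaviour described above (only the input string from the question is used; nothing else is assumed):

```php
<?php
// MB_CASE_TITLE upper-cases the first letter of each word but
// lower-cases the remaining letters, so "HET1200" is not preserved.
$string = "HET1200 text string";
$result = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
echo $result, "\n"; // Het1200 Text String
```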
Thanks 🙂
OK, let’s try to recreate `mb_convert_case` as closely as possible, but only changing the first character of every word. The relevant part of the `mb_convert_case` implementation (the `MB_CASE_TITLE` branch of the mbstring extension’s C case-conversion routine) basically does the following:

- Set `mode` to `0`. `mode` determines whether we are at the first character of a word: if it’s `0`, we are; otherwise, we’re not.
- For each character, set `res` to `1` if it’s a word character. More specifically, set it to `1` if it has the property “Mark, Non-Spacing”, “Mark, Enclosing”, “Other, Format”, “Letter, Modifier”, “Symbol, Modifier”, “Letter, Uppercase”, “Letter, Lowercase”, “Letter, Titlecase”, “Punctuation, Other” or “Other, Surrogate”. Oddly, “Letter, Other” is not included.
- If `mode` is `0` and `res` is `1`, convert the character to title case and set `mode` to `1`.
- If `mode` is `1` and `res` is `1`, convert the character to lowercase. This is the step that turns “HET1200” into “Het1200”.
- If `res` is `0`, set `mode` to `0` to signal we’re moving to the beginning of a word.

The mbstring extension does not seem to expose these character properties. This leaves us with a problem, because we don’t have a good way to determine whether a character has any of the 10 properties that `mb_convert_case` tests for. Fortunately, Unicode character properties in regular expressions can save us here.
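Putting the two ideas together, here is a sketch of a direct PHP port of that per-character loop, with the lower-casing step removed. It assumes PCRE’s `\p{…}` general categories agree with mbstring’s internal property table, and `is_title_word_char()` / `title_case_loop()` are hypothetical names. As far as I know, PHP 7.3 reworked how `MB_CASE_TITLE` finds word boundaries, so this mirrors the older behaviour described above rather than current releases.

```php
<?php
// True if $char has one of the ten general categories the loop treats as a
// word character. Assumes PCRE's \p{..} categories match mbstring's table.
function is_title_word_char(string $char): bool
{
    return preg_match('/^[\p{Mn}\p{Me}\p{Cf}\p{Lm}\p{Sk}\p{Lu}\p{Ll}\p{Lt}\p{Po}\p{Cs}]$/u', $char) === 1;
}

// Hypothetical port of the MB_CASE_TITLE loop without lower-casing.
function title_case_loop(string $str): string
{
    // Split the UTF-8 string into single code points (false on invalid UTF-8).
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $mode = 0; // 0 = next word character starts a word, 1 = inside a word
    $out = '';
    foreach ($chars as $char) {
        $res = is_title_word_char($char);
        if (!$res) {
            $mode = 0; // a non-word character ends the current word
        } elseif ($mode === 0) {
            // First character of a word: title-case it.
            $char = mb_convert_case($char, MB_CASE_TITLE, 'UTF-8');
            $mode = 1;
        }
        // Word characters inside a word are kept as-is; the original
        // lower-cased them at this point.
        $out .= $char;
    }
    return $out;
}

echo title_case_loop("HET1200 text string"), "\n"; // HET1200 Text String
```

Digits (“Number, Decimal Digit”) are not on the list, so a letter directly after a digit starts a new word: `title_case_loop("1200abc")` returns “1200Abc”, just as the original loop would.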
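A faithful reproduction of `mb_convert_case` without the problematic conversion to lowercase therefore only has to title-case each word character that is not preceded by another word character, and leave every other character untouched.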
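A minimal sketch of that rule as a single regular expression, assuming PHP 7.4+ (for the arrow function) and that PCRE’s `\p{…}` categories match the ones mbstring checks; `title_case_keep_upper()` is a hypothetical name, not an mbstring function.

```php
<?php
// Title-case every word character that is not preceded by another word
// character, and leave all other characters alone.
function title_case_keep_upper(string $str): string
{
    // The ten general categories treated as word characters.
    $word = '[\p{Mn}\p{Me}\p{Cf}\p{Lm}\p{Sk}\p{Lu}\p{Ll}\p{Lt}\p{Po}\p{Cs}]';

    $result = preg_replace_callback(
        "/(?<!$word)$word/u",
        fn (array $m): string => mb_convert_case($m[0], MB_CASE_TITLE, 'UTF-8'),
        $str
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8).
    return $result ?? $str;
}

echo title_case_keep_upper("HET1200 text string"), "\n"; // HET1200 Text String
```

The negative lookbehind plays the role of `mode`: it succeeds exactly when there is no previous character or the previous character is not a word character, which is when the loop would have `mode` at `0`.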