Given the following sample strings:
PP12111 LOREM IPSUM TM ENCORE
LOREM PP12111 IPSUM TM ENCORE
LOREM IPSUM ENCORE TM PP12111
LOREM PP12111 PP12111 TM ENCORE
What would be a .NET RegEx to set title case and then convert any string containing numbers and letters to upper case (see note below):
PP12111 Lorem Ipsum TM Encore
Lorem PP12111 Ipsum TM Encore
Lorem Ipsum Encore TM PP12111
Lorem PP12111 PP12111 TM Encore
Alternatively, I can start with everything set to Title Case so only the strings containing numbers and letters need to be set to upper case:
Pp12111 Lorem Ipsum TM Encore
Lorem Pp12111 Ipsum TM Encore
Lorem Ipsum Encore TM Pp12111
Lorem Pp12111 Pp12111 TM Encore
Note: if any variant of TM exists (tm, Tm, tM), then it should be fully upper-cased. The TM can appear either bare or in parentheses, as in “lorem ipsum TM valor” or “lorem ipsum (TM) valor”.
Here is a pure string manipulation method that works; I would think that a RegEx solution may be a better fit?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

internal static class Program
{
    private static void Main( string[] args )
    {
        var phrases = new[]
        {
            "PP12111 LOREM IPSUM TM ENCORE", "LOREM PP12111 IPSUM TM ENCORE",
            "LOREM IPSUM ENCORE TM PP12111", "LOREM PP12111 PP12111 TM ENCORE",
        };
        Test( phrases );
    }

    private static void Test( IList<string> phrases )
    {
        var ti = Thread.CurrentThread.CurrentCulture.TextInfo;
        for( int i = 0; i < phrases.Count; i++ )
        {
            // Title-case the whole phrase first ("PP12111" becomes "Pp12111", "TM" becomes "Tm").
            string p = ti.ToTitleCase( phrases[i].ToLower() );
            string[] words = p.Split( ' ' );
            for( int j = 0; j < words.Length; j++ )
            {
                string word = words[j];
                // Any word containing a digit goes back to upper case.
                if( word.Any( Char.IsNumber ) )
                {
                    word = word.ToUpper();
                }
                // Restore TM. After Split the words contain no spaces, so compare the bare word.
                words[j] = word == "Tm" ? "TM" : word.Replace( "(Tm)", "(TM)" );
            }
            phrases[i] = string.Join( " ", words );
            Console.WriteLine( phrases[i] );
        }
    }
}
You can use this regex like this:
\b: Is a word boundary.
pos(?!suffix): Matches a position not preceding suffix.
\b(?!TM\b): Word boundary not preceding TM.
[A-Z]+: Words without digits.
Together: a word boundary not preceding “TM”, followed by a word made up of the letters A through Z, and a word boundary.
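As a minimal sketch of how the pieces described above might be combined in C#: the pattern is assembled from the fragments listed, while the class name, the method name, and the lambda that lower-cases everything after the first letter are assumptions made for illustration, not necessarily the code this answer originally showed. It assumes the input is entirely upper case, as in the sample strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCaseSketch
{
    // Word boundary, not followed by the whole word "TM", then a run of
    // upper-case letters that ends at a word boundary. A word containing
    // digits (e.g. "PP12111") never matches, because its letter run is
    // followed by a digit rather than a boundary.
    private static readonly Regex LetterWord = new Regex(@"\b(?!TM\b)[A-Z]+\b");

    // Hypothetical helper: keeps the first letter of each matched word and
    // lower-cases the rest. Assumes the input is already upper case.
    public static string ToTitleCase(string input)
    {
        return LetterWord.Replace(input,
            m => m.Value.Substring(0, 1) + m.Value.Substring(1).ToLowerInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(ToTitleCase("PP12111 LOREM IPSUM TM ENCORE"));
        Console.WriteLine(ToTitleCase("LOREM IPSUM (TM) PP12111"));
    }
}
```

With the sample inputs, "PP12111" and "TM" are left untouched while the plain words are title-cased.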
UPDATE #1
Upper casing “tm”, “Tm”, “tM”:
I don’t know if everything not capitalized can be upper case. In that case the easiest solution would be to upper case the input:
input.ToUpper(). Otherwise, execute a second regex replace.
UPDATE #2
If you want to upper case several words, you can just use another match evaluator:
tm|xxx|yyy specifies the words to be upper cased (“tm”, “xxx” or “yyy”).
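A second replace along these lines could look roughly like the sketch below. The alternation tm|xxx|yyy comes from the text above; wrapping it in \b word boundaries, using RegexOptions.IgnoreCase to catch “tm”, “Tm” and “tM”, and the class and method names are all assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UpperCaseWordsSketch
{
    // Matches any of the listed words as a whole word, in any casing.
    // The \b anchors (an assumption) keep "tm" inside longer words such as
    // "Atmosphere" from matching.
    private static readonly Regex SpecialWords =
        new Regex(@"\b(?:tm|xxx|yyy)\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: the match evaluator upper-cases each matched word.
    public static string UpperCaseSpecialWords(string input)
    {
        return SpecialWords.Replace(input, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(UpperCaseSpecialWords("Lorem Ipsum Tm Encore"));
        Console.WriteLine(UpperCaseSpecialWords("lorem ipsum (tM) valor xxx"));
    }
}
```

Run after the title-casing step, this turns any casing of the listed words back into upper case, including the parenthesised “(TM)” form, since the parentheses themselves are word boundaries.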