I’m trying to create a Regex usable in C# that will let me take a list of single letters and/or letter groups and ensure that a word is composed only of items from that list. For instance:
- ‘a’ would match ‘a’, ‘aa’, ‘aaa’, but not ‘ab’
- ‘a b’ would match ‘a’, ‘ab’, ‘abba’, ‘b’, but not ‘abc’
- ‘a b abc’ would match ‘a’, ‘ab’, ‘abc’, ‘aabc’, ‘baabc’, but not ‘ababac’
I thought something of the form
(a|b|abc)*
would work, but it incorrectly matches the last term. Here’s the code I’m testing with:
    [Fact]
    public void TestRegex()
    {
        Regex regex = new Regex("(a|b|abc)*");

        regex.IsMatch("a").ShouldBeTrue();
        regex.IsMatch("b").ShouldBeTrue();
        regex.IsMatch("abc").ShouldBeTrue();
        regex.IsMatch("aabc").ShouldBeTrue();
        regex.IsMatch("baabc").ShouldBeTrue();

        // This should not match ... I don't think anyway
        regex.IsMatch("ababac").ShouldBeFalse();
    }
I have a pretty basic understanding of regex, so apologies if I’m missing something obvious here 🙂
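For anyone reading along, it can help to look at what the unanchored pattern actually matched rather than only at the result of IsMatch. The following is a minimal sketch, not part of the original test; the class name UnanchoredDemo and the helper MatchedText are made up for illustration. It uses only the static Regex.Match and Regex.IsMatch methods.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnanchoredDemo
{
    // Hypothetical helper: returns the substring the pattern matched in the input.
    public static string MatchedText(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Unanchored: the match succeeds but stops just before the trailing 'c'.
        Console.WriteLine(MatchedText("(a|b|abc)*", "ababac"));     // prints "ababa"

        // Anchored at both ends: the whole string must be consumed, so it fails.
        Console.WriteLine(Regex.IsMatch("ababac", "^(a|b|abc)*$")); // prints "False"
    }
}
```

IsMatch only asks whether the pattern matches somewhere in the input, and because of the `*` even an empty match counts, so the unanchored version returns true for any string.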
Update: I don’t understand why your counter-example is a counter-example: ababac = a b a bac. Could you clarify?
I only want to use ‘a’, ‘b’, and ‘abc’ – ‘bac’ would be a completely different term.
Let me give another example: Using ‘ba’ and ‘t’, I could match the word ‘bat’, but not ‘tab’. The order of the letters inside the letter groups is important.
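To make the ‘ba’ and ‘t’ example concrete, here is a rough sketch of how such a check could be built from a space-separated list. The names TokenWords and BuildWordRegex are hypothetical, and the sketch assumes the pattern is anchored with \A and \z so that the whole word has to be consumed; it is one possible approach, not necessarily the intended one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenWords
{
    // Hypothetical helper: turns a list such as "ba t" into \A(?:ba|t)*\z.
    // Regex.Escape keeps any regex metacharacters in a token literal.
    public static Regex BuildWordRegex(string tokens)
    {
        var alternatives = tokens
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => Regex.Escape(t));
        return new Regex(@"\A(?:" + string.Join("|", alternatives) + @")*\z");
    }

    public static void Main()
    {
        Regex regex = BuildWordRegex("ba t");
        Console.WriteLine(regex.IsMatch("bat")); // prints "True"
        Console.WriteLine(regex.IsMatch("tab")); // prints "False"
    }
}
```

Note that because of the `*`, the empty string also counts as a match; swapping it for `+` would require at least one item from the list.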
(Tests with Diadistis’ solution)
    [Fact]
    public void TestRegex()
    {
        Regex regex = new Regex(@"\A(?:(e|l|ho)*)\Z");

        regex.IsMatch("e").ShouldBeTrue();
        regex.IsMatch("l").ShouldBeTrue();
        regex.IsMatch("ho").ShouldBeTrue();
        regex.IsMatch("elho").ShouldBeTrue();
        regex.IsMatch("hole").ShouldBeTrue();
        regex.IsMatch("holle").ShouldBeTrue();

        regex.IsMatch("hello").ShouldBeFalse();
        regex.IsMatch("hotel").ShouldBeFalse();
    }
I am not quite sure what you are trying to do, but for the last one to be false you need to check that the string is matched in its entirety: