I want a regular expression that matches valid input for a Tags input field with the following properties:
- 1-5 tags
- Each tag is 1-30 characters long
- Valid tag characters are [a-zA-Z0-9-]
- tags are separated by one or more whitespace characters, and the input may start or end with any amount of whitespace
For example:
Valid: tag1 tag2 tag3-with-dashes tag4-with-more-dashes tAaG5-with-MIXED-case
Here’s what I have so far. It seems to work, but I’m interested in how it could be simplified or whether it has any major flaws:
\s*[a-zA-Z0-9-]{1,30}(\s+[a-zA-Z0-9-]{1,30}){0,4}\s*
// that is:
\s* // match any leading whitespace
[a-zA-Z0-9-]{1,30} // match the first tag
(\s+[a-zA-Z0-9-]{1,30}){0,4} // match up to four more tags, each preceded by whitespace
\s* // match any trailing whitespace
Preprocessing the input to simplify the whitespace handling (e.g. trimming it or adding a space) isn’t an option.
If it matters, this will be used in JavaScript. Any suggestions would be appreciated, thanks!
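One flaw worth checking in JavaScript is that the pattern has no ^ and $ anchors. RegExp.prototype.test returns true if the pattern matches anywhere in the string, so the unanchored version accepts input that merely contains up to five valid tags somewhere inside it. The sketch below assumes the pattern is used as a regex literal with test(); the sample inputs are illustrative. (An HTML pattern attribute, by contrast, is matched against the entire value, so anchors are implied there.)

```javascript
// The question's pattern as written: no anchors, and a capturing group.
const original = /\s*[a-zA-Z0-9-]{1,30}(\s+[a-zA-Z0-9-]{1,30}){0,4}\s*/;

// The same pattern anchored with ^ and $ so the WHOLE input must match,
// using a noncapturing group (?: ) since the capture is never read.
const anchored = /^\s*[a-zA-Z0-9-]{1,30}(?:\s+[a-zA-Z0-9-]{1,30}){0,4}\s*$/;

const sixTags = "tag1 tag2 tag3 tag4 tag5 tag6"; // one tag too many
const badChars = "tag1 tag_2"; // underscore is not an allowed tag character

// test() reports whether a match exists ANYWHERE in the string,
// so the unanchored pattern accepts both inputs via a valid substring.
console.log(original.test(sixTags), anchored.test(sixTags));   // true false
console.log(original.test(badChars), anchored.test(badChars)); // true false
```

With the anchors in place, leading and trailing whitespace are still allowed by the \s* at each end, which keeps the "no trimming" requirement intact.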
You can simplify it a bit like this:
The (?: ) syntax is a noncapturing group, which I believe should improve performance when you don’t need the groups per se. Then the trick is this statement:
(?:^|\s+)
Thanks to the caret, this matches either the beginning of the input or one or more whitespace characters.
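The full simplified pattern from this answer isn't reproduced in this text, so the following is only a sketch of one plausible shape: it assumes a (?:^|\s+) group is placed in front of every tag and the whole group is repeated 1-5 times, with explicit anchors so the entire input must match.

```javascript
// ASSUMPTION: one plausible simplified pattern built around (?:^|\s+);
// not necessarily the exact regex from the answer.
const simplified = /^(?:(?:^|\s+)[a-zA-Z0-9-]{1,30}){1,5}\s*$/;

// Without the "m" flag, the inner ^ can only match at position 0, so the
// first tag may start the input directly (or after leading whitespace via
// \s+), while every later tag must be preceded by at least one whitespace.
const inputs = [
  "tag1 tag2 tag3-with-dashes tag4-with-more-dashes tAaG5-with-MIXED-case", // valid
  "   tag1   tag2   ", // valid: leading, trailing, and repeated whitespace
  "a b c d e f",        // invalid: six tags
  "a".repeat(31),       // invalid: tag longer than 30 characters
  "",                   // invalid: no tags
];

for (const input of inputs) {
  console.log(JSON.stringify(input), simplified.test(input));
}
```

The tag character class appears only once, at the cost of trying the alternation for every tag.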
UPDATE: This works perfectly in my testing and there’s certainly less redundant code. However, I just used the benchmarking in Regex Hero to find that your original regex is actually faster. That’s probably because mine is causing more backtracking to occur.
UPDATE #2: I found another way that accomplishes the same thing, I think:
I realized that I was trying too hard.
\s* matches zero or more whitespace characters, which means that it’ll work for a single tag. But… it’ll work for 2-5 tags as well, because whitespace is not in your character class [a-zA-Z0-9-]. And indeed it fails with 6 tags, as it should. That means this is a much more forward-moving regex, with less backtracking, better performance, and less redundancy.
UPDATE #3: I see the error of my ways. This should work better.
Putting the \b just before the last ) asserts a word boundary. That allows the 1-30 character length rule to work properly again, because a tag can no longer end in the middle of a run of letters and digits.
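The Update #2 and Update #3 patterns aren't reproduced in this text either. Assuming they have the shapes ^(?:\s*[a-zA-Z0-9-]{1,30}){1,5}\s*$ and the same with \b placed just before the group's closing parenthesis, this sketch compares them against the question's anchored pattern (which requires \s+ between tags). One detail matters here: in JavaScript, \b is a boundary between a word character [A-Za-z0-9_] and a non-word character, and - is a non-word character.

```javascript
// ASSUMPTION: reconstructed shapes of the Update #2 and Update #3 patterns;
// the answer's exact regexes are not shown in this text.
const update2 = /^(?:\s*[a-zA-Z0-9-]{1,30}){1,5}\s*$/;
const update3 = /^(?:\s*[a-zA-Z0-9-]{1,30}\b){1,5}\s*$/;

// For comparison: the question's pattern, anchored, with \s+ between tags.
const requiresSpace = /^\s*[a-zA-Z0-9-]{1,30}(?:\s+[a-zA-Z0-9-]{1,30}){0,4}\s*$/;

const cases = {
  tooLong: "a".repeat(31),                                // one 31-char tag: invalid
  trailingDash: "tag-",                                   // one 4-char tag: valid
  longHyphenated: "a".repeat(25) + "-" + "b".repeat(25), // one 51-char tag: invalid
};

for (const [name, input] of Object.entries(cases)) {
  console.log(name, update2.test(input), update3.test(input), requiresSpace.test(input));
}
// prints:
// tooLong true false false
// trailingDash true false true
// longHyphenated true true false
```

In this sketch, \b fixes the plain 31-character case, but because a boundary also exists next to every hyphen, a tag ending in - is rejected, and a long hyphenated run can still be split at the hyphen into two "tags". Requiring \s+ between tags, as the question's pattern does, avoids both problems.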