I need to validate a string against a character vector pattern. My current code

Question

0

Asked: June 18, 20262026-06-18T02:17:10+00:00 2026-06-18T02:17:10+00:00

I need to validate a string against a character vector pattern. My current code

0

I need to validate a string against a character vector pattern. My current code is:

trim <- function (x) gsub("^\\s+|\\s+$", "", x)

# valid pattern is lowercase alphabet, '.', '!', and '?' AND
# the string length should be >= than 2
my.pattern = c(letters, '!', '.', '?')

check.pattern = function(word, min.size = 2)
{
    word = trim(word)
    chars = strsplit(word, NULL)[[1]]
    all(chars %in% my.pattern) && (length(chars) >= min.size)
}

Example:

w.valid = 'special!'
w.invalid = 'test-me'

check.pattern(w.valid) #TRUE
check.pattern(w.invalid) #FALSE

This is VERY SLOW i guess…is there a faster way to do this? Regex maybe?
Thanks!

PS: Thanks everyone for the great answers. My objective was to build a 29 x 29 matrix,
where the row names and column names are the allowed characters. Then i iterate over each word of a huge text file and build a ‘letter precedence’ matrix. For example, consider the word ‘special’, starting from the first char:

row s, col p -> increment 1
row p, col e -> increment 1
row e, col c -> increment 1
... and so on.

The bottleneck of my code was the vector allocation, i was ‘appending’ instead of pre-allocate the final vector, so the code was taking 30 minutes to execute, instead of 20 seconds!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-18T02:17:12+00:00

There are some built-in functions that can clean up your code. And I think you’re not leveraging the full power of regular expressions.

The blaring issue here is strsplit. Comparing the equality of things character-by-character is inefficient when you have regular expressions. The pattern here uses the square bracket notation to filter for the characters you want. * is for any number of repeats (including zero), while the ^ and $ symbols represent the beginning and end of the line so that there is nothing else there. nchar(word) is the same as length(chars). Changing && to & makes the function vectorized so you can input a vector of strings and get a logical vector as output.

check.pattern.2 = function(word, min.size = 2)
{
    word = trim(word)
    grepl(paste0("^[a-z!.?]*$"),word) & nchar(word) >= min.size
}
check.pattern.2(c(" d ","!hello  ","nA!","  asdf.!"," d d "))
#[1] FALSE  TRUE FALSE  TRUE FALSE

Next, using curly braces for number of repetitions and some paste0, the pattern can use your min.size:

check.pattern.3 = function(word, min.size = 2)
{
    word = trim(word)
    grepl(paste0("^[a-z!.?]{",min.size,",}$"),word)
}
check.pattern.3(c(" d ","!hello  ","nA!","  asdf.!"," d d "))
#[1] FALSE  TRUE FALSE  TRUE FALSE

Finally, you can internalize the regex from trim:

check.pattern.4 = function(word, min.size = 2)
{
    grepl(paste0("^\\s*[a-z!.?]{",min.size,",}\\s*$"),word)
}
check.pattern.4(c(" d ","!hello  ","nA!","  asdf.!"," d d "))
#[1] FALSE  TRUE FALSE  TRUE FALSE

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I need to validate a string against a character vector pattern. My current code

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply