I am trying to extract records from a data.frame using grepl. Here are some example cases.
a <- c('This is a healthcare facility', 'this is a hospital', 'this is a hospital district', 'this is a district health service')
I wish to extract all records that have hospital but not district. I have come unstuck when district and hospital occur in the same string. I tried using the following:
str_match(string=a,pattern='hospital|^district' )
How do I limit district but still include hospital in this example?
Thanks.
R supports Perl-compatible regular expressions, which allow negative lookahead assertions, so in principle you can write:
^(?!.*district).*hospital
(which matches “start-of-string, followed by a point in the string that is not followed by .*district, followed by .*hospital”). Note that grepl only accepts the lookahead when called with perl=TRUE. That said, I’m really not sure if putting this condition into a single regex is the best way to do it; there may be a more R-ish way.
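
As a rough sketch of both ideas, here is the lookahead applied to the example vector with base R's grepl, next to what I would guess the "more R-ish" version looks like: two plain grepl calls combined with & and !. The pattern ^(?!.*district).*hospital is reconstructed from the description above, and the names keep_regex and keep_two are made up for illustration.

```r
a <- c('This is a healthcare facility', 'this is a hospital',
       'this is a hospital district', 'this is a district health service')

# Single regex: the negative lookahead requires perl = TRUE
keep_regex <- grepl('^(?!.*district).*hospital', a, perl = TRUE)

# Two simple tests combined: contains 'hospital' AND does not contain 'district'
keep_two <- grepl('hospital', a, fixed = TRUE) & !grepl('district', a, fixed = TRUE)

a[keep_regex]
a[keep_two]
```

Both logical vectors select only 'this is a hospital'. For a data.frame, the same logical vector should work as a row index, e.g. df[keep_two, ] where the (hypothetical) column being tested is passed to grepl in place of a.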