How can I capture the string between > and < in R?
d<-"\"id/56771\" target=\"_self\">Children- and adolescents</a></li>\n\t\t\t<li><"
//M
str_extract(d,">+(.*?)+<") gives me
>Children- and adolescents</a></li>\n\t\t\t<li><
I guess a new string command could do the trick, but I thought there would be something more direct…
You can use str_extract, but str_match may be better suited.
The trick here is the ? modifier, which tells the regex not to be greedy. Regex matching is greedy by default, which means that it will match the longest string that fits your pattern.
This still leaves you with a bit of work to do, i.e. removing the first and last character (the > and the <). You can do this by taking a substring, or it might be slightly easier to use str_match instead. This returns all of the pattern matches as a character matrix. (The two matches are 1. the entire match, and 2. the part of the pattern inside the parentheses.)
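To make the two options concrete, here is a minimal sketch. It assumes the stringr package is loaded and uses the lazy pattern ">(.*?)<", which is simpler than the one in the question. The names full, trimmed and m are only for illustration.

```r
library(stringr)

d <- "\"id/56771\" target=\"_self\">Children- and adolescents</a></li>\n\t\t\t<li><"

# Lazy match: .*? stops at the first "<" after the first ">",
# but the result still includes both delimiters.
full <- str_extract(d, ">(.*?)<")

# Option 1: drop the first and last character by position.
trimmed <- str_sub(full, 2, -2)

# Option 2: str_match returns a character matrix with the whole
# match in column 1 and the captured group in column 2.
m <- str_match(d, ">(.*?)<")
```

In the question's pattern the extra + quantifiers around the group are what seem to defeat the lazy ?, so dropping them is the main change.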
This means it’s a simple matter of returning the second element:
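A short sketch of that last step, again assuming stringr and the lazy pattern ">(.*?)<". Indexing by column with m[, 2] rather than by position is my own choice here, so that it keeps working if d holds more than one string.

```r
library(stringr)

d <- "\"id/56771\" target=\"_self\">Children- and adolescents</a></li>\n\t\t\t<li><"

# Column 2 of the str_match result holds the text captured by the parentheses.
m <- str_match(d, ">(.*?)<")
inner <- m[, 2]
inner
# [1] "Children- and adolescents"
```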