I know this question had been asked here and here but there was a

Question

Asked: June 15, 2026 (2026-06-15T15:12:37+00:00)

I know this question has been asked here and here, but I ran into a small problem when I tried it out:

x<- str_extract("Hello peopllz! My new home is #crazy gr8! #wow", "#\S+")
Error: '\S' is an unrecognized escape in character string starting "#\S"

I changed the regex to "#(.+) ?" and then to "#\\s", but neither extracted the hashtags.

I then tried the gsub way:

x<- gsub("[^#(.+) ?]","","Hello! #London is gr8. #Wow")

It gave: " # . #"

Any ideas where I am going wrong? I’d like my output to be a vector/list of all the hashtags in the tweet (without the hashes!).

Edit: I would prefer not to tokenize the tweet, because:
1. I am not tokenizing the tweets for the rest of my program,
2. It would become a very expensive step were I to scale it to handle large volumes of tweets.

1 Answer

Editorial Team · Answer 1 · 2026-06-15T15:12:38+00:00

Use "#\\S+" instead of "#\S+".

str_extract_all("Hello peopllz! My new home is #crazy gr8! #wow", "#\\S+")
# [[1]]
# [1] "#crazy" "#wow"
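The question asks for the tags without the leading #. As a minimal sketch (not part of the call above, and assuming stringr 1.3.0 or later for str_remove), you could either strip the hash after extraction or use a lookbehind so the hash is required but never captured. The variable names are only illustrative.

```r
library(stringr)

tweet <- "Hello peopllz! My new home is #crazy gr8! #wow"

# Option 1: extract the full hashtags, then strip the leading "#".
tags_with_hash <- str_extract_all(tweet, "#\\S+")[[1]]
tags <- str_remove(tags_with_hash, "^#")

# Option 2: a lookbehind requires the "#" but leaves it out of the match.
# \\w+ stops at punctuation, whereas \\S+ would keep it (e.g. "wow!").
tags_lookbehind <- str_extract_all(tweet, "(?<=#)\\w+")[[1]]

print(tags)
print(tags_lookbehind)
```

Both options work on the whole string at once, so no tokenizing step is needed. Which character class to use (\\S or \\w) depends on what you want to count as part of a hashtag.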

There are two levels of parsing going on when a pattern is passed to str_extract. Before the low-level regexp function inside str_extract receives the pattern you want to search for (i.e. "#\S+"), the string literal is first parsed by R. R does not recognize \S as a valid escape sequence and throws an error. By escaping the backslash as \\, you tell R to pass \ and S to the regexp function as two ordinary characters, instead of interpreting them as a single escape sequence.
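To see the first level of parsing for yourself, you can inspect the string after R has read it. This is a small base R sketch; the raw-string form at the end assumes R 4.0.0 or later.

```r
# "#\\S+" as typed in source code: R's parser turns \\ into one backslash.
pattern <- "#\\S+"

nchar(pattern)        # 4 characters: #, \, S, +
writeLines(pattern)   # shows what the regex engine receives: #\S+

# Raw strings (R >= 4.0.0) skip the first level of escaping, so the
# pattern can be typed exactly as the regex engine will see it.
raw_pattern <- r"(#\S+)"
identical(pattern, raw_pattern)
```

Note that print(pattern) would show the escaped form "#\\S+" again, which is why writeLines (or cat) is the better tool for checking what a string really contains.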

Side track

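This can produce rather bizarre expressions. Imagine that you have a list of addresses of computers on a Windows network, of the form "\\computer". To search for them you would need to type str_extract(adr, "\\\\\\w+"), which R turns into the regex \\\w+ internally before the search is run. That regex means one literal backslash followed by word characters; if both leading backslashes should be part of the match, the pattern needs ten backslashes in source code: "\\\\\\\\\\w+".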
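A short sketch of the backslash counting, assuming a hypothetical address that really contains two literal backslashes (so it has to be typed with four in R source code):

```r
library(stringr)

# Hypothetical address: two literal backslashes followed by a name.
# Each literal backslash must be doubled in R source code.
adr <- "\\\\computer"
writeLines(adr)                 # \\computer
nchar(adr)                      # 10

# Six backslashes in source code become three in the regex.
pattern <- "\\\\\\w+"
writeLines(pattern)             # \\\w+

# The regex \\\w+ is "one literal backslash, then word characters",
# so the match starts at the second backslash of the address.
match_one <- str_extract(adr, pattern)
writeLines(match_one)           # \computer

# To include both leading backslashes the regex must be \\\\\w+,
# which takes ten backslashes in R source code.
match_both <- str_extract(adr, "\\\\\\\\\\w+")
writeLines(match_both)          # \\computer
```

In R 4.0.0 and later, the raw-string form r"(\\\\\w+)" is one way to write the second pattern without doubling every backslash.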
