I have a document of tweets that contains lines like these:
RichardJ0nes -- Should I upgrade to an iPhone 5? Decisions, decisions!
AnthonyMOliva -- @AnthonyMOliva AT&T offering iPhone 5 refurbished starting at $99: Two months after making its debut, refur... http://t.co/IsPDzIrD #BBC
mittrashi -- RT @timesofindia: Apple iPhone 5S, iPad 5 already in the works? - The Times of India http://t.co/s782BHp5
I want to clean this document.
First, I want to remove the user names (for example, RichardJ0nes -- or @AnthonyMOliva), and second, I want to remove the links (for example, http://t.co/s782BHp5).
The result should look like this:
Should I upgrade to an iPhone 5? Decisions, decisions!
AT&T offering iPhone 5 refurbished starting at $99: Two months after making its debut, refur...
Apple iPhone 5S, iPad 5 already in the works? - The Times of India
I tried to do this with regular expressions in Notepad++, but I could not clean the text.
I first tried to delete the user names with
Find: .*\(--\)
Replace: \1
but it does not work in Notepad++. How should I do this? Please give me an idea.
Search for
(^\S+\s--|\bhttps?://\S+|(?:^|(?<=\s))[@#]\S+)\s?
and replace it with an empty string.
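If you would rather script the cleanup than run it in the editor, here is a minimal sketch in Python that applies the same pattern one line at a time. The function name clean_tweet and the final strip() call are my own additions for illustration, not part of the pattern above. Note that with this pattern the leading "RT" on a retweet stays in the text, so the third sample line comes out as "RT Apple iPhone 5S, ..." rather than exactly matching the desired output.

```python
import re

# Pattern copied from the answer above; Python's re module accepts it unchanged.
PATTERN = re.compile(r"(^\S+\s--|\bhttps?://\S+|(?:^|(?<=\s))[@#]\S+)\s?")


def clean_tweet(line: str) -> str:
    """Remove the leading 'user --' prefix, URLs, and @/# tokens from one line."""
    # strip() is an addition: it drops the space left behind at the end of
    # the line when a removed token was preceded by a space.
    return PATTERN.sub("", line).strip()


# The three sample lines from the question.
tweets = [
    "RichardJ0nes -- Should I upgrade to an iPhone 5? Decisions, decisions!",
    "AnthonyMOliva -- @AnthonyMOliva AT&T offering iPhone 5 refurbished "
    "starting at $99: Two months after making its debut, refur... "
    "http://t.co/IsPDzIrD #BBC",
    "mittrashi -- RT @timesofindia: Apple iPhone 5S, iPad 5 already in the "
    "works? - The Times of India http://t.co/s782BHp5",
]

for tweet in tweets:
    print(clean_tweet(tweet))
```

Working line by line keeps the optional trailing \s? from consuming a line break after a token at the end of a line, which could otherwise join two tweets together.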