Based on the given input:
I can do waaaaaaaaaaaaay better :DDDD!!!! I am sooooooooo exicted about it :))) Good !!
Desired: output
I can do way/LNG better :D/LNG !/LNG I am so/LNG exicted about it :)/LNG Good !/LNG
— Challenges:
- better vs. soooooooooo >> we need to keep the first one as is but shorten the second
- for the second we need to add a tag (LNG) as it might have some importance for intensification for subjectivity and sentiment
—- Problem: error message “unbalanced parentheses”
Any ideas?
My code is:
import re
lengWords = {} # a dictionary of lengthened words
def removeDuplicates(corpus):
data = (open(corpus, 'r').read()).split()
myString = " ".join(data)
for word in data:
for chr in word:
countChr = word.count(chr)
if countChr >= 3:
lengWords[word] = word+"/LNG"
lengWords[word] = re.sub(r'([A-Za-z])\1+', r'\1', lengWords[word])
lengWords[word] = re.sub(r'([\'\!\~\.\?\,\.,\),\(])\1+', r'\1', lengWords[word])
for k, v in lengWords.items():
if k == word:
re.sub(word, v, myString)
return myString
It’s not the perfect solution, but I don’t have time to refine it now- just wanted to get you started with easy approach: