In Python, how can I develop an algorithm that finds common patterns in a text stored in an array and reports the number of occurrences of each item?
For example:
line_arr=""" :)hello hi there,My name is 'pixel' can i speak to 'Tom'
Hi, tom here :) 'pixel'
how are u doing today.
i just called to ask whats the cost of the microwae oven is it $50 or $60
it is $75
any d $iscounts on this..
10% to 30%"""
reg_dict={}
for l in line_arr:
    # find all common patterns and update them in a dictionary
Can we get all smileys, names in single quotes, currency amounts starting with $, and percentages, plus any other common patterns, and store their counts in a dictionary? Is this possible at all?
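What you have is a string, not an array. You should tokenize it first. Once you've done that, you can use collections.Counter.most_common to list the most frequent tokens.
If you want to find smileys, use a different tokenizer than a plain \w+ regular expression, which only matches letters, digits, and underscores and so drops the punctuation that smileys are made of.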
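As a rough sketch of the idea (not the original answer's code), the following tokenizes the sample text with `\w+`, counts the tokens with `collections.Counter`, and then counts each category from the question using a separate regular expression. The `PATTERNS` dictionary, its category names, and the regexes in it are assumptions made for illustration; real chat text would likely need broader patterns, especially for smileys.

```python
import re
from collections import Counter

text = """ :)hello hi there,My name is 'pixel' can i speak to 'Tom'
Hi, tom here :) 'pixel'
how are u doing today.
i just called to ask whats the cost of the microwae oven is it $50 or $60
it is $75
any d $iscounts on this..
10% to 30%"""

# Step 1: plain word tokens, lowercased so "Hi" and "hi" are counted together.
words = re.findall(r"\w+", text.lower())
word_counts = Counter(words)
print(word_counts.most_common(5))

# Step 2: one regex per category (hypothetical, illustrative patterns).
PATTERNS = {
    "smiley": r"[:;]-?[()DPp]",           # e.g. :) ;-) :D
    "quoted_name": r"'(\w+)'",            # a word in single quotes
    "currency": r"\$\d+(?:\.\d+)?",       # $ followed by a number
    "percentage": r"\d+(?:\.\d+)?%",      # a number followed by %
}

# reg_dict maps each category to a Counter of the matches found in the text.
reg_dict = {
    name: Counter(re.findall(pattern, text))
    for name, pattern in PATTERNS.items()
}

for name, counts in reg_dict.items():
    print(name, dict(counts))
```

Because currency requires a digit after the `$`, the stray `$iscounts` in the sample is not counted. Quoted names are counted case-sensitively here; lowercase the matches first if 'Tom' and 'tom' should be treated as the same name.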