I am trying to build a counter that scans a text and returns how often each letter follows a given pair of letters.
For example, part of the output would be:
'th': Counter({'e': 119, 'a': 145, ...})
I want it to cover all possible pairs of lowercase characters.
So far, I have been using the following code, which only takes the single previous letter into account:
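from collections import Counter, defaultdict
from itertools import ifilter  # Python 2 only

def pairwise(iterable):
    # Yield consecutive overlapping pairs: (s0, s1), (s1, s2), ...
    it = iter(iterable)
    last = next(it)
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair((last, curr)):  # tuple-parameter unpacking is Python 2 syntax
    return last in valid and curr in valid

def make_markov(text):
    # Map each character to a Counter of the characters that follow it.
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in ifilter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov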
Untested:
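One possible way to extend the single-letter version to a two-letter prefix is to slide a window of three characters over the text and key the outer dictionary on the first two. The sketch below assumes Python 3 (so it does not use `ifilter` or tuple-parameter unpacking), and the names `triples`, `make_markov2` and `VALID` are made up for illustration; they are not from the code above.

```python
from collections import Counter, defaultdict

# Same alphabet as in the question: lowercase letters plus the space.
VALID = set('abcdefghijklmnopqrstuvwxyz ')


def triples(iterable):
    """Yield overlapping (first, second, third) windows from an iterable."""
    it = iter(iterable)
    try:
        first = next(it)
        second = next(it)
    except StopIteration:
        return  # fewer than two items: nothing to yield
    for third in it:
        yield first, second, third
        first, second = second, third


def make_markov2(text):
    """Map each two-character prefix to a Counter of the characters that follow it."""
    markov = defaultdict(Counter)
    for a, b, c in triples(text.lower()):
        # Skip any window that contains a character outside the alphabet.
        if a in VALID and b in VALID and c in VALID:
            markov[a + b][c] += 1
    return markov


model = make_markov2("the then thaw")
print(model['th'])  # Counter({'e': 2, 'a': 1})
```

Note that only pairs that actually occur in the text get an entry. Because the result is a `defaultdict`, looking up an unseen pair returns an empty `Counter` (and inserts it), so if every possible pair must be present up front, the keys would have to be pre-populated separately.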