Given the data
s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
I can count the 1s and 0s with table or ftable:
ftable(s, row.vars = 1:1)
and the totals of the 11s, 01s, 10s and 00s that occur in s with:
table(s[-length(s)], s[-1])
What would be a clever way to count the occurrences of 111s, 011s, …, 100s, 000s? Ideally, I want a table of counts x like this:
     0  1
11   x  x
01   x  x
10   x  x
00   x  x
Is there a general way to count the occurrences of every possible subsequence of length k = 1, 2, 3, 4, … in the data?
Well, it seems like you would first need to generate the overlapping n-tuples from your vector. A function, makeTuples(), should accomplish that.
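A minimal sketch of such a function is below. The name makeTuples() comes from the text, but the body is an illustrative assumption: it returns a list with one vector per tuple position, so element j holds the j-th symbol of every overlapping n-tuple. That is the shape table() expects when the list is later passed through do.call().

```r
# Hypothetical sketch of makeTuples() (assumed contract, not the original):
# returns a list of n vectors; element j holds the j-th symbol of every
# overlapping n-tuple of x, so tuple i is (tuples[[1]][i], ..., tuples[[n]][i]).
makeTuples <- function(x, n) {
  m <- length(x) - n + 1            # number of overlapping n-tuples
  stopifnot(n >= 1, m >= 1)
  lapply(seq_len(n), function(j) x[j:(j + m - 1)])
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
tuples <- makeTuples(s, 3)
length(tuples)        # one vector per position
length(tuples[[1]])   # 15 - 3 + 1 overlapping 3-tuples
```

Storing positions rather than individual tuples keeps every later step vectorised: comparing a whole position at once replaces a loop over tuples.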
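Then you could feed the results of makeTuples() to table() using do.call(). This works because makeTuples() returns the tuples as a list, and do.call() passes each element of that list to table() as a separate argument. The output isn't quite as nice as you wanted, but you could write a function to reformat the n-dimensional result of table() into the layout shown in the question.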
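To make the do.call() step concrete, here is a sketch on the question's s with n = 3. The makeTuples() body is an assumed stand-in (one vector per tuple position); the point is the shape of what table() returns and how a single 2-D slice of it looks.

```r
# Hypothetical stand-in for makeTuples() (assumed: one vector per position).
makeTuples <- function(x, n) {
  m <- length(x) - n + 1            # number of overlapping n-tuples
  lapply(seq_len(n), function(j) x[j:(j + m - 1)])
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)

# do.call() hands each list element to table() as a separate argument,
# so the result is an n-dimensional contingency table: here 2 x 2 x 2.
tab <- do.call(table, makeTuples(s, 3))
dim(tab)
tab["1", "0", "0"]   # how many times "100" occurs: 3
tab[, , "1"]         # 2-D slice: tuples ending in 1, indexed by positions 1 and 2
```

Each cell is indexed by the symbols at positions 1 to n, which is why turning this into the requested rows-by-last-symbol layout means walking over the leading dimensions.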
It would require looping over the outer n-2 dimensions of the n-dimensional array returned by table(), creating row names and concatenating things together.

So, I was just sitting in a stochastic processes class when I figured out a more or less straightforward way to produce the output you want without trying to unwind the output of table(). First you will need a function that generates all possible permutations (with repetition) of n selections from your population. The permutations can be generated with expand.grid(), but it needs a little sugar-coating.

The basic idea is to iterate over the list of permutations and count the number of tuples that match each permutation. Since you want the results split out into a table, we should select a permutation of n-1 elements from the population and let the last position form the columns of the table. So the second function takes a permutation of size n-1, a list of tuples, and the population the tuples were drawn from, and produces a named vector of match counts.
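The two helpers described above might look roughly like the following. The names makePermutations() and countMatches() are hypothetical, as is the convention (carried over from the assumed makeTuples()) that tuples are stored as one vector per position. The columns of expand.grid() are reversed only so that the prefixes come out in counting order (00, 01, 10, 11).

```r
# Hypothetical helper (assumed contract): one vector per tuple position,
# element j holding the j-th symbol of every overlapping n-tuple of x.
makeTuples <- function(x, n) {
  m <- length(x) - n + 1            # number of overlapping n-tuples
  stopifnot(n >= 1, m >= 1)
  lapply(seq_len(n), function(j) x[j:(j + m - 1)])
}

# Hypothetical: every length-n sequence over `population` (with repetition),
# returned as a list of plain vectors instead of expand.grid()'s data frame.
makePermutations <- function(population, n) {
  grid <- expand.grid(rep(list(population), n), stringsAsFactors = FALSE)
  # expand.grid() varies the first column fastest; reversing the columns
  # makes the last position vary fastest, i.e. counting order.
  grid <- grid[, rev(seq_len(n)), drop = FALSE]
  lapply(seq_len(nrow(grid)),
         function(i) unlist(grid[i, , drop = FALSE], use.names = FALSE))
}

# Hypothetical: for a prefix `perm` of length n-1, count the tuples that start
# with it, split by their last symbol; returns a vector named by population.
countMatches <- function(perm, tuples, population) {
  n <- length(tuples)
  stopifnot(n >= 2, length(perm) == n - 1)
  # TRUE where the first n-1 positions all equal the prefix
  hit <- Reduce(`&`, Map(function(col, v) col == v, tuples[-n], perm))
  counts <- vapply(population, function(p) sum(hit & tuples[[n]] == p), integer(1))
  setNames(counts, population)
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
population <- sort(unique(s))
tuples <- makeTuples(s, 3)
perms  <- makePermutations(population, 2)     # the 4 possible 2-symbol prefixes
countMatches(perms[[1]], tuples, population)  # prefix 00: counts of 000 and 001
```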
Finally, all three functions can be combined into a bigger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation is done with ldply() from Hadley Wickham's plyr package, as it does a nice job of preserving information such as which permutation each row of output corresponds to. And there you go.
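Putting it together, a sketch of such a bigger function (hypothetically named tupleTable()) is below. It repeats the assumed helpers so it runs on its own, and it swaps plyr's ldply() for base R's lapply() plus do.call(rbind, ...) to avoid the package dependency; the row names are built by pasting each permutation together.

```r
# Hypothetical end-to-end sketch; all names are illustrative.
makeTuples <- function(x, n) {
  m <- length(x) - n + 1            # number of overlapping n-tuples
  stopifnot(n >= 1, m >= 1)
  lapply(seq_len(n), function(j) x[j:(j + m - 1)])
}

makePermutations <- function(population, n) {
  grid <- expand.grid(rep(list(population), n), stringsAsFactors = FALSE)
  grid <- grid[, rev(seq_len(n)), drop = FALSE]   # last position varies fastest
  lapply(seq_len(nrow(grid)),
         function(i) unlist(grid[i, , drop = FALSE], use.names = FALSE))
}

countMatches <- function(perm, tuples, population) {
  n <- length(tuples)
  hit <- Reduce(`&`, Map(function(col, v) col == v, tuples[-n], perm))
  setNames(vapply(population, function(p) sum(hit & tuples[[n]] == p), integer(1)),
           population)
}

# Rows: every (n-1)-symbol prefix; columns: the symbol that follows it.
tupleTable <- function(x, n) {
  stopifnot(n >= 2)
  population <- sort(unique(x))
  tuples <- makeTuples(x, n)
  perms  <- makePermutations(population, n - 1)
  rows <- lapply(perms, countMatches, tuples = tuples, population = population)
  out <- do.call(rbind, rows)       # base R stand-in for plyr::ldply()
  rownames(out) <- vapply(perms, paste, character(1), collapse = "")
  out
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
tupleTable(s, 3)
tupleTable(s, 4)
```

For k = 1 a plain table(s) already does the job, which is why the sketch requires n >= 2; for n = 2 it gives the same counts as table(s[-length(s)], s[-1]) from the question.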
Apologies to my stochastic processes teacher, but functional programming problems in R were just more interesting than the Gambler’s Ruin today…