I’ve found various implementations of ngrams in Python, Perl, etc., but I’d really like something in a bash script. I ran across the “Missing textutils” version, but that only lists the ngrams; it doesn’t count them by frequency, which is fairly central to using ngrams, or at least to my usage. I just want a basic list of results with their frequency, like this:
17 blue car
14 red car
5 and the
2 brown monkey
1 orange car
Anybody have something like that lying around that they could post? Thanks!
Here is a pure bash implementation. You’ll need to use a version of bash >= 4.2 with support for associative arrays.
Save as `ngram` and use as `ngram 2 < file`.
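For reference, a minimal sketch of what a script along these lines might look like, not necessarily the exact code meant above. It assumes words are separated by whitespace, that n-grams may run across line breaks, and that a default of 2 is acceptable when no size is given. It is written as a function named `ngram`, and it hands the final ordering to the external `sort` command, so that last step is not strictly pure bash.

```bash
#!/usr/bin/env bash
# Sketch only: count word n-grams read from standard input.
# Needs a bash that supports associative arrays.
# Assumptions: whitespace-separated words; n-grams may span line breaks.

ngram() {
    local n=${1:-2}          # n-gram size; defaulting to 2 is an assumption
    local -A counts=()       # associative array: n-gram -> frequency
    local -a window=() words=()
    local w key

    # Reject anything that is not a positive integer.
    if ! [[ $n =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: ngram N < file" >&2
        return 1
    fi

    # The "|| (( ... ))" part keeps a final line that lacks a trailing newline.
    while read -r -a words || (( ${#words[@]} )); do
        for w in "${words[@]}"; do
            window+=("$w")
            # Keep only the most recent n words in the sliding window.
            if (( ${#window[@]} > n )); then
                window=("${window[@]:1}")
            fi
            if (( ${#window[@]} == n )); then
                key="${window[*]}"      # join the window with single spaces
                counts[$key]=$(( ${counts[$key]:-0} + 1 ))
            fi
        done
    done

    # Print "count n-gram", most frequent first.
    for key in "${!counts[@]}"; do
        printf '%d %s\n' "${counts[$key]}" "$key"
    done | sort -rn
}
```

To use it the way described above, save it as `ngram`, add a final line `ngram "$@"`, and make the file executable; alternatively, source the file and call `ngram 2 < file` directly. Punctuation and letter case are left untouched, so "Car" and "car," count as different words unless the input is normalized first.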