How would I go about writing a co-occurrence class in something like Java that takes a file full of n-grams and calculates word co-occurrence for a given input term?
Are there any libraries or packages that work with Lucene (indexes), or something like a map-reduce over the n-gram list in Hadoop?
Thanks.
OK, so assuming you want to find the co-occurrence of two different words in a file of n-grams...
Here’s pseudo code-ish Java:
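A minimal sketch, assuming each line of the file holds one n-gram as whitespace-separated words. The class and method names (`CoOccurrenceCounter`, `coOccurrences`) are made up for illustration. For a given term, it counts how many n-grams each other word shares with that term:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper class; not part of Lucene or Hadoop.
class CoOccurrenceCounter {

    /**
     * Returns a map from each word to the number of n-grams in which it
     * appears together with `term`.
     * Assumption: one n-gram per line, words separated by whitespace.
     */
    static Map<String, Integer> coOccurrences(Stream<String> ngrams, String term) {
        Map<String, Integer> counts = new HashMap<>();
        ngrams.forEach(line -> {
            // Use a set so a word repeated inside one n-gram is counted once.
            Set<String> words = new HashSet<>(Arrays.asList(line.trim().split("\\s+")));
            if (!words.contains(term)) {
                return; // this n-gram does not mention the term at all
            }
            for (String word : words) {
                if (!word.isEmpty() && !word.equals(term)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        });
        return counts;
    }
}
```

To read from a file you could pass `Files.lines(Paths.get("ngrams.txt"))` (the file name is a placeholder) as the stream. The co-occurrence of two specific words is then `coOccurrences(lines, "first").getOrDefault("second", 0)`. If your n-gram file also carries a frequency column, you would add that frequency instead of 1.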
Doing a count like this would probably be pretty easy with Pig, but you’re probably more familiar with that than I am.