Given document-D1: containing words (w1,w2,w3)
and document D2 and words (w2,w3..)
and document Dn and words ( w1,w2, wn)
Can I structure my data in big table to answer the questions like:
which words occur most frequently with w1,
or which words occur most frequently with w1 and w2.
What I am trying to achieve is to find the third word Wx (suggestion) which ocures most frequently in documents togehter with given words W1 and W2
I know the solution in SQL, but is it possible with google-big table?
I know I would have to build my indices by myself, the question is how should I structure them to avoid index explosion
thanks
almir
Using list-properties and merge-join is the best way to answer set membership questions in Google App Engine: Building Scalable, Complex Apps on App Engine.
You could setup your model as follows:
Then it would be easy to query for co-occurrence. For example, which documents have the words
googleandengine?There are some limitations, though. From the presentation:
You could also create a model of words and save in the
StringListPropertyonly their keys, but depending on the size of your documents even that would not be feasible.