I need to generate a list of keywords for each document in a set of documents that are loaded into MarkLogic. I am considering running cts:distinctive-terms against the set of documents, but cannot figure out how to get a list of keywords for each document rather than a list of terms relevant to the set. Can anyone suggest a solution?
Were you using the score=logtf option? When I tried that, the scores of stop-words went up quite a bit. If you think about it, this makes sense: the database can no longer use IDF to weed them out. If you only want TF, though, you could filter using a stop-word list, as already suggested.
But logtfidf scoring should naturally penalize stop-words. You can set the min-val option or other options to tune the results. For example, I set min-val to 27 because stop-words began to appear at 26. The right options will depend on the existing database content, because of IDF.
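To get keywords per document rather than for the whole set, one approach is to call cts:distinctive-terms once for each document instead of passing all of them in a single call. The following is only a rough sketch, not a tested recipe: the collection name "my-docs" and the keywords wrapper element are made up for illustration, the min-val of 27 is simply the value that worked against my content, and I am assuming each returned term wraps a word query whose text sits in a cts:text element.

```xquery
xquery version "1.0-ml";

(: Assumption: "my-docs" is a hypothetical collection name.
   Replace it with whatever selects your documents. :)
for $doc in fn:collection("my-docs")
(: One call per document, so the terms describe that document only. :)
let $class := cts:distinctive-terms(
  $doc,
  <options xmlns="cts:distinctive-terms">
    <score>logtfidf</score>
    <!-- 27 suited my database; tune this against your own content -->
    <min-val>27</min-val>
  </options>)
return
  (: Hypothetical output shape: one keywords element per document URI. :)
  <keywords uri="{xdmp:node-uri($doc)}">{
    for $text in $class//cts:text
    return <keyword>{fn:string($text)}</keyword>
  }</keywords>
```

Because IDF is computed against the whole database, the same min-val may let stop-words through on a different corpus, so inspect the output for a few documents before settling on a threshold.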