I am dealing with a problem of text summarization i.e. given a large chunk(s)

Question


Asked: May 26, 2026


I am dealing with a problem of text summarization, i.e., given one or more large chunks of text, I want to find the most representative “topics” or the subject of the text. For this, I used various information-theoretic measures such as TF-IDF, Residual IDF and Pointwise Mutual Information to create a “dictionary” for my corpus. This dictionary contains the important words and phrases mentioned in the text.
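For readers unfamiliar with the scoring step, here is a minimal sketch of ranking terms by TF-IDF, one of the measures named above. It is an illustration only, not the asker's actual pipeline: the function name `tfidf_scores`, the toy documents, and the choice to keep each term's maximum score across documents are all assumptions.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return {term: highest TF-IDF the term reaches in any document}.

    `docs` is a list of tokenized documents (lists of strings).
    Keeping the per-term maximum is an assumption made for this sketch.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = {}
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])   # 0 for terms found in every document
            score = (count / len(doc)) * idf    # length-normalized term frequency
            scores[term] = max(scores.get(term, 0.0), score)
    return scores

# Hypothetical toy corpus
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell sharply".split(),
]
scores = tfidf_scores(docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

A term such as "the" that occurs in every document gets a score of zero, which is why a list sorted this way pushes corpus-wide filler to the bottom.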

I manually sifted through the entire list of 50,000 phrases, sorted by their TF-IDF score, and hand-picked 2,000 phrases (I know! It took me 15 hours to do this…) that are the ground truth, i.e., these are important for sure. Now when I use this as a dictionary, run a simple frequency analysis on my text and extract the top-k phrases, I am basically seeing what the subject is, and I agree with what I am seeing.
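The dictionary-based frequency analysis described above could look roughly like the sketch below. The function name `top_k_phrases`, the sample dictionary, and the sample text are hypothetical, and whole-word, case-insensitive matching is an assumption about how phrases are counted.

```python
import re
from collections import Counter

def top_k_phrases(text, dictionary, k=3):
    """Count whole-word, case-insensitive occurrences of each dictionary
    phrase in `text` and return the k most frequent as (phrase, count)."""
    lowered = text.lower()
    counts = Counter()
    for phrase in dictionary:
        # \b keeps "cat" from matching inside "category"
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        n = len(re.findall(pattern, lowered))
        if n:
            counts[phrase] = n
    return counts.most_common(k)

# Hypothetical hand-picked dictionary and input text
dictionary = ["neural network", "gradient descent", "stock market"]
text = (
    "A neural network is trained with gradient descent. "
    "Gradient descent updates the neural network weights, "
    "and a deeper neural network needs more data."
)
print(top_k_phrases(text, dictionary, k=2))
# → [('neural network', 3), ('gradient descent', 2)]
```

Phrases that never occur are left out of the result, so the returned list can be shorter than k.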

Now how can I evaluate this approach? There is no machine learning or classification involved here. Basically, I used some NLP techniques to create a dictionary, and using the dictionary alone for simple frequency analysis gives me the topics I am looking for. However, is there a formal analysis I can do for my system to measure its accuracy or some other quality metric?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T22:22:05+00:00

I’m not a machine learning expert, but I would use something like cross-validation, i.e. evaluate on text that was not used to build the dictionary. If you used e.g. 1000 pages of text to “train” the algorithm (there is a “human in the loop”, but that is no problem), then you could take another few hundred test pages and use your “top-k phrases algorithm” to find the “topic” or “subject” of each. The fraction of test pages where you agree with the outcome of the algorithm gives you a (somewhat subjective) measure of how well your method performs.
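As a rough sketch of that measurement: record one yes/no judgment per held-out page and compute the fraction of agreements. The function name `agreement_rate` and the sample judgments are hypothetical. The binomial standard error is an addition not mentioned in the answer, included only to show how uncertain the ratio is for a given number of test pages.

```python
import math

def agreement_rate(judgments):
    """Return (fraction of pages judged correct, binomial standard error).

    `judgments` holds one boolean per held-out page: True if the human
    agrees with the topic the algorithm extracted for that page.
    """
    n = len(judgments)
    if n == 0:
        raise ValueError("need at least one judged page")
    p = sum(judgments) / n
    # Standard error of a proportion; an addition for this sketch
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical outcome: agreement on 80 of 100 held-out pages
judgments = [True] * 80 + [False] * 20
rate, stderr = agreement_rate(judgments)
```

Having a second person judge the same pages would make the measure less subjective, since their agreement with each other can then be compared against their agreement with the algorithm.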
