As I understand it, IDF is used to calculate how many documents have the

Asked: June 3, 2026

As I understand it, IDF is based on how many documents contain a term (roughly speaking). You can calculate IDF (along with TF) on the training set, since you have all the documents beforehand. But what if I don't have the test set beforehand and test documents arrive one at a time (for example, from a web crawler)? How do I calculate the IDF of the words in a test document?
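
A minimal sketch of computing IDF on a training set, assuming documents are already tokenized into lists of words and using the common definition idf(t) = log(N / df(t)) with the natural log, where N is the number of documents and df(t) the number of documents containing t. The helper name fit_idf and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Hypothetical helper: return (idf, n_docs) with idf[t] = log(N / df(t))."""
    n_docs = len(train_docs)
    df = Counter()
    for doc in train_docs:
        # set(doc) so each term is counted at most once per document
        df.update(set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return idf, n_docs

# Toy, already-tokenized training corpus (illustrative only).
train = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
idf, n = fit_idf(train)
print(idf["the"])  # 0.0: "the" occurs in every document
print(idf["dog"])  # log(3): "dog" occurs in only one document
```

Terms that occur in every training document get an IDF of 0, and the rarest training terms get log N.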

1 Answer

Editorial Team · Answer 1 · June 3, 2026 at 10:29 am

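In this situation, if your dataset is big enough, you can compute IDF from the training set alone. In the test phase, if a term appeared in the training set, use its training IDF; if the term is new, compute its IDF from the number of training documents alone (for example, by treating its document frequency as 1, which gives it the largest possible value, log N).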
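
The strategy of fitting IDF once on the training set and reusing it at test time can be sketched as follows, assuming pre-tokenized documents and idf(t) = log(N / df(t)). The helper names (fit_idf, idf_for, tfidf) are hypothetical, and giving an unseen term df = 1 is one possible reading of "use the number of train set documents"; the training statistics are never updated as test documents arrive.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Hypothetical helper: IDF from the training set only, idf[t] = log(N / df(t))."""
    n_docs = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(doc))  # count each term once per document
    idf = {t: math.log(n_docs / c) for t, c in df.items()}
    return idf, n_docs

def idf_for(term, idf, n_docs):
    # Seen in training: reuse the stored training IDF.
    # Unseen: assume df = 1, i.e. log(N), the largest IDF any training term can get.
    return idf.get(term, math.log(n_docs))

def tfidf(doc, idf, n_docs):
    # Raw term counts times IDF; idf and n_docs are not modified at test time.
    tf = Counter(doc)
    return {t: c * idf_for(t, idf, n_docs) for t, c in tf.items()}

# Fit once on the training data.
train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
idf, n = fit_idf(train)

# Test documents can then arrive one at a time (e.g. from a crawler).
vec = tfidf(["the", "cat", "cat", "zebra"], idf, n)
print(vec)
```

Because each incoming document only reads the stored idf table and N, documents can be vectorized independently, in whatever order they arrive.
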
For some purposes, smoothing (for example, adding 1 to every document frequency) can give better results.
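
One common smoothing choice adds 1 to both the document count and every document frequency, as if one extra document contained each term once; this is the formula documented for scikit-learn's TfidfVectorizer with smooth_idf=True, idf(t) = ln((1 + N) / (1 + df(t))) + 1. A minimal sketch with hypothetical helper names, assuming pre-tokenized documents:

```python
import math
from collections import Counter

def fit_df(train_docs):
    """Hypothetical helper: document frequencies and document count from training data."""
    df = Counter()
    for doc in train_docs:
        df.update(set(doc))  # count each term once per document
    return df, len(train_docs)

def smoothed_idf(term, df, n_docs):
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1
    # An unseen term has df = 0 and still gets a finite IDF, with no special case.
    return math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1

train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
df, n = fit_df(train)

for term in ["the", "cat", "zebra"]:
    print(term, smoothed_idf(term, df, n))
```

The trailing + 1 keeps terms that occur in every document from being zeroed out entirely, while unseen terms still receive the highest weight.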
