Issues generating the input training vector set for LIBSVM

Question

Asked: June 9, 2026

I have some issues generating the input training vector set for LIBSVM. I have 3 categories, each with its relevant training documents and term weights, as follows (the values are only assumptions).

(Label/Category):1
Term frequency Vector(TF*IDF)
Document1-> 1:0.25 2:1.056 3:2.356
Document2-> 2:1.25 3:0.145 4:1.543
Document3-> 1:1.00 2:2.145 5:3.543

(Label/Category):2
Term frequency Vector(TF*IDF)
Document4-> 1:0.25 2:1.056 3:2.356
Document5-> 2:1.25 3:0.145 4:1.543
Document6-> 1:1.00 2:2.145 5:3.543

(Label/Category):3
Term frequency Vector(TF*IDF)
Document7-> 1:0.25 2:1.056 3:2.356
Document8-> 2:1.25 3:0.145 4:1.543
Document9-> 1:1.00 2:2.145 5:3.543

Can anyone tell me how to convert this into a set of training vectors for LIBSVM? Here, entries such as 1:0.25 2:1.056 3:2.356 are term indices and their weights. Term indices are maintained manually in a global dictionary.

Also, how do I convert a test document into a term vector?

Thanks in advance.

Hi Qnan, I have prepared a sample training vector space as you suggested. Can you please tell me whether my vector formation is correct?

(Label/Category):1

1 1:0.25 2:1.056 3:2.356 ->(training instance 1-for Document1)
1 2:1.25 3:0.145 4:1.543 ->(training instance 2-for Document2)
1 1:1.00 2:2.145 5:3.543 ->(training instance 3-for Document3)

(Label/Category):2

2 1:0.25 2:1.056 3:2.356 ->(training instance 4-for Document4)
2 2:1.25 3:0.145 4:1.543 ->(training instance 5-for Document5)
2 1:1.00 2:2.145 5:3.543 ->(training instance 6-for Document6)

(Label/Category):3

3 1:0.25 2:1.056 3:2.356 ->(training instance 7-for Document7)
3 2:1.25 3:0.145 4:1.543 ->(training instance 8-for Document8)
3 1:1.00 2:2.145 5:3.543 ->(training instance 9-for Document9)

1 Answer

Editorial Team · Answer 1 · 2026-06-09T22:21:41+00:00

The format is described in the README file of the LIBSVM distribution. Basically, it is

<categoryA> <feature1>:<value1> <feature2>:<value2> <feature3>:<value3> ...

one line per training instance. The feature indices should be in ascending order, too.
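As a rough illustration of that line format, here is a minimal Python sketch. The helper name to_libsvm_line and the dict-based input are assumptions made for this example and are not part of LIBSVM; the sample values are Document1 from the question.

```python
def to_libsvm_line(label, features):
    """Format one instance as '<label> <index>:<value> ...'.

    `features` is a hypothetical {term_index: weight} dict.
    """
    # Sort by feature index so the indices appear in ascending order.
    pairs = sorted(features.items())
    return " ".join([str(label)] + [f"{index}:{value:g}" for index, value in pairs])


# Document1 from the question, with the indices deliberately out of order.
doc1 = {3: 2.356, 1: 0.25, 2: 1.056}
line = to_libsvm_line(1, doc1)
print(line)
```

Writing one such line per document to a text file should give a file that svm-train can read.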

The test set looks exactly the same, except that the first column may contain some fixed number, e.g. 0, if you do not know the true labels for that set.
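For the test document, one possible approach is sketched below in Python. Everything in it is an assumption for illustration: the term_index dictionary, the idf values, whitespace tokenization, and the choice to drop terms that are missing from the global dictionary. The idea is to reuse the indices (and IDF weights) built from the training data and to write 0 as a placeholder label.

```python
from collections import Counter

# Hypothetical global dictionary: term -> feature index (1-based, as in the question).
term_index = {"svm": 1, "kernel": 2, "vector": 3, "margin": 4, "label": 5}
# Hypothetical IDF values, assumed to have been computed on the training documents.
idf = {"svm": 1.0, "kernel": 0.5, "vector": 2.0, "margin": 1.5, "label": 0.25}


def document_to_line(text, placeholder_label=0):
    """Turn raw text into one LIBSVM-format line with a placeholder label."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    features = {}
    for term, tf in counts.items():
        if term in term_index:  # terms not in the dictionary are dropped
            features[term_index[term]] = tf * idf[term]
    pairs = sorted(features.items())  # ascending feature indices
    return " ".join([str(placeholder_label)] + [f"{i}:{v:g}" for i, v in pairs])


line = document_to_line("kernel svm kernel unknownword vector")
print(line)
```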

As for your data, I don’t quite see how you can have all those different weight vectors for the same Document1 and the same set of terms. Could you clarify that?

EDIT:

The format is OK. If you remove the comments (the "->(training instance ...)" annotations), LIBSVM runs just fine. Assuming you’re running Windows and the file test.txt is as follows,

1 1:0.25 2:1.056 3:2.356
1 2:1.25 3:0.145 4:1.543
1 1:1.00 2:2.145 5:3.543
2 1:0.25 2:1.056 3:2.356
2 2:1.25 3:0.145 4:1.543
2 1:1.00 2:2.145 5:3.543
3 1:0.25 2:1.056 3:2.356
3 2:1.25 3:0.145 4:1.543
3 1:1.00 2:2.145 5:3.543

you can use ./libsvm-3.12/windows/svm-train.exe test.txt for training and ./libsvm-3.12/windows/svm-predict.exe test.txt test.txt.model test.txt.out for prediction. On other systems the command is similar.

Note that with this data the accuracy won’t be higher than 1/3, since the same weight vectors are present in the dataset with each of the labels.
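The 1/3 bound can be checked without running LIBSVM. The Python sketch below is an illustration only: it assumes a deterministic classifier, which must give identical feature vectors the same label, so for each distinct vector it can be right at most as often as that vector's most frequent label occurs.

```python
from collections import Counter, defaultdict

# Contents of test.txt from the answer above.
lines = """\
1 1:0.25 2:1.056 3:2.356
1 2:1.25 3:0.145 4:1.543
1 1:1.00 2:2.145 5:3.543
2 1:0.25 2:1.056 3:2.356
2 2:1.25 3:0.145 4:1.543
2 1:1.00 2:2.145 5:3.543
3 1:0.25 2:1.056 3:2.356
3 2:1.25 3:0.145 4:1.543
3 1:1.00 2:2.145 5:3.543
""".splitlines()

# Group the labels by their (textually identical) feature vector.
labels_by_vector = defaultdict(Counter)
for line in lines:
    label, _, vector = line.partition(" ")
    labels_by_vector[vector][label] += 1

# Best case: every distinct vector is assigned its most frequent label.
best_correct = sum(max(counts.values()) for counts in labels_by_vector.values())
upper_bound = best_correct / len(lines)
print(best_correct, len(lines), upper_bound)
```

Here each of the three distinct vectors occurs once with each label, so at most 3 of the 9 instances can be classified correctly.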
