I have a problem but I feel the solution should be quite simple. I’m

Question

Asked: June 18, 2026 (2026-06-18T10:32:28+00:00)

I have a problem but I feel the solution should be quite simple. I’m building a model and want to test its accuracy by 10-fold cross-validation. To do this I have to split my training corpus 90%/10% into training and test sections, then train my model on the 90% and test on the 10%. This I want to do ten times, by taking a different 90%/10% split every time, so that eventually each bit of the corpus has been used as testing data. Then I’ll average the results for each 10% test.

I have tried to write a script that extracts 10% of the training corpus and writes it to a new file, but so far I haven't gotten it working. What I have done is count the total number of lines in the file and divide that number by ten, to get the size of each of the ten test sets I want to extract.

trainFile = open("danish.train")
numberOfLines = 0

for line in trainFile:
    numberOfLines += 1

lengthTest = numberOfLines // 10  # integer division: whole lines per test set

My own training file turns out to have 3638 lines, so each test set should contain roughly 363 lines.

How do I write lines 1-363, lines 364-726, etc. to different test files?

1 Answer

Editorial Team · Answer 1 · 2026-06-18T10:32:29+00:00

Once you have the count of lines, go back to the beginning of the file, and start copying out lines to danish.train.part-01. When the line number is a multiple of the size of the 10% test set, open a new file for the next part.

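#!/usr/bin/env python3

NUM_PARTS = 10

with open("danish.train") as trainFile:
    # first pass: count the lines
    numberOfLines = 0
    for line in trainFile:
        numberOfLines += 1

    # integer division, so each part is a whole number of lines
    lengthTest = max(1, numberOfLines // NUM_PARTS)

    # rewind file to beginning
    trainFile.seek(0)

    numberOfLines = 0
    file_number = 0
    output = None
    for line in trainFile:
        # start a new part every lengthTest lines; the last part also
        # takes any leftover lines when the count is not a multiple of 10
        if numberOfLines % lengthTest == 0 and file_number < NUM_PARTS:
            if output is not None:
                output.close()
            file_number += 1
            output = open('danish.train.part-%02d' % file_number, 'w')

        numberOfLines += 1
        output.write(line)

    if output is not None:
        output.close()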

On this input file (sorry I don’t speak Danish!):

one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
twenty-one
twenty-two
twenty-three
twenty-four
twenty-five
twenty-six
twenty-seven
twenty-eight
twenty-nine
thirty

This creates the files

danish.train.part-01
danish.train.part-02
danish.train.part-03
danish.train.part-04
danish.train.part-05
danish.train.part-06
danish.train.part-07
danish.train.part-08
danish.train.part-09
danish.train.part-10

and part 5, for example, contains:

thirteen
fourteen
fifteen
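
Since the goal is 10-fold cross-validation, each 10% test part also needs its matching 90% training file. Here is a minimal sketch of one way to produce both, assuming the whole corpus fits in memory. The helper names (`fold_bounds`, `cross_validation_splits`, `write_folds`) and the output file names are made up for illustration. Leftover lines are spread over the first folds, so a 3638-line file would give eight test sets of 364 lines and two of 363.

```python
def fold_bounds(n_lines, k=10):
    """Return k (start, stop) index pairs that together cover range(n_lines).

    The first n_lines % k folds get one extra line, so fold sizes
    differ by at most one.
    """
    base, extra = divmod(n_lines, k)
    bounds = []
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def cross_validation_splits(lines, k=10):
    """Yield (train_lines, test_lines) for each of the k folds."""
    for start, stop in fold_bounds(len(lines), k):
        test = lines[start:stop]
        # everything outside the test slice is training data
        train = lines[:start] + lines[stop:]
        yield train, test


def write_folds(path, k=10):
    """Write <path>.test-NN and <path>.train-NN for every fold.

    The file naming scheme is hypothetical; adjust it to your setup.
    """
    with open(path) as f:
        lines = f.readlines()
    splits = cross_validation_splits(lines, k)
    for i, (train, test) in enumerate(splits, start=1):
        with open('%s.test-%02d' % (path, i), 'w') as out:
            out.writelines(test)
        with open('%s.train-%02d' % (path, i), 'w') as out:
            out.writelines(train)
```

Calling write_folds("danish.train") would write danish.train.test-01 through danish.train.test-10 together with the matching danish.train.train-NN files. You can then train on each train file, test on its partner, and average the ten results.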
