The Archive Base Latest Questions

Editorial Team
Asked: June 18, 2026

I have a problem, but I feel the solution should be quite simple. I’m building a model and want to test its accuracy with 10-fold cross-validation. To do this I have to split my training corpus 90%/10% into training and test sections, then train my model on the 90% and test it on the 10%. I want to do this ten times, taking a different 90%/10% split every time, so that eventually every part of the corpus has been used as test data. Then I’ll average the results over the ten tests.
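A minimal sketch of the procedure described above, assuming the corpus fits in memory as a list of lines. The names `k_fold_splits` and `cross_validate`, and the `train_model` and `evaluate` callables, are hypothetical placeholders and not part of the original post.

```python
def k_fold_splits(lines, k=10):
    """Yield (train, test) pairs so that every line lands in exactly one test set."""
    n = len(lines)
    for i in range(k):
        # Integer arithmetic on the boundaries spreads any remainder across folds.
        start = i * n // k
        end = (i + 1) * n // k
        test = lines[start:end]
        train = lines[:start] + lines[end:]
        yield train, test


def cross_validate(lines, train_model, evaluate, k=10):
    """Average the score returned by `evaluate` over the k held-out folds.

    `train_model` and `evaluate` stand in for the reader's own code.
    """
    scores = []
    for train, test in k_fold_splits(lines, k):
        model = train_model(train)
        scores.append(evaluate(model, test))
    return sum(scores) / len(scores)
```

Each fold trains on roughly 90% of the lines and tests on the remaining 10%, and the returned value is the plain mean of the ten scores.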

I have tried to write a script that extracts 10% of the training corpus and writes it to a new file, but so far I can’t get it to work. What I have done so far is count the total number of lines in the file and divide that number by ten, to get the size of each of the ten test sets that I want to extract.

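# Count the lines in the training file.
with open("danish.train") as trainFile:
    numberOfLines = 0
    for line in trainFile:
        numberOfLines += 1

# Floor division, so the test-set size is a whole number of lines.
lengthTest = numberOfLines // 10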

I have found, for my own training file, that it consists of 3638 lines, so each test should consist roughly of 363 lines.

How do I write lines 1-363, lines 364-726, etc. to different test files?
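As a worked version of the arithmetic in the question: 3638 is not a multiple of 10, so ten sets of 363 lines leave 8 lines over. The sketch below computes the line ranges under the assumption that the last set absorbs those leftover lines; `fold_ranges` is a hypothetical helper name, and other ways of distributing the remainder are equally valid.

```python
def fold_ranges(number_of_lines, k=10):
    """Return 1-based inclusive (first, last) line ranges for k test sets.

    The last set absorbs the remainder, so no line is left out.
    """
    size, remainder = divmod(number_of_lines, k)
    ranges = []
    for i in range(k):
        first = i * size + 1
        last = (i + 1) * size
        if i == k - 1:
            last += remainder
        ranges.append((first, last))
    return ranges


print(divmod(3638, 10))        # (363, 8): 363 lines per set, 8 left over
print(fold_ranges(3638)[:2])   # [(1, 363), (364, 726)]
print(fold_ranges(3638)[-1])   # (3268, 3638)
```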

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on June 18, 2026 at 10:32 am

    Once you have the count of lines, go back to the beginning of the file, and start copying out lines to danish.train.part-01. When the line number is a multiple of the size of the 10% test set, open a new file for the next part.

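    #!/usr/bin/env python2.7

    with open("danish.train") as trainFile:
        # First pass: count the lines in the training file.
        numberOfLines = sum(1 for line in trainFile)

        # Floor division keeps the part size an integer (also under Python 3).
        lengthTest = numberOfLines // 10

        # rewind file to beginning
        trainFile.seek(0)

        file_number = 0
        output = None
        for lineNumber, line in enumerate(trainFile):
            # Stop at 10 parts so that leftover lines land in the last part.
            if lineNumber % lengthTest == 0 and file_number < 10:
                if output is not None:
                    output.close()
                file_number += 1
                output = open('danish.train.part-%02d' % file_number, 'w')
            output.write(line)

        if output is not None:
            output.close()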
    

    On this input file (sorry I don’t speak Danish!):

    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten
    eleven
    twelve
    thirteen
    fourteen
    fifteen
    sixteen
    seventeen
    eighteen
    nineteen
    twenty
    twenty-one
    twenty-two
    twenty-three
    twenty-four
    twenty-five
    twenty-six
    twenty-seven
    twenty-eight
    twenty-nine
    thirty
    

    This creates the files

    danish.train.part-01
    danish.train.part-02
    danish.train.part-03
    danish.train.part-04
    danish.train.part-05
    danish.train.part-06
    danish.train.part-07
    danish.train.part-08
    danish.train.part-09
    danish.train.part-10
    

    and part 5, for example, contains:

    thirteen
    fourteen
    fifteen
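    The part files above are the ten 10% test sets. For cross-validation, each one also needs a matching 90% training file made from the other nine parts. A minimal sketch of that step follows, assuming the part files already exist with the names shown above. The function name `write_training_sets` and the output names of the form `danish.train.fold-NN.train` are assumptions made for this illustration.

```python
import glob


def write_training_sets(prefix="danish.train"):
    """For each part file, write a training file made of all the other parts.

    Output names of the form '<prefix>.fold-NN.train' are an assumption
    made for this sketch.
    """
    parts = sorted(glob.glob(glob.escape(prefix) + ".part-[0-9][0-9]"))
    created = []
    for held_out in parts:
        fold = held_out[-2:]  # the two-digit part number, e.g. '05'
        train_name = "%s.fold-%s.train" % (prefix, fold)
        with open(train_name, "w") as train:
            for part in parts:
                if part == held_out:
                    continue  # this part is the test set for the fold
                with open(part) as source:
                    train.writelines(source)
        created.append((train_name, held_out))
    return created
```

    The function returns a list of (training file, test file) pairs, one per fold, which can then be fed to the training and evaluation steps in turn.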
    