The Archive Base


Editorial Team
Asked: June 18, 2026

So I have a uniformly formatted text file that I am trying to parse


So I have a uniformly formatted text file that I am trying to parse based on the number of lines below the word ‘cluster’. Here is my code so far:

f = open('file.txt', 'r')
main_output = open('mainoutput.txt', 'w')
minor_output = open('minoroutput.txt', 'w')
f_lines = f.readlines()
main_list = []
minor_list = []
for n, line in enumerate(open('file.txt')):
    if 'cluster' in line:
        if 'cluster' in f_lines[n+1] or f_lines[n+2] or f_lines[n+3]:
            minor_list.append(line)
            minor_list.append(f_lines[n+1])
            minor_list.append(f_lines[n+2])
            minor_list.append(f_lines[n+3])
        if 'cluster' not in f_lines[n+1] or f_lines[n+2] or f_lines[n+3]:
            main_list.append(line)
            main_list.append(f_lines[n+1])
            main_list.append(f_lines[n+2])
            main_list.append(f_lines[n+3])
minor_output.write(''.join(minor_list))
main_output.write(''.join(main_list))
f.close()
main_output.close()
minor_output.close()

The format of the text file is as follows:

>Cluster 1
line 1
line 2
line 3
...

>Cluster 2
line 1
line 2
...

and so on for many clusters.

Each cluster has a variable number of lines below it, from 1 to 100+. I am interested in sorting these clusters by the number of lines (items) in each cluster. This code runs, but the two output files are identical. Any help with my code or my strategy would be awesome!


1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 4:15 pm

    If I understand the code you’ve posted correctly, you want to sort your data into two different files depending on how many items are in a cluster. If there are three or fewer, the cluster goes into minoroutput.txt, while if there are more than that, it goes into mainoutput.txt.

    There are a couple of significant logic errors that I suspect are causing your code to not sort the data properly.

    Firstly, your test for whether a line contains the word "cluster" is case-sensitive, so it won’t match the capitalized "Cluster" in your example data. This may only be an issue with the example data you’ve shown, and it would be easy to fix by calling lower() on the line before checking it.
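
A quick illustration of the capitalization issue, using a header line from the sample data. The startswith('>') variant is only an assumption based on the sample format, where every header begins with '>'; nothing in the question confirms that for the real file.

```python
header = ">Cluster 1\n"

# The substring test is case-sensitive, so lowercase 'cluster' is not found.
case_sensitive = 'cluster' in header

# Lowercasing the line first makes the test match any capitalization.
case_insensitive = 'cluster' in header.lower()

# Assumption: headers always start with '>' as in the sample data.
starts_with_marker = header.startswith('>')

print(case_sensitive, case_insensitive, starts_with_marker)
```

If the headers really do always start with '>', testing for that marker would also avoid treating an item line that happens to contain the word "cluster" as a header.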

    Second, your check of the later lines is incorrect. The code if 'cluster' in f_lines[n+1] or f_lines[n+2] or f_lines[n+3] doesn’t check for "cluster" in each of the three strings, but only in the first. The second and third strings are evaluated by themselves, in boolean context. Any non-empty string is truthy, and lines read from a file keep their trailing newline, so even a blank line ('\n') counts as true and the whole expression is almost always true. For this to work, you’d need to check 'cluster' in f_lines[n+1] or 'cluster' in f_lines[n+2] or 'cluster' in f_lines[n+3] (but I’ll show a better alternative later). The same problem affects the other if statement, whose condition is also almost always true. Because both conditions succeed for nearly every cluster, each cluster is appended to both lists, which is why your two output files are identical.
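
The precedence problem can be reproduced in isolation. This is a minimal sketch with made-up lines (next_lines is a hypothetical stand-in for f_lines[n+1] through f_lines[n+3]); any() is one compact way to apply the membership test to every line.

```python
next_lines = ["line 1\n", "line 2\n", "line 3\n"]

# Python parses this as:
#   ('cluster' in next_lines[0]) or next_lines[1] or next_lines[2]
# so the result is the first truthy operand, here the string "line 2\n".
buggy = 'cluster' in next_lines[0] or next_lines[1] or next_lines[2]

# any() runs the membership test on each of the three lines.
fixed = any('cluster' in s.lower() for s in next_lines)

print(repr(buggy))
print(fixed)
```

None of the three lines contains "cluster", yet the first expression is truthy, while the any() version is False.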

    Lastly, your logic for writing out the clusters is probably incorrect. It always writes exactly four lines (the header plus the next three), even though many clusters will have more or fewer items than that. For every cluster written to mainoutput.txt, some lines will be discarded (this might be deliberate). For some clusters written to minoroutput.txt, however, there’s a clear bug: after a cluster with only one or two items, it will also write out the start of the next cluster.
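
To see the overrun concretely, here is a small sketch with made-up data. The slice f_lines[n:n + 4] is equivalent to the four append calls in the question whenever all four indexes exist; the names fixed_window, end and whole_cluster are only for illustration.

```python
f_lines = [
    ">Cluster 1\n", "line 1\n",
    ">Cluster 2\n", "line 1\n", "line 2\n", "line 3\n", "line 4\n",
]
n = 0  # index of the ">Cluster 1" header

# What the original code collects: the header plus the next three lines.
fixed_window = f_lines[n:n + 4]

# Index of the next header after n, or the end of the list if there is none.
end = next(
    (i for i in range(n + 1, len(f_lines)) if 'cluster' in f_lines[i].lower()),
    len(f_lines),
)
whole_cluster = f_lines[n:end]

print(fixed_window)
print(whole_cluster)
```

The fixed window for the one-item cluster spills into ">Cluster 2" and its first item, while scanning to the next header stops at the right place. Near the end of the file, indexing f_lines[n+3] directly could also raise IndexError if fewer than three lines remain.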

    Here’s some code that I think will work for you. I’ve changed around the loop so that it just reads the file once, rather than reading the lines once into a list and a second time in enumerate. Rather than explicitly looking at the next three lines, I simply put each line into a list, resetting each time there’s a line with cluster in it (with any capitalization).

    with open('file.txt', 'r') as f, \
         open('mainoutput.txt', 'w') as main_out, \
         open('minoroutput.txt', 'w') as minor_out:
        cluster = [] # this variable will hold all the lines of the current cluster
        for line in f:
            if 'cluster' in line.lower(): # if we're at the start of a cluster
                if len(cluster) > 4: # long clusters go in the "main" file
                    main_out.writelines(cluster) # write out the lines
                    # main_out.writelines(cluster[:4])
                else:
                    minor_out.writelines(cluster) # or to the other file
    
                cluster = [] # reset the cluster variable to a new, empty list
    
            cluster.append(line) # always add the current line to cluster
    
        if len(cluster) > 4: # repeat the writing logic for the last cluster
            main_out.writelines(cluster)
            # main_out.writelines(cluster[:4])
        else:
            minor_out.writelines(cluster)
    

    Use the two commented writelines lines in place of the uncommented ones just before them if you only want the header and first three items of each cluster to be written to mainoutput.txt (with the rest being discarded). I don’t think there’s a reasonable alternative to writing all the lines in minoroutput.txt.

    Given file.txt with these contents:

    >Cluster 1
    line 1
    line 2
    line 3
    >Cluster 2
    line 1
    line 2
    line 3
    line 4
    >Cluster 3
    line 1
    >Cluster 4
    line 1
    line 2
    line 3
    line 4
    line 5
    

    The code above will output two files:

    mainoutput.txt:

    >Cluster 2
    line 1
    line 2
    line 3
    line 4
    >Cluster 4
    line 1
    line 2
    line 3
    line 4
    line 5
    

    minoroutput.txt:

    >Cluster 1
    line 1
    line 2
    line 3
    >Cluster 3
    line 1
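
Since the question also mentions sorting clusters by how many items they hold, and its sample format shows blank lines between clusters, here is a rough sketch of one way to handle both. The helper split_clusters and the sample text are hypothetical, and skipping blank lines is an assumption about what should count as an item.

```python
import io

def split_clusters(lines):
    """Group lines into clusters; each cluster is [header, item, item, ...]."""
    clusters = []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank separator lines are not items
        if 'cluster' in line.lower():
            clusters.append([line])      # start a new cluster at each header
        elif clusters:
            clusters[-1].append(line)    # add the item to the current cluster
    return clusters

# Stand-in for open('file.txt'); the contents are made up for illustration.
sample = io.StringIO(
    ">Cluster 1\nline 1\nline 2\n\n"
    ">Cluster 2\nline 1\n\n"
    ">Cluster 3\nline 1\nline 2\nline 3\nline 4\n"
)

clusters = split_clusters(sample)
# Largest cluster first; len(c) - 1 is the item count without the header.
by_size = sorted(clusters, key=lambda c: len(c) - 1, reverse=True)
for c in by_size:
    print(c[0].strip(), len(c) - 1)
```

With the clusters held as a list of lists, the split into two files becomes a simple test on len(c) - 1 for each cluster, at the cost of keeping the whole file in memory.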
    