The Archive Base

Asked by Editorial Team on June 8, 2026 (2026-06-08T12:23:41+00:00)

I’m trying to build a randomized dataset based on an input dataset.
The input dataset consists of 856471 lines, and in each line there is a pair of values separated by a tab.
No entry in the randomized dataset may equal any pair in the input dataset, in either order. This means:

If the pair in line 1 is “Protein1 Protein2”, the randomized dataset cannot contain the following pairs:

  • “Protein1 Protein2”
  • “Protein2 Protein1”

In order to achieve this I tried the following:

data = infile.readlines()
ltotal = len(data)
for line in data:
    words = string.split(line)

init = 0
while init != ltotal:
    p1 = random.choice(words)
    p2 = random.choice(words)
    words.remove(p1)
    words.remove(p2)
    if "%s\t%s\n" % (p1, p2) not in data and "%s\t%s\n" % (p2, p1) not in data:
        outfile.write("%s\t%s\n" % (p1, p2))

However, I’m getting the following error:

Traceback (most recent call last):
  File "C:\Users\eduarte\Desktop\negcreator.py", line 46, in <module>
    convert(indir, outdir)
  File "C:\Users\eduarte\Desktop\negcreator.py", line 27, in convert
    p1 = random.choice(words)
  File "C:\Python27\lib\random.py", line 274, in choice
    return seq[int(self.random() * len(seq))]  # raises IndexError if seq is empty
IndexError: list index out of range

I was pretty sure this would work. What am I doing wrong?
Thanks in advance.

1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 12:23 pm (2026-06-08T12:23:44+00:00)

    The variable words is overwritten on every pass through the loop, so after the loop it holds only the words from the last line:

    for line in data:
        words = string.split(line)
    

    This is most probably not what you want.

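    Moreover, your while loop never terminates on its own, because init is never changed. Each pass removes two entries from words, so the list eventually becomes empty and random.choice() has nothing left to choose from. That is the IndexError in your traceback.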
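As a minimal sketch of both problems, the snippet below uses two made-up input lines (not the real dataset) and `str.split()` in place of the older `string.split()`. It shows that rebinding `words` keeps only the last line, that extending a single list keeps every word, and that `random.choice()` fails once the list has been emptied.

```python
import random

lines = ["Protein1\tProtein2\n", "Protein3\tProtein4\n"]  # made-up sample data

# Pattern from the question: `words` is rebound on every pass.
for line in lines:
    words = line.split()
last_only = list(words)  # ['Protein3', 'Protein4']

# Accumulating into one list keeps the words from every line.
all_words = []
for line in lines:
    all_words.extend(line.split())

# Removing words with no exit condition eventually empties the list.
words.remove("Protein3")
words.remove("Protein4")
try:
    random.choice(words)
    choice_failed = False
except IndexError:
    choice_failed = True  # same exception type as in the traceback
```

The names `last_only`, `all_words`, and `choice_failed` are illustrative only and do not appear in the original script.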

    Edit: My guess is that you have a file of tab-separated word pairs, a pair in each line, and you are trying to form random pairs from all of the words, writing only those random pairs to the output file that do not occur in the original file. Here is some code doing this:

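    import itertools
    import random

    # Read every line as an unordered pair, so "A B" and "B A" compare equal.
    with open("infile") as infile:
        pairs = set(frozenset(line.split()) for line in infile)

    # Flatten the pairs into one list of words and shuffle it once.
    words = list(itertools.chain.from_iterable(pairs))
    random.shuffle(words)

    # Walk the shuffled list two words at a time and keep only unseen pairs.
    with open("outfile", "w") as outfile:
        for pair in itertools.izip(*[iter(words)] * 2):
            if frozenset(pair) not in pairs:
                outfile.write("%s\t%s\n" % pair)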

    Notes:

    1. A pair of words is represented by a frozenset, since order does not matter.

    2. I use a set for all the pairs to be able to test if a pair is in the set in constant time.

    3. Instead of calling random.choice() repeatedly, I shuffle the whole list once and then iterate over it in pairs. This way, we don’t need to remove the already used words from the list, so it’s much more efficient. (This change and the previous one bring down the algorithmic complexity of the approach from O(n²) to O(n).)

    4. The expression itertools.izip(*[iter(words)] * 2) is a common Python idiom to iterate over words in pairs, in case you did not encounter that one yet.

    5. The code is still untested.
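For readers on Python 3, where `itertools.izip` no longer exists and the built-in `zip` is already lazy, the notes above could be sketched roughly as follows. This is a variation rather than the answer's exact code: the function name `random_pairs` and its `seed` parameter are hypothetical, it works on a list of lines instead of files, and it flattens the words from the lines directly, so repeated words are kept.

```python
import itertools
import random


def random_pairs(lines, seed=None):
    """Return shuffled word pairs that do not occur in `lines` in either order."""
    # Each known pair is a frozenset, so ("A", "B") and ("B", "A") compare equal.
    known = {frozenset(line.split()) for line in lines if line.strip()}
    # Flatten all lines into one word list (assumption: repeated words are kept).
    words = list(itertools.chain.from_iterable(line.split() for line in lines))
    # Shuffle once instead of calling random.choice() repeatedly.
    random.Random(seed).shuffle(words)
    # zip(*[iter(words)] * 2) pulls two items at a time from one shared iterator.
    candidates = zip(*[iter(words)] * 2)
    # Set membership is a constant-time check on average.
    return [pair for pair in candidates if frozenset(pair) not in known]
```

Calling `random_pairs(["Protein1\tProtein2\n", "Protein3\tProtein4\n"])` returns a list of 2-tuples, none of which matches an input pair in either order. Because candidate pairs that collide with the input are simply dropped, the result can be shorter than the input, and this sketch does not filter out a word paired with itself.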
