I have two lists, A and B, with an equal number of elements, although the elements in each list are not necessarily distinct.
I would like to form a new list by coupling the elements from A and B at random (the random pairing is important).
However, I also need to make sure that each pair in the resulting list is unique.
So far, I’ve been approaching the problem as follows. This works for small lists, but it is clearly not suited to larger lists with many combinations:
from random import shuffle
# Create a list of actors and events for testing
events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']
# Reshuffle until every (event, actor) pair in the merged list is unique
while True:
    # Randomize the elements of each list
    shuffle(events)
    shuffle(actors)
    # Merge the two lists into a new list of pairs
    edgelist = list(zip(events, actors))
    # If the new list of pairs has all unique elements, then it is a good solution, otherwise try again at random
    if len(edgelist) == len(set(edgelist)):
        break

# Display the solution
print('Solution obtained: ')
for item in edgelist:
    print(item)
Can anyone suggest a modification or alternative approach that would scale to larger input lists?
Thanks for the helpful replies.
Update
Turns out this is a more challenging problem than I originally thought. I think I now have a solution. It may not scale incredibly well, but it works fine for small or medium-sized lists. It runs a basic feasibility check before starting, so assumptions about the distribution of the input lists aren’t necessary. I also included a few lines of code to show that the frequency distributions of the resulting list match the original.
import collections
from random import choice, shuffle

# Randomize the elements
shuffle(events)

# Make sure a solution is possible (necessary condition: enough distinct pairs exist)
combinations = len(set(events)) * len(set(actors))
assert len(events) == len(actors) and combinations >= len(events), 'No solution possible!'

# Merge the two lists into a new list of pairs (this will contain duplicates)
edgelist = list(zip(events, actors))

# Search for duplicates
counts = collections.Counter(edgelist)
duplicates = [i for i in counts if counts[i] > 1]
duplicate_count = len(duplicates)

while duplicate_count > 0:
    # Get a single duplicate to address
    duplicate = duplicates[0]
    # Find the position of the duplicate in edgelist
    duplicate_pos = edgelist.index(duplicate)
    # Search for a replacement
    swap = choice(edgelist)
    swap_pos = edgelist.index(swap)
    # Exchange events between the two pairs if that removes the duplicate
    if (swap[0], duplicate[1]) not in edgelist:
        edgelist[duplicate_pos] = (swap[0], duplicate[1])
        edgelist[swap_pos] = (duplicate[0], swap[1])
    # Update duplicate count
    counts = collections.Counter(edgelist)
    duplicates = [i for i in counts if counts[i] > 1]
    duplicate_count = len(duplicates)
# Verify resulting edgelist and frequency distributions
print('Event Frequencies: ')
print(sorted(collections.Counter(events).values(), reverse=True))
print('Edgelist Event Frequencies: ')
print(sorted(collections.Counter(e for e, a in edgelist).values(), reverse=True))
print('Actor Frequencies: ')
print(sorted(collections.Counter(actors).values(), reverse=True))
print('Edgelist Actor Frequencies: ')
print(sorted(collections.Counter(a for e, a in edgelist).values(), reverse=True))
assert len(set(edgelist)) == len(events) == len(actors)
Well, there is no reason for you to shuffle both lists. The pairing will not get “more random”.
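To illustrate the point, here is a minimal sketch that shuffles a copy of only one list. The helper name `random_pairing` is made up for this example, and it does not deal with duplicate pairs at all; it only shows that one shuffle already gives every event a random actor.

```python
import random

events = ['P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P3', 'P3', 'P3', 'P4', 'P5', 'P6', 'P7', 'P7']
actors = ['IE', 'IE', 'ID', 'ID', 'IA', 'IA', 'IA', 'IC', 'IB', 'IF', 'IG', 'IH', 'IH', 'IA']


def random_pairing(events, actors):
    """Pair events with actors at random by shuffling only the actors."""
    shuffled = list(actors)      # work on a copy so the input is left untouched
    random.shuffle(shuffled)     # one uniform shuffle is enough
    return list(zip(events, shuffled))


pairs = random_pairing(events, actors)
```

Composing a uniformly random permutation with any other permutation is still uniformly random, so shuffling `events` as well would not change the distribution of pairings.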
Update: I’ve posted a different solution than my original one. It’s recursive, and is guaranteed to always return a valid answer, or None if one is not possible.
Note: the “index” variable plays the same role as checking edgelist in the OP’s solution. The “tried_pairs” variable is just an optimisation for each specific recursion step, to avoid retrying the same pair over and over again (if, for instance, there are several consecutive identical items in actors).
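For reference, a recursive backtracking approach along these lines might look roughly like the sketch below. This is a reconstruction under assumptions, not necessarily the exact code referred to above: the function name `pair_up` is hypothetical, `index` is assumed to be a set of pairs already used, and `tried_pairs` is assumed to be a per-call set.

```python
import random


def pair_up(events, actors, index=None):
    """Return a list of unique (event, actor) pairs, or None if impossible.

    Assumes len(events) == len(actors). `index` holds the pairs already
    placed by the calls further up the recursion.
    """
    if index is None:
        index = set()
    if not events:
        return []                       # nothing left to pair: success

    event = events[0]
    tried_pairs = set()                 # pairs already attempted at this step

    # Visit the remaining actors in random order
    for pos in random.sample(range(len(actors)), len(actors)):
        pair = (event, actors[pos])
        if pair in index or pair in tried_pairs:
            continue                    # already used, or already known to fail here
        tried_pairs.add(pair)
        index.add(pair)
        remaining = actors[:pos] + actors[pos + 1:]
        rest = pair_up(events[1:], remaining, index)
        if rest is not None:
            return [pair] + rest
        index.discard(pair)             # backtrack and try another actor

    return None                         # no actor works for this event


events = ['P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P3', 'P3', 'P3', 'P4', 'P5', 'P6', 'P7', 'P7']
actors = ['IE', 'IE', 'ID', 'ID', 'IA', 'IA', 'IA', 'IC', 'IB', 'IF', 'IG', 'IH', 'IH', 'IA']
result = pair_up(events, actors)
impossible = pair_up(['a', 'a'], ['x', 'x'])
```

Two caveats on this sketch: the worst case is exponential, and the recursion depth equals the length of the input, so very long lists may need an iterative version or a higher recursion limit. The pairing is random, but it is not claimed to be uniform over all valid solutions.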