I made a Python script that takes text from an input file and randomly rearranges the words, for a creative writing project based on the cut-up technique (http://en.wikipedia.org/wiki/Cut-up_technique).
Here’s the script as it currently stands. NB: I’m running this as a server-side include.
#!/usr/bin/python
from random import shuffle
# Read the whole source text; the with block closes the file automatically
with open("input.txt", "r") as src:
    srcText = src.read()
srcList = srcText.split()
shuffle(srcList)
cutUpText = " ".join(srcList)
print("Content-type: text/html\n\n" + cutUpText)
This basically does the job I want it to do, but one improvement I’d like to make is to identify duplicate words within the output and remove them. To clarify, I only want to identify duplicates in a sequence, for example “the the” or “I I I”. I don’t want to make it so that, for example, “the” only appears once in the entire output.
Can someone point me in the right direction to start solving this problem? (My background isn’t in programming at all, so I put this script together by reading bits of the Python manual and browsing this site. Please be gentle with me.)
You can write a generator to produce words without duplicates:
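A minimal sketch of such a generator follows. The name `remove_adjacent_dups` is just an illustrative choice, and the comparison is case-sensitive, so "The the" would not count as a duplicate here.

```python
def remove_adjacent_dups(words):
    """Yield each word, skipping any word equal to the one right before it."""
    previous = None  # words produced by str.split() are never None
    for word in words:
        if word != previous:
            yield word
        previous = word
```

For example, `list(remove_adjacent_dups("the the cat I I I sat the".split()))` keeps one copy from each run of repeats but still lets "the" appear again later in the text.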
Then you can use this in your program:
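Here is one way the whole script might look with the generator wired in. This is a sketch rather than the only way to do it: the helper names `cut_up` and `main` are assumptions made for illustration, and the generator is included in full so the file runs on its own. The filter is applied after shuffling, since that is where the repeated words show up.

```python
#!/usr/bin/python
from random import shuffle


def remove_adjacent_dups(words):
    """Yield each word, skipping any word equal to the one right before it."""
    previous = None  # words produced by str.split() are never None
    for word in words:
        if word != previous:
            yield word
        previous = word


def cut_up(text):
    """Shuffle the words of text, then drop immediately repeated words."""
    words = text.split()
    shuffle(words)
    return " ".join(remove_adjacent_dups(words))


def main():
    # Read the source text and print the cut-up version as an HTML response.
    with open("input.txt", "r") as src:
        src_text = src.read()
    print("Content-type: text/html\n\n" + cut_up(src_text))

# In the deployed script, add a call to main() as the last line.
```

Keeping the shuffling in `cut_up` and the file handling in `main` means you can try `cut_up("some test words")` at the Python prompt without needing `input.txt`.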