I am generating text using this:
import bisect, random, sys
# alphabet and f_list are defined elsewhere in my script
for i in xrange(100):
    sys.stdout.write(alphabet[bisect.bisect(f_list, random.random()) - 1])
I get output that looks like this:
fnhtlr hhub del tn eleo s d nerowepeldhoantah yf tr e saetenwgkoyears
oenooe urbmhonnrniwc iasseb
I would like to know how to store the output as text, not as a list, so that I can use fd.inc(word) on it. I am basically trying to plot Zipf’s law with my random output.
If I use this:
text1 = [alphabet[bisect.bisect(f_list, random.random())] for i in xrange(300)]
my output is stored as a list and fd doesn’t work on it, as it considers each character to be a separate word.
for word in text1:
    fd.inc(word)
print fd
<FreqDist: ' ': 1776, 'e': 1008, 'a': 752, 't': 750, 'n': 604, 'i': 586,
'o': 556, 'h': 542, 's': 528, 'r': 478, 'l': 388, 'd': 312, 'u': 242,
'm': 202, 'w': 192, 'g': 172, 'b': 152, 'p': 152, 'f': 150, 'c': 148, 'y': 120,
'k': 90, 'v': 66, 'q': 12, 'z': 10, 'x': 8, 'j': 4>
I would like each sequence of letters separated by a space to be considered as a word, i.e. for the output to be considered as text.
Thank you for your help!
Try this:
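A minimal sketch of building the random output directly as one string instead of writing characters to stdout. It is written for Python 3 (`range`, `print()`), and the `alphabet` and `f_list` values below are hypothetical stand-ins, since the question does not show them; `f_list` is assumed to hold the cumulative lower bound of each symbol's probability interval, which is what the `- 1` in the original snippet implies.

```python
import bisect
import random

# Hypothetical stand-ins for the asker's alphabet and f_list (not shown in the
# question): f_list holds the cumulative lower bound of each symbol's interval.
alphabet = "abc "
f_list = [0.0, 0.3, 0.5, 0.7]  # 'a': 30%, 'b': 20%, 'c': 20%, ' ': 30%


def generate_text(n, rng=random):
    """Return n randomly drawn symbols as a single string, not a list."""
    # bisect(...) - 1 maps a uniform draw in [0, 1) to the index of its interval.
    return "".join(alphabet[bisect.bisect(f_list, rng.random()) - 1]
                   for _ in range(n))


# A seeded generator makes the run reproducible.
text = generate_text(300, random.Random(0))
print(type(text).__name__, len(text))  # str 300
```

Because `text` is a plain string, spaces in it can later act as word boundaries.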
To add more detail:
' '.join(list) is the Pythonic way of joining a list into a string. The ' ' part says that the items should be joined with a space. If you wanted to join them with a comma instead, for example, it would be ','. You could even skip the square brackets and pass the generator expression straight to join.
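A small illustration of how the separator string controls the result, and of passing a generator expression (no square brackets) to `join`. The list `letters` is a hypothetical example, not data from the question.

```python
letters = ["a", "b", "c"]

print(" ".join(letters))  # a b c  (joined with a space)
print(",".join(letters))  # a,b,c  (joined with a comma)

# Without the square brackets, join() consumes a generator expression directly,
# so no intermediate list is built.
print(" ".join(ch.upper() for ch in letters))  # A B C
```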
Maybe you want to join the list completely, without anything between the characters. In that case, use join with an empty string: ''.join(text1).
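A short sketch of why the empty separator is the right choice here: the list produced by the list comprehension already contains the space characters, so joining with `" "` would insert an extra space between every character. The list `chars` is a hypothetical stand-in for `text1`.

```python
# A list of single characters, as produced by the list comprehension in the
# question; the spaces are already part of the list.
chars = ["h", "i", " ", "t", "h", "e", "r", "e"]

text = "".join(chars)  # empty separator: nothing is inserted between items
print(repr(text))  # 'hi there'

# Compare: joining with " " would put a space between every character.
print(repr(" ".join(chars)))  # 'h i   t h e r e'
```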
One more thing, though. What happens if you change the last snippet in your question to loop over ''.join(text1).split() instead of text1?
This splits the string again after joining, but this time into words rather than characters (so keep the join as well).
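A minimal sketch of the join-then-split approach in Python 3. It uses `collections.Counter` as a stand-in for the `fd` frequency distribution from the question (`fd[word] += 1` plays the role of `fd.inc(word)`), and `chars` is a hypothetical stand-in for `text1`.

```python
from collections import Counter

chars = list("the cat  saw the dog")  # stand-in for text1 (a list of characters)

# Join first (so spaces separate words), then split on runs of whitespace.
words = "".join(chars).split()
print(words)  # ['the', 'cat', 'saw', 'the', 'dog']

# collections.Counter is used here as a stand-in for the question's FreqDist.
fd = Counter()
for word in words:
    fd[word] += 1
print(fd.most_common(1))  # [('the', 2)]
```

Calling `split()` with no argument also treats consecutive spaces as a single separator, so the random generator producing two spaces in a row does not create empty "words".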
Final word
Since the issue has been solved I want to just explain what those things mean:
''.join(list): this takes the original list, in which every character is a separate item, and makes a single string out of it.
string.split(): this turns the string back into a list (which the fd.inc loop, whatever fd is, apparently needs), but this time the list is split into words rather than into characters like the original one.
Also, I would recommend looking at some Python basics, which will help you in the future 🙂 This is a great series of videos: http://www.youtube.com/watch?v=tKTZoB2Vjuk
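Since the goal in the question is to check Zipf's law, a hedged sketch of the next step: turning the word counts into (rank, frequency) pairs, which are what you would plot on log-log axes. The function name `rank_frequency` and the sample string are hypothetical, and plotting itself is left out to keep the example dependency-free.

```python
from collections import Counter


def rank_frequency(text):
    """Return (rank, frequency) pairs for the words of text, most frequent first."""
    counts = Counter(text.split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


pairs = rank_frequency("a b a c a b d")
print(pairs)  # [(1, 3), (2, 2), (3, 1), (4, 1)]

# Zipf's law predicts roughly frequency proportional to 1 / rank, i.e. a
# straight line of slope about -1 when log(frequency) is plotted against log(rank).
```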