Here is an array of Unicode words used in the Python script.
texts = [u"abc", u"pqr", u"mnp"]
The script is working as expected with the above 3 words example. The issue is that there are thousands of words in a text file.
How do I read from the text file?
Update:
I have 2 issues. The sequence of words from the text file is not maintained in the output.
The text file has unicode characters and hence the “u” in my original example.
# cat testfile.txt
Testing this file with Python
# cat test.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
f = open('testfile.txt', 'r')
texts = set(f.read().split())
print (texts)
# python test.py
set(['this', 'Python', 'Testing', 'with', 'file'])
I see no problem with your file-reading code. Provided the words in the file are separated by whitespace and the file is small enough to be gulped with a single read, it should work just fine. The real problem is the order of the words: a set is unordered, so the original sequence is lost as soon as you shove them into one.

If you need the words in the same order as they appear in the file, why are you using a set? Just keep them in a list.

If you need a set to remove duplicates and/or for other purposes, then you have the following options:

OrderedDict class: standard in Python since 2.7, and recipes exist online for earlier versions.
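
As a minimal sketch of both suggestions (a plain list, and OrderedDict for order-preserving de-duplication), something like the following should work. The helper names read_words and unique_in_order are made up for illustration, and the file is assumed to be UTF-8 encoded; io.open is used so that the words come back as Unicode strings on Python 2.7 as well as Python 3.

```python
# -*- coding: utf-8 -*-
import io
from collections import OrderedDict


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words of a text file, in file order.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    with io.open(path, "r", encoding=encoding) as f:
        # split() with no arguments splits on any run of whitespace,
        # including newlines, and returns a list (which keeps order).
        return f.read().split()


def unique_in_order(words):
    """Drop duplicates, keeping the first occurrence of each word."""
    # OrderedDict remembers insertion order, so its keys come back
    # in the order the words were first seen.
    return list(OrderedDict.fromkeys(words))
```

With the testfile.txt from the question, texts = read_words('testfile.txt') would hold the five words in file order (Testing, this, file, with, Python), and unique_in_order(texts) would only differ if a word occurred more than once.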