here is an array of Unicode words used in the python script. texts =[uabc,


Asked: May 22, 2026 (2026-05-22T00:48:48+00:00)


Here is an array of Unicode words used in the Python script:

texts =[u"abc", u"pqr", u"mnp"]

The script works as expected with the three-word example above. The problem is that the real input is a text file containing thousands of words.
How do I read the words from the text file?

Update:
I have two issues. First, the order of the words in the text file is not preserved in the output.
Second, the text file contains Unicode characters, hence the "u" prefix in my original example.

# cat testfile.txt
Testing this file with Python

# cat test.py
#!/usr/bin/python
# -*- coding: utf-8 -*-

f     = open('testfile.txt', 'r')
texts  = set(f.read().split())
print (texts)

# python test.py
set(['this', 'Python', 'Testing', 'with', 'file'])


1 Answer

Editorial Team · Answer 1 · 2026-05-22T00:48:48+00:00

I see no problem with your file-reading code. As long as the words in the file are separated by whitespace and the file is small enough to read into memory in a single call, it should work just fine. The real problem is the order of the words: a set does not preserve insertion order, so the file order is lost as soon as you put the words into one.

If you need the words in the same order as they appear in the file, why use a set? Just keep them in a list.
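
A minimal sketch of the list approach, assuming the file is UTF-8 encoded (the question does not state the encoding). The helper name read_words and the temporary demo file are hypothetical and only there to make the example self-contained. io.open is used rather than the built-in open so that the words are decoded to Unicode text on Python 2.7 as well as Python 3.

```python
import io
import os
import tempfile


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words of a text file, in file order."""
    # io.open decodes the file to Unicode text on both Python 2.7 and 3.
    with io.open(path, "r", encoding=encoding) as f:
        # split() with no argument splits on any run of whitespace,
        # including newlines, and returns a list, so order is kept.
        return f.read().split()


# Demo on a throwaway file holding the sample line from the question.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"Testing this file with Python\n")

texts = read_words(demo_path)
os.remove(demo_path)
print(texts)
```

The only change from the code in the question is dropping the set() call and stating the encoding explicitly. Duplicates are kept, which is usually what you want when order matters.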

If you need a set to remove duplicates or for other purposes, you have the following options:

Use the OrderedDict class from the collections module, which is standard in Python since 2.7; recipes exist online for earlier versions.
Create an ordered set; there is an SO question with a good discussion of this.
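
The OrderedDict option can be sketched as follows. This is one possible way to use it, not necessarily what the answer had in mind: the words become dictionary keys, which removes duplicates while remembering the order of first appearance. The helper name unique_in_order and the sample sentence are made up for illustration.

```python
from collections import OrderedDict


def unique_in_order(words):
    """Drop duplicate words, keeping the first occurrence of each in order."""
    # fromkeys() builds a dict whose keys are the words; repeated keys are
    # ignored, and OrderedDict remembers the order in which keys were added.
    return list(OrderedDict.fromkeys(words))


words = u"to be or not to be".split()
unique_words = unique_in_order(words)
print(unique_words)
```

Membership tests against the OrderedDict itself stay fast, as with a set, so it can be kept around instead of converting back to a list if lookups are needed.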
