

Asked: May 13, 2026

I have a string in Unicode and I need to return the first N characters


I have a string in unicode and I need to return the first N characters.
I am doing this:

result = unistring[:5]

But of course, the length of a Unicode string is not the same as its number of characters.
Any ideas? Is using re the only solution?

Edit: More info

unistring = "Μεταλλικα" #Metallica written in Greek letters
result = unistring[:1]

returns-> ?

I think that Unicode characters take two bytes each, and that’s why this happens. If I do:

result = unistring[:2]

I get

M

which is correct.
So should I always double the slice length, or should I convert to something?


1 Answer

Editorial Team · Answer 1 · 2026-05-13T14:03:29+00:00

Unfortunately, for historical reasons, Python versions prior to 3.0 have two string types: byte strings (str) and Unicode strings (unicode).

Before the two were unified in Python 3.0, there are two ways to declare a string literal: unistring = "Μεταλλικα", which is a byte string, and unistring = u"Μεταλλικα", which is a Unicode string.
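To see the difference concretely, here is a minimal sketch in Python 3, where str plays the role of Python 2's unicode type and bytes plays the role of Python 2's byte string (str). It assumes the byte string is UTF-8 encoded; the Greek word is written with escape sequences so the exact code points are unambiguous.

```python
# "Μεταλλικα" (Metallica in Greek letters), spelled with escapes
text = "\u039c\u03b5\u03c4\u03b1\u03bb\u03bb\u03b9\u03ba\u03b1"

# Assumption: the byte string holds UTF-8, as a UTF-8 source file would produce
data = text.encode("utf-8")

print(type(text).__name__, len(text))  # str 9   (one item per character)
print(type(data).__name__, len(data))  # bytes 18 (two bytes per Greek letter)
```

Indexing and slicing count characters on the str and bytes on the bytes object, which is why the two lengths differ for the same word.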

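The reason you see ? when you do result = unistring[:1] is that a byte string holds the encoded bytes of your text, not its characters. In UTF-8 (most likely your source encoding, since [:2] gives you back a whole letter), every Greek letter takes two bytes, so [:1] returns only the first half of "Μ". That lone byte is not a valid character on its own, so your terminal displays it as ?. You have probably seen this kind of problem if you ever used a really old email client and received emails from friends in countries like Greece.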
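The splitting of a character can be reproduced in Python 3 with a bytes object, which behaves like a Python 2 byte string. This sketch assumes UTF-8, where "Μ" (U+039C) is encoded as the two bytes 0xCE 0x9C.

```python
# "Μεταλλικα" spelled with escapes so the bytes are unambiguous
text = "\u039c\u03b5\u03c4\u03b1\u03bb\u03bb\u03b9\u03ba\u03b1"
data = text.encode("utf-8")  # assumption: the byte string is UTF-8

first_byte = data[:1]  # only the first of the two bytes that encode "Μ"
first_two = data[:2]   # both bytes, i.e. the complete encoding of "Μ"

try:
    first_byte.decode("utf-8")
    print("decoded cleanly")
except UnicodeDecodeError as exc:
    # A lone lead byte is an incomplete UTF-8 sequence, not a character
    print("cannot decode a partial character:", exc.reason)

print(first_two.decode("utf-8"))  # Μ
```

Doubling the slice length only happens to work here because every letter in this word is exactly two bytes; mixing in ASCII letters or characters outside the Greek block would break that rule.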

So in Python 2.x, if you need to handle Unicode, you have to do it explicitly: use a u"..." literal, or decode the byte string to unicode before slicing it. For an introduction to dealing with Unicode in Python, take a look at the Unicode HOWTO.
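The general fix is to decode first and slice second, so the slice counts characters rather than bytes. Below is a minimal Python 3 sketch; first_n_chars is a hypothetical helper name, and the default encoding of UTF-8 is an assumption that must match how the bytes were actually produced.

```python
def first_n_chars(raw: bytes, n: int, encoding: str = "utf-8") -> str:
    """Return the first n characters of raw (hypothetical helper).

    Decoding before slicing means a multi-byte character is never cut in half.
    """
    return raw.decode(encoding)[:n]


# "Μεταλλικα" encoded as UTF-8 bytes, like a Python 2 byte string literal
data = "\u039c\u03b5\u03c4\u03b1\u03bb\u03bb\u03b9\u03ba\u03b1".encode("utf-8")

result = first_n_chars(data, 5)
print(result)  # Μεταλ

# Re-encode only if a byte string is needed again afterwards
print(len(result.encode("utf-8")))  # 10
```

In Python 2 itself the same idea would be unistring.decode("utf-8")[:5], or declaring the literal as u"Μεταλλικα" in the first place so no decoding step is needed.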
