As I’m French, I’m trying to write a small function that puts the correct definite article in front of a country name. It works, except for the few countries whose names start with an accented letter. Here’s my code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
def article(nomPays):
    voyelles = ['A','E','É','I','O','U','Y']
    if nomPays == 'Mexique':
        return 'du'
    elif nomPays[0] in voyelles:
        return 'de l\''
    elif nomPays[-1] == 'e':  # a negative index counts from the last letter
        return 'de la'
    else:
        return 'du'

print article('Érythrée')
If I enter Allemagne instead of Érythrée, the behaviour is correct: it returns "de l'". But Érythrée returns "de la", which means my function doesn’t recognize the character É as part of the voyelles list.
Can anyone explain why this happens and how I can fix it?
The problem is that you’re using str in Python 2, where str is a sequence of bytes, so nomPays[0] gives you the first byte of the string, not the first character. In single-byte encodings this isn’t a problem, but with multi-byte encodings like UTF-8 the first byte of “Érythrée” is a lead byte and not the whole character “É”. That lone byte never equals the two-byte string 'É' in voyelles, so the test fails and the function falls through to the next branch.
You need to switch to unicode to grab the first character. Actually, it’d probably be easier to use startswith.
Alternatively, you could use unicode throughout your application, or switch to Python 3, where all this is handled much better.
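For illustration, here is a minimal sketch that combines the two ideas above: decode to text first, then test the prefix with startswith. It is not necessarily the code this answer originally showed. It assumes that any byte string passed in is UTF-8 encoded, and the names VOYELLES and nom_pays are my own. The `__future__` import makes the literals text strings on Python 2, so the same file should behave the same way on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# A tuple, because startswith() accepts a tuple of candidate prefixes.
VOYELLES = ('A', 'E', 'É', 'I', 'O', 'U', 'Y')


def article(nom_pays):
    """Return 'du', 'de la' or "de l'" for a French country name."""
    # Assumption: byte strings are UTF-8 encoded; decode them so that
    # every comparison below works on characters rather than bytes.
    if isinstance(nom_pays, bytes):
        nom_pays = nom_pays.decode('utf-8')
    if nom_pays == 'Mexique':
        return 'du'
    elif nom_pays.startswith(VOYELLES):
        return "de l'"
    elif nom_pays.endswith('e'):
        return 'de la'
    else:
        return 'du'


raw = 'Érythrée'.encode('utf-8')  # bytes, like a Python 2 str read from a UTF-8 source
print(repr(raw[0:1]))             # only the lead byte 0xC3 of "É"
print(raw.decode('utf-8')[0])     # the whole character "É"
print(article(raw))
print(article('Allemagne'))
```

Comparing whole prefixes with startswith avoids indexing altogether, so it does not matter how many bytes the first letter occupies once the input has been decoded.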