I was encoding emails to be used with an external website’s API using Python M2Crypto’s RSA with PKCS1 padding. When using unicode, the encoded emails returned no results from the API, but when I used str(unicode_email), I received the correct information.
I was under the impression that both unicode and byte representations of a string should have worked in this case. Does anyone know why the unicode fails?
Code for reference:
from M2Crypto import RSA
email = u'email@example.com' #fails
email = str(email) # succeeds
rsa = RSA.load_pub_key('rsa_pubkey.pem')
result = rsa.public_encrypt(email, RSA.pkcs1_padding).encode('base64')
The M2Crypto module deals exclusively with opaque bytes, which are values between 0 and 255, represented as the Python str type.
The Python 2.x str type consists of such bytes, but the unicode type is a different beast altogether. You can easily convert between the two by using the .decode() method and its mirror method, .encode().
When you call str() on a unicode object, it makes the conversion by applying the default encoding (ASCII on a stock Python 2 install); in essence it calls email.encode(sys.getdefaultencoding()). That’s fine for your all-ASCII email address, but you’re bound to run into UnicodeEncodeError exceptions with anything else. Better to stick to the explicit methods only.
Note that you probably have to set the encoding you used on the MIME headers of the email you send.
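As a rough illustration of the explicit conversion, here is a minimal sketch. It assumes UTF-8 is the encoding the remote API expects, which the question does not state; the to_bytes helper and the accented address are hypothetical and only there for the example.

```python
def to_bytes(text, encoding='utf-8'):
    """Hypothetical helper: explicitly encode text to bytes
    before passing it to a byte-oriented API such as M2Crypto."""
    return text.encode(encoding)


ascii_email = u'email@example.com'
accented_email = u'jos\xe9@example.com'  # hypothetical address with a non-ASCII character

# Explicit encoding works for both addresses.
ascii_bytes = to_bytes(ascii_email)
accented_bytes = to_bytes(accented_email)

# .decode() is the mirror operation and restores the original text.
assert ascii_bytes.decode('utf-8') == ascii_email
assert accented_bytes.decode('utf-8') == accented_email

# Encoding with ASCII, which is what str() does implicitly on a stock
# Python 2 install, fails as soon as a non-ASCII character appears.
try:
    accented_email.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

In the code from the question, the equivalent change would be to pass email.encode('utf-8') to rsa.public_encrypt() instead of relying on str(email), again assuming UTF-8 is what the API on the other end expects.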
I strongly recommend you read up on all this in the Python Unicode HOWTO.