I have a datafile containing a large number of sentences, encoded like this:
“Gib mir bitte Erk\u00e4ltung”
I also have a datafile containing a large number of keywords, encoded like this:
“Erkältung”
I would like to search for keywords in sentences and then write them out to a file, in the “Erkältung” format.
How would I convert \u00e4 to ä without having to do:
String.replace(‘\u00e4’, ‘ä’)
More exactly, I would like to have this return a match in Python 2.6:
(#coding: utf-8)
sentence = "Gib mir bitte Erk\u00e4ltung"
keyword = "Erkältung"
re.search(keyword, line)
Any hints?
Python has some handy character encoding conversions built in. In this case
unicode_escapeis what you want. When you read in your sentence, convert it as follows prior to doing your search: