Given a Unicode string, I want to replace non-ASCII characters with the LaTeX code that produces them (for example, é becomes \'e and œ becomes \oe). I’m incorporating this into a Python program. It should rely on a translation table, and I have come up with the following code, which is simple and seems to work nicely:
accents = [
[ u"à", "\\`a"],
[ u"é", "\\'e"]
]
translation_table = dict([(ord(k), unicode(v)) for k, v in accents])
print u"été à l'eau".translate(translation_table)
However, writing a reasonably complete translation table will take me a long time, and Google didn’t help much. Does anyone have such a table ready, or know where to find one?
PS: I’m new to Python, so I welcome comments on the code above, of course.
Download the Unicode Character Database (about 1 MB). It contains a table of equivalent character combinations. For example,
é (\u00E9) is e followed by a combining acute accent, that is, \u0065 + \u0301 (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT). You can write simple code that converts the composed characters of all scripts, or only the ones you want (you can filter on the script field in the database). Then replace the combinations with LaTeX code. For example, use a regular expression such as
(\w)\u0301 to replace the diacritic with \'<the_letter>. (I’m not sure about the syntax; it depends on your programming language and regular expression engine.)
EDIT:
If you are using Python, you already have the database and an implementation of a handler for it. As mentioned in the comment below, just
import unicodedata.
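A minimal sketch of that idea, written for Python 3 (the snippet in the question is Python 2). It decomposes each non-ASCII character with unicodedata.normalize("NFD", ...) and maps the combining mark to a LaTeX accent command. The names COMBINING_TO_LATEX, SPECIAL and to_latex are made up for this illustration, and both tables are deliberately incomplete; characters such as œ have no canonical decomposition, so they need explicit entries.

```python
import unicodedata

# Assumed, incomplete mapping: combining mark -> LaTeX accent command.
COMBINING_TO_LATEX = {
    "\u0300": "\\`",   # combining grave accent
    "\u0301": "\\'",   # combining acute accent
    "\u0302": "\\^",   # combining circumflex accent
    "\u0303": "\\~",   # combining tilde
    "\u0308": '\\"',   # combining diaeresis
    "\u0327": "\\c",   # combining cedilla
}

# Characters without a canonical decomposition are listed by hand.
SPECIAL = {
    "\u0153": "{\\oe}",  # œ
    "\u0152": "{\\OE}",  # Œ
    "\u00e6": "{\\ae}",  # æ
    "\u00c6": "{\\AE}",  # Æ
    "\u00df": "{\\ss}",  # ß
    "\u00f8": "{\\o}",   # ø
    "\u00d8": "{\\O}",   # Ø
}


def to_latex(text):
    """Replace non-ASCII characters by LaTeX code where a rule is known."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)              # plain ASCII passes through
            continue
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # Canonical decomposition: base character followed by combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if len(marks) == 1 and marks in COMBINING_TO_LATEX and ord(base) < 128:
            out.append("%s{%s}" % (COMBINING_TO_LATEX[marks], base))
        else:
            out.append(ch)              # no rule known: leave unchanged
    return "".join(out)


print(to_latex("\u00e9t\u00e9 \u00e0 l'eau"))
```

This only handles a single accent on an ASCII base letter; characters with several combining marks, or anything missing from the two tables, are left untouched so they can be spotted and added later.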