Given a Unicode string, I want to replace non-ASCII characters by LaTeX code producing

Asked: May 19, 2026

Given a Unicode string, I want to replace non-ASCII characters with the LaTeX code that produces them (for example, é should become \'e, and œ should become \oe). I'm incorporating this into Python code. The approach should rely on a translation table, and I have come up with the following code, which is simple and seems to work nicely:

accents = {
    u"à": u"\\`a",
    u"é": u"\\'e",
}
# str.translate expects a dict mapping code points (ord) to replacement strings
translation_table = {ord(k): v for k, v in accents.items()}
print(u"été à l'eau".translate(translation_table))

However, writing a reasonably complete translation table would take me a long time, and Google didn't help much. Does anyone have such a table ready, or know where to find one?

PS: I’m new to Python, so I welcome comments on the code above, of course.

1 Answer

Editorial Team · Answer 1 · 2026-05-19T02:06:05+00:00

Download the Unicode Character Database (about 1 MB). It contains a table of equivalent character combinations (canonical decompositions): for example, é (U+00E9) is equivalent to the sequence U+0065 U+0301 (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT). You can write simple code to convert all the composed characters of every script, or only the ones you want (you can filter on the script field in the database).
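To illustrate what that table looks like, here is a minimal Python 3 sketch that reads the decomposition field (the sixth field) of UnicodeData.txt, the main file of the database. It assumes the file has already been downloaded; the three sample lines below stand in for it, and the function name `canonical_decompositions` is hypothetical.

```python
# Sample lines in the UnicodeData.txt format; in practice you would read
# the downloaded file instead (assumption: file already on disk).
SAMPLE = """\
00E0;LATIN SMALL LETTER A WITH GRAVE;Ll;0;L;0061 0300;;;;N;LATIN SMALL LETTER A GRAVE;;00C0;;00C0
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
0153;LATIN SMALL LIGATURE OE;Ll;0;L;;;;;N;LATIN SMALL LETTER O E;;0152;;0152
"""


def canonical_decompositions(lines):
    """Hypothetical helper: map each character to its canonical decomposition."""
    table = {}
    for line in lines:
        fields = line.split(";")
        if len(fields) < 6 or not fields[5]:
            continue  # no decomposition at all (e.g. the ligature U+0153)
        mapping = fields[5]
        if mapping.startswith("<"):
            continue  # tagged mappings like <compat> are not canonical
        char = chr(int(fields[0], 16))
        table[char] = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return table


table = canonical_decompositions(SAMPLE.splitlines())
for char, parts in table.items():
    # Show each precomposed character next to the code points it splits into.
    print(char, " + ".join(f"U+{ord(c):04X}" for c in parts))
```

The file lists only one level of decomposition per character, so a letter carrying two accents needs the table applied recursively.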

Then replace these combinations with LaTeX code. For example, use the regular expression (\w)\u0301 to replace a letter followed by a combining acute accent with \'<the_letter>. (I'm not sure about the exact syntax; it depends on your programming language and regular expression engine.)
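A minimal sketch of that replacement step with Python's `re` module, assuming the text has already been decomposed (the input below is written out with explicit combining marks). The `LATEX_ACCENTS` table and the `accents_to_latex` helper are hypothetical and cover only a few accents.

```python
import re

# Hypothetical, partial mapping from combining marks to LaTeX accent commands.
LATEX_ACCENTS = {
    "\u0300": "`",   # COMBINING GRAVE ACCENT      -> \`{a}
    "\u0301": "'",   # COMBINING ACUTE ACCENT      -> \'{e}
    "\u0302": "^",   # COMBINING CIRCUMFLEX ACCENT -> \^{o}
    "\u0308": '"',   # COMBINING DIAERESIS         -> \"{u}
    "\u0327": "c",   # COMBINING CEDILLA           -> \c{c}
}

# A word character followed by one of the combining marks above.
ACCENT_RE = re.compile(r"(\w)([" + "".join(LATEX_ACCENTS) + "])")


def accents_to_latex(decomposed):
    """Rewrite 'letter + combining mark' pairs as LaTeX accent commands."""
    def repl(match):
        letter, mark = match.group(1), match.group(2)
        return "\\" + LATEX_ACCENTS[mark] + "{" + letter + "}"
    return ACCENT_RE.sub(repl, decomposed)


print(accents_to_latex("e\u0301te\u0301 a\u0300 l'eau"))
```

Wrapping the letter in braces (\'{e} rather than \'e) works for every accent command, including \c, which requires an argument.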

EDIT:
If you are using Python, you already have the database and an interface to it: as mentioned in a comment, import the unicodedata module.
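Putting the pieces together with `unicodedata`, a minimal Python 3 sketch might look like the following. It assumes NFD normalization is an acceptable way to split accented letters, and the two tables and the `to_latex` function are hypothetical, partial illustrations. Characters such as œ have no decomposition in the database, so they still need an explicit translation table like the one in the question.

```python
import re
import unicodedata

# Hypothetical, partial tables; extend them for the characters you need.
LATEX_ACCENTS = {"\u0300": "`", "\u0301": "'", "\u0302": "^", "\u0308": '"', "\u0327": "c"}
# Characters without a canonical decomposition need explicit entries.
# The trailing {} stops LaTeX from swallowing a following space.
LATEX_SPECIAL = {"œ": "\\oe{}", "Œ": "\\OE{}", "æ": "\\ae{}", "Æ": "\\AE{}", "ß": "\\ss{}"}

ACCENT_RE = re.compile(r"(\w)([" + "".join(LATEX_ACCENTS) + "])")
SPECIAL_TABLE = {ord(k): v for k, v in LATEX_SPECIAL.items()}


def to_latex(text):
    # 1. Split precomposed letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)

    # 2. Turn "letter + known mark" into a LaTeX accent command.
    def repl(match):
        return "\\" + LATEX_ACCENTS[match.group(2)] + "{" + match.group(1) + "}"

    latex = ACCENT_RE.sub(repl, decomposed)
    # 3. Handle characters that have no decomposition (ligatures, ß, ...).
    latex = latex.translate(SPECIAL_TABLE)
    # 4. Recompose any marks not covered by LATEX_ACCENTS.
    return unicodedata.normalize("NFC", latex)


print(to_latex("été à l'eau, cœur"))
```

Accents not listed in `LATEX_ACCENTS` (for example the ogonek in ą) are recombined by the final NFC step and left as Unicode; letters with two stacked accents, such as ǖ, are only partly converted and would need extra handling.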