The Archive Base


Editorial Team
Asked: June 18, 2026


Is there a mapping from UTF-8 to normalized non-accented letters in both Latin-1 and UTF-8?

I have been getting errors such as:

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u010d' in position 4: ordinal not in range(256)

I have been fixing each of these errors by hand with the code below. Is there a better way to do this?

def prehunpos(sentence):
    sentence = sentence.replace(u'\u2018',"'") # left single quote mark
    sentence = sentence.replace(u'\u2019',"'") # right single quote mark
    sentence = sentence.replace(u'\u201C','"') # left double quote mark
    sentence = sentence.replace(u'\u201D','"') # right double quote mark
    sentence = sentence.replace(u'\u2010',"-") # hyphen
    sentence = sentence.replace(u'\u2011',"-") # non-breaking hyphen
    sentence = sentence.replace(u'\u2012',"-") # figure dash
    sentence = sentence.replace(u'\u2013',"-") # en dash
    sentence = sentence.replace(u'\u2014',"-") # em dash
    sentence = sentence.replace(u'\u2015',"-") # horizontal bar
    sentence = sentence.replace(u'\u2016',"|") # double vertical line
    sentence = sentence.replace(u'\u2017',"_") # double low line
    sentence = sentence.replace(u'\u2024',".") # one dot leader
    sentence = sentence.replace(u'\u2025',"..") # two dot leader
    sentence = sentence.replace(u'\u2026',"...") # horizontal ellipsis
    sentence = sentence.replace(u'\u039d\u0391\u03a4\u039f',u'NATO') # NATO in Greek capitals, as code points rather than UTF-8 bytes

    sentence = sentence.replace(u'\u0391',"A") # Greek Capital Alpha
    sentence = sentence.replace(u'\u0392',"B") # Greek Capital Beta
    #sentence = sentence.replace(u'\u0393',"") # Greek Capital Gamma
    #sentence = sentence.replace(u'\u0394',"") # Greek Capital Delta
    sentence = sentence.replace(u'\u0395',"E") # Greek Capital Epsilon
    sentence = sentence.replace(u'\u0396',"Z") # Greek Capital Zeta
    sentence = sentence.replace(u'\u0397',"H") # Greek Capital Eta
    #sentence = sentence.replace(u'\u0398',"") # Greek Capital Theta
    sentence = sentence.replace(u'\u0399',"I") # Greek Capital Iota
    sentence = sentence.replace(u'\u039a',"K") # Greek Capital Kappa
    #sentence = sentence.replace(u'\u039b',"") # Greek Capital Lambda
    sentence = sentence.replace(u'\u039c',"M") # Greek Capital Mu
    sentence = sentence.replace(u'\u039d',"N") # Greek Capital Nu
    #sentence = sentence.replace(u'\u039e',"") # Greek Capital Xi
    sentence = sentence.replace(u'\u039f',"O") # Greek Capital Omicron
    sentence = sentence.replace(u'\u03a1',"P") # Greek Capital Rho
    #sentence = sentence.replace(u'\u03a3',"") # Greek Capital Sigma
    sentence = sentence.replace(u'\u03a4',"T") # Greek Capital Tau
    sentence = sentence.replace(u'\u03a5',"Y") # Greek Capital Upsilon
    #sentence = sentence.replace(u'\u03a6',"") # Greek Capital Phi
    sentence = sentence.replace(u'\u03a7',"X") # Greek Capital Chi
    #sentence = sentence.replace(u'\u03a8',"") # Greek Capital Psi
    #sentence = sentence.replace(u'\u03a9',"") # Greek Capital Omega

    sentence = sentence.replace(u'\u03b1',"a") # Greek small alpha
    sentence = sentence.replace(u'\u03b2',"b") # Greek small beta
    #sentence = sentence.replace(u'\u03b3',"") # Greek small gamma
    #sentence = sentence.replace(u'\u03b4',"") # Greek small delta
    sentence = sentence.replace(u'\u03b5',"e") # Greek small epsilon
    #sentence = sentence.replace(u'\u03b6',"") # Greek small zeta
    #sentence = sentence.replace(u'\u03b7',"") # Greek small eta
    #sentence = sentence.replace(u'\u03b8',"") # Greek small theta
    sentence = sentence.replace(u'\u03b9',"i") # Greek small iota
    sentence = sentence.replace(u'\u03ba',"k") # Greek small kappa
    #sentence = sentence.replace(u'\u03bb',"") # Greek small lamda
    sentence = sentence.replace(u'\u03bc',"u") # Greek small mu
    sentence = sentence.replace(u'\u03bd',"v") # Greek small nu
    #sentence = sentence.replace(u'\u03be',"") # Greek small xi
    sentence = sentence.replace(u'\u03bf',"o") # Greek small omicron
    #sentence = sentence.replace(u'\u03c0',"") # Greek small pi
    sentence = sentence.replace(u'\u03c1',"p") # Greek small rho
    sentence = sentence.replace(u'\u03c2',"c") # Greek small final sigma
    #sentence = sentence.replace(u'\u03c3',"") # Greek small sigma
    sentence = sentence.replace(u'\u03c4',"t") # Greek small tau
    sentence = sentence.replace(u'\u03c5',"u") # Greek small upsilon
    #sentence = sentence.replace(u'\u03c6',"") # Greek small phi
    sentence = sentence.replace(u'\u03c7',"x") # Greek small chi
    sentence = sentence.replace(u'\u03c8',"x") # Greek small psi
    sentence = sentence.replace(u'\u03c9',"w") # Greek small omega


    sentence = sentence.replace(u'\u0103',"a") # Latin a with breve
    sentence = sentence.replace(u'\u0107',"c") # Latin c with acute
    sentence = sentence.replace(u'\u010d',"c") # Latin c with caron
    sentence = sentence.replace(u'\u0161',"s") # Latin s with caron

    return sentence.strip()
1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 6:34 am

    If you need a general way to transliterate non-Latin scripts into Latin, an ICU transform is the best choice; PyICU (http://pypi.python.org/pypi/PyICU) is a Python wrapper for ICU. However, if you are only targeting a single script (it looks like you are mainly interested in Greek), a mapping table is the quickest solution, and it can be written far more concisely than a chain of replace() calls:

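    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    # Partial table; extend it with the remaining Greek letters.
    greek_to_latin = {u"Α": u"A", u"Β": u"B", u"Γ": u"G"}  # ...
    greek_string = u"ΑΒΓ"  # example input
    # .get(c, c) keeps characters that have no entry instead of raising KeyError
    latin_string = u"".join(greek_to_latin.get(c, c) for c in greek_string)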
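The same table idea can also cover the punctuation from the question in a single pass. The following is only a sketch: the names REPLACEMENTS, TRANSLATION_TABLE and to_plain are made up for illustration, and the table holds just a handful of the characters listed in the question. It relies on the built-in translate() method of text strings, which accepts a dict keyed by code point.

```python
# -*- coding: utf-8 -*-

# Hypothetical, deliberately incomplete table; add entries as needed.
REPLACEMENTS = {
    u"\u2018": u"'",    # left single quotation mark
    u"\u2019": u"'",    # right single quotation mark
    u"\u201c": u'"',    # left double quotation mark
    u"\u201d": u'"',    # right double quotation mark
    u"\u2013": u"-",    # en dash
    u"\u2014": u"-",    # em dash
    u"\u2026": u"...",  # horizontal ellipsis
    u"\u0391": u"A",    # Greek capital alpha
    u"\u0392": u"B",    # Greek capital beta
    u"\u0393": u"G",    # Greek capital gamma
}

# translate() expects integer code points as keys, so convert once up front.
TRANSLATION_TABLE = {ord(src): dst for src, dst in REPLACEMENTS.items()}


def to_plain(sentence):
    """Apply every replacement in one pass; unlisted characters pass through."""
    return sentence.translate(TRANSLATION_TABLE).strip()
```

Compared with the chain of replace() calls, this scans the string once, and adding a character means adding one line to the table.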
    

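For the general case, an ICU transform through PyICU might look roughly like the sketch below. It assumes the third-party PyICU package is installed; the function name to_latin_ascii is made up for illustration, and the compound transform ID "Any-Latin; Latin-ASCII" is one reasonable choice rather than the only one.

```python
# -*- coding: utf-8 -*-
# Requires the third-party PyICU package; it is not part of the standard library.
try:
    from icu import Transliterator
except ImportError:  # PyICU is not installed
    Transliterator = None


def to_latin_ascii(text):
    """Transliterate any script to Latin, then reduce the result to ASCII."""
    if Transliterator is None:
        raise RuntimeError("PyICU is not installed")
    # Assumption: "Any-Latin" romanizes the text, "Latin-ASCII" removes accents.
    transform = Transliterator.createInstance("Any-Latin; Latin-ASCII")
    return transform.transliterate(text)
```

The romanization that ICU produces follows its own rules, so it will not always match a hand-made lookalike table such as the one in the question.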
    You might also check out the unicodedata module: unicodedata.category() reports the general category of a character, which lets you identify non-ASCII punctuation symbols without listing each one by hand.
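As a rough sketch of how unicodedata could help here (the function names strip_accents and normalize_dashes are made up for illustration): decomposing the text with NFKD and dropping the combining marks handles accented Latin letters such as the \u010d from the error message, and the "Pd" (dash punctuation) category catches all the dash variants at once. Greek letters have no decomposition into Latin letters, so they would still need a table or ICU.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return u"".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_dashes(text):
    """Replace every character in category Pd (dash punctuation) with '-'."""
    return u"".join(
        u"-" if unicodedata.category(c) == "Pd" else c for c in text
    )
```

Note that NFKD is a compatibility decomposition, so it also rewrites some other characters (for example, it expands the horizontal ellipsis into three full stops); whether that is wanted depends on the input.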
