The Archive Base

Editorial Team
Asked: June 13, 2026

I always work with Arabic text files, and to avoid encoding problems I transliterate the Arabic characters into English according to Buckwalter’s scheme (http://www.qamus.org/transliteration.htm).

Here is my code to do so, but it is very slow, even with small files of around 400 KB. Any ideas for making it faster?

Thanks

    def transliterate(file):
        # Read the Arabic text as UTF-8.
        with open(file, encoding="utf-8") as f:
            data = f.read()
        # Buckwalter symbol -> Arabic character
        buckArab = {"'":"ء", "|":"آ", "?":"أ", "&":"ؤ", "<":"إ", "}":"ئ", "A":"ا", "b":"ب", "p":"ة", "t":"ت", "v":"ث", "g":"ج", "H":"ح", "x":"خ", "d":"د", "*":"ذ", "r":"ر", "z":"ز", "s":"س", "$":"ش", "S":"ص", "D":"ض", "T":"ط", "Z":"ظ", "E":"ع", "G":"غ", "_":"ـ", "f":"ف", "q":"ق", "k":"ك", "l":"ل", "m":"م", "n":"ن", "h":"ه", "w":"و", "Y":"ى", "y":"ي", "F":"ً", "N":"ٌ", "K":"ٍ", "~":"ّ", "o":"ْ", "u":"ُ", "a":"َ", "i":"ِ"}
        # Slow part: the outer loop repeats every replacement once per character of the text.
        for char in data:
            for k, v in buckArab.items():
                data = data.replace(v, k)  # Arabic character -> Buckwalter symbol
        return data
1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:56 am

    Edit Oct 2021

    A Python package called Camel Tools has since been released that does this (and a lot more), so anyone reading this post now should ignore all the other answers and just use it. (Nizar Habash and his team at NYU Abu Dhabi are awesome for developing this and making it so accessible!)

    from camel_tools.utils.charmap import CharMapper
    sentence = "ذهبت إلى المكتبة."
    print(sentence)
    
    ar2bw = CharMapper.builtin_mapper('ar2bw')
    
    sent_bw = ar2bw(sentence)
    print(sent_bw)
    

    Output:

    ذهبت إلى المكتبة.
    *hbt <lY Almktbp.
    

    You can find install instructions and tutorials here: https://github.com/CAMeL-Lab/camel_tools


    Old answer
    Incidentally, someone already wrote a script that does this, so you might want to check that out before spending too much time on your own:
    buckwalter2unicode.py

    It probably does more than what you need, but you don’t have to use all of it: I copied just the two dictionaries and the transliterateString function (with a few tweaks, I think), and use that on my site.

    Edit:
    The script above is what I had been using, but I’ve just discovered that it is much slower than using replace, especially for a large corpus. This is the code I finally ended up with; it seems to be simpler and faster (it references a dictionary, buck2uni, that maps Buckwalter symbols to Arabic characters):

    def transString(string, reverse=0):
        '''Given a Unicode string, transliterate into Buckwalter. To go from
        Buckwalter back to Unicode, set reverse=1'''
    
        for k, v in buck2uni.items():
            if not reverse:
                string = string.replace(v, k)
            else:
                string = string.replace(k, v)
    
        return string
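
The replace loop above makes one pass over the text per dictionary entry. As a rough alternative, here is a minimal sketch that does the whole mapping in a single pass with Python's built-in `str.maketrans` and `str.translate`. The `buck2uni` dictionary below is only a small illustrative subset of the table (it is not the one from the script mentioned above), and the names `trans_string`, `BUCK_TO_ARABIC` and `ARABIC_TO_BUCK` are made up for this example. It may well be faster on a large corpus, but that is worth timing on your own data.

```python
# Illustrative subset of the table: Buckwalter symbol -> Arabic character.
# Assumption: extend this with the remaining entries before real use.
buck2uni = {
    "A": "ا", "b": "ب", "p": "ة", "t": "ت", "*": "ذ", "k": "ك",
    "l": "ل", "m": "م", "h": "ه", "Y": "ى", "<": "إ",
}

# str.maketrans accepts a dict whose keys are single characters.
BUCK_TO_ARABIC = str.maketrans(buck2uni)
ARABIC_TO_BUCK = str.maketrans({v: k for k, v in buck2uni.items()})


def trans_string(text, reverse=False):
    """Arabic -> Buckwalter by default; Buckwalter -> Arabic if reverse=True."""
    table = BUCK_TO_ARABIC if reverse else ARABIC_TO_BUCK
    # Characters that are not in the table (spaces, punctuation) pass through unchanged.
    return text.translate(table)


print(trans_string("ذهبت إلى المكتبة."))
```

Because `translate` looks at each character exactly once, the output of one substitution can never be picked up by a later one, which is something to watch for with chained `replace` calls when the two alphabets overlap.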
    
© 2021 The Archive Base. All Rights Reserved