The Archive Base

Editorial Team
Asked: June 14, 2026 (2026-06-14T23:56:34+00:00)

After some trouble getting the Chrome Compact Language Detection (CLD) library installed on Windows, I installed CLD from this easy_install.

I can now use CLD, but I am running into some encoding issues.

Background

I am pulling tweets into a Python script and, after stripping out the hashtags and links, passing them to CLD to detect the language.
The following is a simplified version of my code:

import cld

s = "I am a tweet from Twitter"
# CLD expects UTF-8 encoded bytes, so encode the text before detection
clean_s = s.encode('utf-8')
lan = cld.detect(clean_s, pickSummaryLanguage=True, removeWeakMatches=True)

Problem

Four times out of five, this works as expected and returns the detected language.

However, I keep getting this error popping up:

UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019'
in position 15: character maps to <undefined>

I did read that:

“You must provide CLD clean (interchange-valid) UTF-8, so any encoding
issues must be sorted out before-hand.”

However, I thought I had this covered by encoding the string to UTF-8?

I assume I need to make sure the string I pass to CLD preserves the characters of non-Latin scripts such as Arabic or Asian languages.

This is my first Python project, so this is likely a rookie mistake. Can anyone point out my mistake and how to fix it?

Let me know in comments if I need to gather more info, and I will edit my Q to provide more info.

EDIT
If it helps, here is my rookie code (cut down to replicate the issue).
I am running Python 2.7, 32-bit.

After running this code for a while, I get this error.
Let me know if I have not implemented the error reporting correctly.

Raw: Traceback (most recent call last):
  File "LanguageTesting.py", line 71, in <module>
    parse_tweet(tweet)
  File "LanguageTesting.py", line 43, in parse_tweet
    print "Raw:", raw
  File "C:\Python27\ArcGIS10.1\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 29-32: character maps to <undefined>

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 11:56 pm

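    It looks like you are failing on the print statement, not in CLD, right? This means Python cannot encode the unicode string into what it thinks the console’s stdout encoding is (“print sys.stdout.encoding” shows it; your traceback points at cp850). Note that “sys.getdefaultencoding()” reports something different: the codec used for implicit str/unicode conversions, not the console encoding.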
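
To confirm that the problem is the console charset rather than the UTF-8 bytes handed to CLD, a minimal sketch like the one below may help. The helper name `can_encode` and the sample string are made up for illustration; cp850 is taken from the traceback in the question. The `__future__` import is only there so the same code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Encoding that print uses for the console; may be None when output is piped.
print("stdout encoding:", sys.stdout.encoding)

# Hypothetical tweet text containing U+2019 (right single quotation mark).
sample = u"It\u2019s a tweet"
print("utf-8 ok:", can_encode(sample, "utf-8"))  # the bytes passed to CLD
print("cp850 ok:", can_encode(sample, "cp850"))  # what the console requires
```

UTF-8 can represent any Unicode character, so the first check succeeds, while cp850 has no slot for U+2019, which is the same failure the print statement hits.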

    If Python is wrong about what your terminal expects, you can set the environment variable PYTHONIOENCODING (“export PYTHONIOENCODING=UTF-8” in a Unix shell, or “set PYTHONIOENCODING=UTF-8” in the Windows command prompt) and it will encode your printed strings to UTF-8. Alternatively, before printing, you can encode to whatever charset your terminal expects (and will likely have to ignore/replace errors to avoid exceptions like the one you hit)…
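
The second option can be sketched roughly as follows. The helper name `console_safe` and the sample tweet are assumptions for illustration, not part of the original code. The idea is to round-trip the text through the console charset with the "replace" error handler, so that unsupported characters become "?" instead of raising.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the console cannot show replaced by '?'."""
    # sys.stdout.encoding can be None when output is redirected;
    # fall back to ASCII in that case.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with the "replace" handler, then decode back to text so the
    # result is still a unicode string that print can handle.
    return text.encode(encoding, "replace").decode(encoding)


# Hypothetical tweet: a curly apostrophe followed by an Arabic word.
tweet = u"It\u2019s a tweet \u0645\u0631\u062d\u0628\u0627"
print("Raw:", console_safe(tweet, "cp850"))
```

Only the text shown on screen is altered this way; the original unicode string can still be encoded to UTF-8 and passed to CLD unchanged.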


© 2021 The Archive Base. All Rights Reserved