
The Archive Base


Editorial Team
Asked: May 29, 2026


I am trying to parse an RSS feed with feedparser and insert it into a mySQL table using SQLAlchemy. I was actually able to get this running just fine but today the feed had an item with an ellipsis character in the description and I get the following error:

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 35: ordinal not in range(256)

If I add the convert_unicode=True option to the engine, the insert goes through, but the ellipsis doesn't show up; I just get weird characters instead. That seems to make sense, since to the best of my knowledge there is no horizontal ellipsis in latin-1. Setting the encoding to utf-8 doesn't seem to make a difference either. If I do the same insert with the ellipsis through phpMyAdmin, it goes through fine.

I’m thinking I just don’t understand character encodings or how to get SQLAlchemy to use one I specify. Does anyone know how to get the text to go in without weird characters?

UPDATE

I think I have figured this one out but I’m not really sure why it matters…

Here is the code:

import sys
import feedparser
import sqlalchemy
from sqlalchemy import create_engine, MetaData, Table

COMMON_CHANNEL_PROPERTIES = [
  ('Channel title:','title', None),
  ('Channel description:', 'description', 100),
  ('Channel URL:', 'link', None),
]

COMMON_ITEM_PROPERTIES = [
  ('Item title:', 'title', None),
  ('Item description:', 'description', 100),
  ('Item URL:', 'link', None),
]

INDENT = u' '*4

def feedinfo(url, output=sys.stdout):
  feed_data = feedparser.parse(url)
  channel, items = feed_data.feed, feed_data.entries

  # Adding charset=utf8 to the connection URL is what fixed the problem.
  db = create_engine('mysql://user:pass@localhost/db?charset=utf8')
  metadata = MetaData(db)
  rssItems = Table('rss_items', metadata, autoload=True)
  i = rssItems.insert()

  for label, prop, trunc in COMMON_CHANNEL_PROPERTIES:
    value = channel[prop]
    if trunc:
      value = value[:trunc] + u'...'
    print >> output, label, value
  print >> output
  print >> output, "Feed items:"
  for item in items:
    i.execute({'title':item['title'], 'description': item['description'][:100]})
    for label, prop, trunc in COMMON_ITEM_PROPERTIES:
      value = item[prop]
      if trunc:
        value = value[:trunc] + u'...'
      print >> output, INDENT, label, value
    print >> output, INDENT, u'---'
  return

if __name__=="__main__":
  url = sys.argv[1]
  feedinfo(url)

Here’s the output/traceback from running the code without the charset option:

Channel title: [H]ardOCP News/Article Feed
Channel description: News/Article Feed for [H]ardOCP...
Channel URL: http://www.hardocp.com

Feed items:
     Item title: Windows 8 UI is Dropping the 'Start' Button
     Item description: After 15 years of occupying a place of honor on the desktop, the "Start" button will disappear from ...
     Item URL: http://www.hardocp.com/news/2012/02/05/windows_8_ui_dropping_lsquostartrsquo_button/
     ---
     Item title: Which Crashes More? Apple Apps or Android Apps
     Item description: A new study of smartphone apps between Android and Apple conducted over a two month period came up w...
     Item URL: http://www.hardocp.com/news/2012/02/05/which_crashes_more63_apple_apps_or_android/
     ---
Traceback (most recent call last):
  File "parse.py", line 47, in <module>
    feedinfo(url)
  File "parse.py", line 36, in feedinfo
    i.execute({'title':item['title'], 'description': item['description'][:100]})
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2758, in execute
    return e._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2304, in _execute_clauseelement
    return connection._execute_clauseelement(elem, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1538, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 330, in do_execute
    cursor.execute(statement, parameters)
  File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 159, in execute
  File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 264, in literal
  File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 202, in unicode_literal
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 35: ordinal not in range(256)

So it looks like adding the charset to the MySQL connection string did it. I suppose it defaults to latin-1? I had tried setting the encoding flag on create_engine to utf8, and that did nothing. Does anyone know why it would use latin-1 when the tables and fields are set to utf8 unicode? I also tried encoding item['description'] with .encode('cp1252') before sending it off, and that worked as well, even without the charset option in the connection string. That shouldn't have worked with latin-1, but apparently it did? I've got the solution, but would love an answer 🙂


1 Answer

    Editorial Team
    Added an answer on May 29, 2026 at 7:36 am

    The error message

    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' 
    in position 35: ordinal not in range(256)
    

    seems to indicate that some Python code is trying to convert the character \u2026 into a Latin-1 (ISO 8859-1) string and failing. That is not surprising: the character is U+2026 HORIZONTAL ELLIPSIS, which has no equivalent in ISO 8859-1.
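The same error can be reproduced without a database. The sketch below is written in Python 3 (where `str` is Unicode, like Python 2's `unicode`); the sample text is made up so that the ellipsis sits at index 35, as in the traceback, and `try_encode` is a hypothetical helper:

```python
def try_encode(text: str, codec: str):
    """Return (encoded_bytes, None) on success or (None, exception) on failure."""
    try:
        return text.encode(codec), None
    except UnicodeEncodeError as exc:
        return None, exc


# Made-up sample: 35 ASCII characters followed by U+2026 HORIZONTAL ELLIPSIS,
# so the offending character sits at index 35 as in the traceback.
sample = "A" * 35 + "\u2026"

# ISO 8859-1 only covers code points U+0000..U+00FF, so this fails.
data, err = try_encode(sample, "latin-1")
print(err)  # 'latin-1' codec can't encode character '\u2026' in position 35: ordinal not in range(256)

# UTF-8 can represent every code point; the ellipsis becomes three bytes.
data, err = try_encode(sample, "utf-8")
print(data[-3:])  # b'\xe2\x80\xa6'
```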

    You fixed the problem by adding the query parameter ?charset=utf8 to the URL in your SQLAlchemy create_engine() call:

    import sqlalchemy
    from sqlalchemy import create_engine, MetaData, Table
    
    db = create_engine('mysql://user:pass@localhost/db?charset=utf8')
    

    The Database Urls section of the SQLAlchemy documentation tells us that a URL beginning with mysql:// selects the MySQL dialect, using the mysql-python (MySQLdb) driver by default.

    The following section, Custom DBAPI connect() arguments, tells us that query arguments are passed to the underlying DBAPI.
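As a rough illustration of that hand-off, the sketch below splits the connection URL into keyword arguments for a DBAPI `connect()` call using only the standard library. It is a simplified, hypothetical helper (`url_to_connect_kwargs`), not SQLAlchemy's actual code; it ignores ports and URL escaping, and it assumes MySQLdb's `host`/`user`/`passwd`/`db` argument names:

```python
from urllib.parse import parse_qsl, urlsplit


def url_to_connect_kwargs(url: str) -> dict:
    """Hypothetical, simplified illustration (not SQLAlchemy's code) of turning
    a database URL into keyword arguments for a DBAPI connect() call."""
    parts = urlsplit(url)
    kwargs = {
        "host": parts.hostname,
        "user": parts.username,
        "passwd": parts.password,  # MySQLdb's keyword for the password
        "db": parts.path.lstrip("/"),
    }
    # Every query-string pair is forwarded unchanged, so ?charset=utf8
    # becomes connect(..., charset='utf8').
    kwargs.update(parse_qsl(parts.query))
    return kwargs


print(url_to_connect_kwargs("mysql://user:pass@localhost/db?charset=utf8"))
# {'host': 'localhost', 'user': 'user', 'passwd': 'pass', 'db': 'db', 'charset': 'utf8'}
```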

    So what does the mysql-python driver make of a parameter {'charset': 'utf8'}? The Functions and attributes section of its documentation says of the charset parameter: “…If present, the connection character set will be changed to this character set, if they are not equal.”

    To find out what the connection character set means, we turn to section 10.1.4, Connection Character Sets and Collations, of the MySQL 5.6 Reference Manual. To make a long story short, MySQL can interpret incoming queries in an encoding different from the database's character set, and different again from the encoding of the returned query results.

    Since the error message you reported looks like a Python error rather than a SQL error, I'll speculate that something in SQLAlchemy or mysql-python is converting the query to a default connection encoding of latin-1 before sending it, and that this is what triggers the error. Adding ?charset=utf8 to your connection URL changes the connection encoding, and the U+2026 HORIZONTAL ELLIPSIS gets through.
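The speculated client-side step can be modeled in a few lines of Python 3. `encode_for_connection` and the charset-name mapping are hypothetical; the sketch only assumes that the driver encodes Unicode parameters with Python's codec for the connection character set:

```python
# Hypothetical mapping from MySQL charset names to the Python codecs a
# client library might use for them.
MYSQL_TO_PYTHON_CODEC = {
    "latin1": "latin-1",  # strict ISO 8859-1, as in the traceback
    "utf8": "utf-8",
}


def encode_for_connection(value: str, connection_charset: str) -> bytes:
    """Encode a Unicode query parameter with the connection character set."""
    return value.encode(MYSQL_TO_PYTHON_CODEC[connection_charset])


description = "Read more\u2026"
results = {}
for charset in ("latin1", "utf8"):
    try:
        results[charset] = encode_for_connection(description, charset)
    except UnicodeEncodeError as exc:
        results[charset] = exc.reason  # why the codec gave up

print(results)
# {'latin1': 'ordinal not in range(256)', 'utf8': b'Read more\xe2\x80\xa6'}
```

Under this model, switching the connection charset is the only change needed for the same Unicode value to encode cleanly.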

    Update: you also ask, “if I remove the charset option and then encode the description using .encode('cp1252') it will go through just fine. How is an ellipsis able to get through with cp1252 but not unicode?”

    The encoding cp1252 has a horizontal ellipsis character at byte value \x85. Thus it is possible to encode a Unicode string containing U+2026 HORIZONTAL ELLIPSIS into cp1252 without error.
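A quick Python 3 check of those byte values, contrasting cp1252 with strict ISO 8859-1:

```python
ellipsis = "\u2026"  # U+2026 HORIZONTAL ELLIPSIS

# cp1252 assigns the ellipsis to byte 0x85 ...
print(ellipsis.encode("cp1252"))             # b'\x85'
print(b"\x85".decode("cp1252") == ellipsis)  # True

# ... while in ISO 8859-1 byte 0x85 is the C1 control character NEL (U+0085).
print(ascii(b"\x85".decode("latin-1")))      # '\x85'
```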

    Remember also that in Python, Unicode strings and byte strings are two different data types. It’s reasonable to speculate that MySQLdb might have a policy of sending only byte strings over a SQL connection. Thus it would encode a query received as a Unicode string into a byte string, but would leave a query received as a byte string alone. (This is speculation, I haven’t looked at the source code.)
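The speculated policy (encode Unicode, pass bytes through) can be sketched as follows. `to_wire` is hypothetical and not taken from MySQLdb; in Python 3 the Python 2 `unicode`/`str` pair becomes `str`/`bytes`:

```python
def to_wire(value, connection_charset: str) -> bytes:
    """Hypothetical model: Unicode text is encoded with the connection
    charset; byte strings are assumed to be encoded already and pass as-is."""
    if isinstance(value, bytes):
        return value
    return value.encode(connection_charset)


text = "Start button\u2026"

# Encoding with cp1252 first means the latin-1 codec never sees U+2026.
pre_encoded = text.encode("cp1252")
print(to_wire(pre_encoded, "latin-1"))  # b'Start button\x85'

# Passing the Unicode string instead reproduces the original failure.
try:
    to_wire(text, "latin-1")
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(256)
```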

    In the traceback you posted, the last two lines (closest to where the error occurs) show the method names literal, followed by unicode_literal. That tends to support the theory that MySQLdb encodes query parameters it receives as Unicode strings into byte strings.

    When you encode the string yourself, you bypass the part of MySQLdb that does this encoding. Note, however, that if you encode it differently from what the MySQL connection charset calls for, you'll have an encoding mismatch, and your text will likely be stored wrong. In your cp1252 case the mismatch happens to be harmless: MySQL's latin1 character set is actually Windows cp1252, so the server reads the byte \x85 back as a horizontal ellipsis.
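To see why the bytes have to agree with the connection charset, here is a hypothetical Python 3 model of the server side: incoming bytes are decoded with the connection character set, and the resulting text is what would land in a utf8 column. `server_store` is made up, and the model assumes MySQL's latin1 behaves like Python's cp1252 codec, which holds for the bytes used here:

```python
# Assumption: MySQL's "latin1" is Windows cp1252, so model it with Python's
# cp1252 codec rather than the strict latin-1 codec.
MYSQL_CHARSET_TO_CODEC = {"latin1": "cp1252", "utf8": "utf-8"}


def server_store(payload: bytes, connection_charset: str) -> str:
    """Hypothetical model: decode incoming bytes with the connection charset.
    The returned text is what would end up in a utf8 column."""
    return payload.decode(MYSQL_CHARSET_TO_CODEC[connection_charset])


text = "Read more\u2026"

# Bytes agree with the declared connection charset: the ellipsis survives.
ok_utf8 = server_store(text.encode("utf-8"), "utf8")
ok_cp1252 = server_store(text.encode("cp1252"), "latin1")
print(ok_utf8 == text, ok_cp1252 == text)  # True True

# UTF-8 bytes declared as latin1: each UTF-8 byte becomes its own character.
garbled = server_store(text.encode("utf-8"), "latin1")
print(ascii(garbled))  # 'Read more\xe2\u20ac\xa6'
```

The last case is one plausible source of "weird characters" such as â€¦ appearing in a utf8 column.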


Related Questions

I'm trying to parse an RSS feed using LINQ to Xml This is the
I am trying to parse an RSS feed using Linq to XML like so:
I'm trying to parse a rss feed that looks like this for the attribute
I am trying to parse an rss feed that is using the well formed
I am trying to parse an RSS feed with feedparser. The below code snipped
I've been trying to grab the Digg rss feed and parse its contents into
I'm trying to parse the title tag in an RSS 2.0 feed into three
i am trying to parse usernames on a twitter rss feed using simplexml in
I'm trying to parse a RSS feed using C#, and I need to transform
I'm trying to parse an RSS/Podcast feed using Beautifulsoup and everything is working nicely

© 2021 The Archive Base. All Rights Reserved