The Archive Base

Editorial Team
Asked: May 14, 2026

I have an old C# program that is being ported to Python 3 for

I have an old C# program that I am porting to Python 3 for various reasons. The program fetches a web page and searches its content (it also processes it, but that is not relevant here). The fetch-and-search routine never gave me any trouble in C#, but once ported to Python it started complaining about invalid Unicode at certain positions.

This is not really a problem in itself: the source page data is the same as in the old C# application, and the old program did its job despite the broken data. What I want is for Python 3's decode() method to behave as closely as possible to C#'s internal handling of such cases. Unfortunately, after reading the Python manual and looking into the 'ignore' and 'replace' error handlers, I still can't tell which one best mimics the C# behavior (which I have also failed to identify).

To add some code into the discussion, here is the C# code that handles everything transparently:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
string html = reader.ReadToEnd();

The corresponding Python 3 code is as follows:

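from urllib.request import Request, urlopen

req = Request(url)
r = urlopen(req)
# Strict decoding (the default) raises UnicodeDecodeError on invalid bytes
data = r.read().decode("utf_8")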

However, I want to find out which of the following will best mimic the Unicode behavior of the C# code:

data = r.read().decode("utf_8", "replace")

or

data = r.read().decode("utf_8", "ignore")

Can anyone with in-depth Unicode experience give me some pointers on which method is better? The Python manual does describe the behavior, but not in a way that tells me which one I should use.

Thanks in advance for any help!

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 6:27 am

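    According to http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8.aspx, the default UTF-8 decoder in C# (Encoding.UTF8) does not throw an exception on invalid bytes. It does not drop them either: in .NET Framework 2.0 and later it substitutes the replacement character U+FFFD for each invalid sequence.

    Python's 'replace' error handler does the same thing, so it is the closer match. The 'ignore' handler silently removes the invalid bytes, which produces a different string from the one the C# program worked with. The exact number of replacement characters emitted for a malformed sequence may differ between runtimes, so compare the output on a real sample if the positions matter.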
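
To make the difference between the two error handlers concrete, here is a minimal sketch that decodes the same bytes with each of them. The sample bytes are made up for illustration (valid UTF-8 with one stray 0xFF byte); they are not taken from the page in the question.

```python
# Hypothetical sample: "café", a space, one invalid byte (0xFF), then " ok".
raw = b"caf\xc3\xa9 \xff ok"

# 'ignore' drops the invalid byte without leaving any trace.
ignored = raw.decode("utf_8", "ignore")

# 'replace' substitutes U+FFFD (the Unicode replacement character).
replaced = raw.decode("utf_8", "replace")

print(ignored == "caf\u00e9  ok")         # True: the bad byte is gone
print(replaced == "caf\u00e9 \ufffd ok")  # True: the bad byte became U+FFFD
print(len(ignored), len(replaced))        # the two results differ in length

# The default ('strict') is what produces the complaint in the question.
try:
    raw.decode("utf_8")
except UnicodeDecodeError as exc:
    print("strict decoding failed at byte offset", exc.start)
```

Running the old C# program and the Python port against the same saved response and comparing the two strings is the most reliable way to confirm which handler matches.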
