The Archive Base

Editorial Team
Asked: June 11, 2026

I have some UTF-8 strings in memory (this is part of a bigger system)

I have some UTF-8 strings in memory (this is part of a bigger system), which are basically the names of places in European countries. I’m trying to write them to a text file on my Linux machine (Fedora). When I write these name strings (char pointers) to the file, the file appears to be saved in an extended ASCII format.

Now I copy this file to my Windows machine, where I need to load these names into a MySQL DB. When I open the text file in Notepad++, it again defaults the encoding to ANSI. I can select UTF-8 as the encoding, and almost all the characters then look as expected, except the following three: Ő, ő and ű. They are displayed within the text as &#336, &#337 and &#369.

Does anyone have any thoughts on what might be wrong? I know that these are not part of the extended ASCII symbols. The way I’m writing this to the file is something like:

// create the output file stream
std::ofstream fs("sample.txt");

// loop through the list of UTF-8 encoded strings
if (fs.is_open()) {
    for (int i = 0; i < num_strs; i++) {
        fs << str_name; // unsigned char pointer to a name encoded as UTF-8
        fs << "\n";
    }
}
fs.close();

Everything looks good even with characters like ú and ö and ß. The issue is with the above 3 characters alone. Any thoughts/suggestions/comments on this? Thanks!

As an example, a string like “Gyömrő” shows up as “Gyömr&#337”.

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 5:15 pm

    You need to identify at which stage the unexpected HTML entities (such as &#336) are introduced. My best guess is that they are already in the string you are writing to the file. Use a debugger, or add test code that counts the &s in the string.
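
The check suggested above could look roughly like the following. This is only a sketch: the function name count_numeric_refs is made up for illustration, and it assumes the references look like the ones in the question ("&#" followed by decimal digits, with or without a trailing ';').

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts occurrences of "&#" followed by at least
// one decimal digit, e.g. "&#337". A trailing ';' is not required,
// because the text in the question shows the references without one.
std::size_t count_numeric_refs(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 2 < s.size(); ++i) {
        if (s[i] == '&' && s[i + 1] == '#' &&
            std::isdigit(static_cast<unsigned char>(s[i + 2]))) {
            ++count;
        }
    }
    return count;
}
```

With the unsigned char pointer from the question, a call might be count_numeric_refs(reinterpret_cast<const char*>(str_name)), made just before the string is written. A non-zero result at that point would suggest the references come from the data source and not from the file stream.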

    That would mean your source of information does not strictly use UTF-8 for non-ASCII characters, but occasionally uses HTML entities. This is odd, but possible if your data source is an HTML file (or something like that).
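
If the data source really does mix UTF-8 with numeric references, one possible repair is to decode the references before writing the file. The sketch below is an assumption about what such a step could look like, not something the original system is known to do; append_utf8 and decode_numeric_refs are hypothetical names, and only decimal references are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of code point cp (assumed <= 0x10FFFF) to out.
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replaces decimal references such as "&#337" or "&#337;" with the UTF-8
// bytes of the referenced code point. Everything else is copied unchanged.
std::string decode_numeric_refs(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '&' && i + 2 < in.size() && in[i + 1] == '#' &&
            std::isdigit(static_cast<unsigned char>(in[i + 2]))) {
            std::size_t j = i + 2;
            unsigned long cp = 0;
            // Stop accumulating once the value is out of range.
            while (j < in.size() && cp <= 0x10FFFF &&
                   std::isdigit(static_cast<unsigned char>(in[j]))) {
                cp = cp * 10 + static_cast<unsigned long>(in[j] - '0');
                ++j;
            }
            bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (cp > 0 && cp <= 0x10FFFF && !surrogate) {
                append_utf8(out, cp);
                if (j < in.size() && in[j] == ';') ++j;  // optional ';'
                i = j;
                continue;
            }
        }
        out += in[i];  // not a valid reference: copy the byte as-is
        ++i;
    }
    return out;
}
```

Under these assumptions, the bytes of “Gyömr&#337” would come out as the bytes of “Gyömrő”. Fixing the data at its source is probably still the cleaner option when that is possible.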

    Also, you might want to look at your output file in hex mode (there’s a nice plugin for Notepad++). This should help you understand what UTF-8 really means at the byte level: the 128 ASCII symbols each use one byte with a value of 0-127. All other symbols use 2 to 4 bytes, each with a value above 127. HTML entities are not really an encoding; they are more of an escape sequence, like ‘\n’ or ‘\r’.
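
For a quick look at the bytes without a hex editor, something like the following may help. It is a minimal sketch; to_hex and utf8_sequence_length are hypothetical names, and the length function only inspects the bit pattern of the lead byte (it does not reject overlong or otherwise invalid sequences).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the bytes of s as space-separated uppercase hex pairs.
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Number of bytes in the UTF-8 sequence announced by a lead byte:
// 1 to 4, or 0 if the byte is a continuation byte or not a lead byte.
int utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or invalid
}
```

A correctly encoded “Gyömrő” is the byte sequence 47 79 C3 B6 6D 72 C5 91, where C3 B6 is ö and C5 91 is ő. If the file instead contains 26 23 33 33 37 (the ASCII characters &#337), the reference was written literally and no choice of encoding in the editor will change how it is displayed.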
