A software is producing UTF-8 files, but writing content to the file that isn’t

Asked: June 7, 2026

A piece of software produces UTF-8 files, but the text it writes into them is not correctly encoded. I can’t change that software and have to take its output as it is. I don’t know if this will show up here correctly, but a German umlaut “ä” appears in the file as “Ã¤”.
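The byte pattern behind “Ã¤” can be reproduced with a small Perl sketch of a suspected double encoding. It assumes the software's second conversion read the UTF-8 bytes as Windows-1252 (cp1252) and encoded them to UTF-8 again; that is a guess, since the actual software is unknown.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "ä" as a Perl character string (U+00E4), written as an escape so the
# script itself does not need "use utf8".
my $text = "\x{E4}";

# Correct UTF-8 encoding: two bytes, 0xC3 0xA4.
my $utf8_bytes = encode('UTF-8', $text);

# Assumed bug: those bytes are read as cp1252 (one character per byte)...
my $misread = decode('cp1252', $utf8_bytes);    # "\x{C3}\x{A4}", shown as "Ã¤"

# ...and then encoded to UTF-8 a second time.
my $double = encode('UTF-8', $misread);         # 0xC3 0x83 0xC2 0xA4

printf "once:  %s\n", unpack('H*', $utf8_bytes);   # once:  c3a4
printf "twice: %s\n", unpack('H*', $double);       # twice: c383c2a4
```

Four bytes per umlaut instead of two is the signature of this kind of double encoding.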

If I open the file in Notepad++, it tells me the file is encoded as UTF-8 (without BOM). Now, if I choose “Convert to ANSI” in Notepad++ and then switch the file encoding back to UTF-8 (without converting), the German umlauts in the file are correct. How can I achieve exactly the same behaviour in Perl? Whatever I have tried so far, the umlaut mess just got worse.

To reproduce, create a UTF-8 file and write this to it:
MÃ¤nner SchÃ¼ler VÃ¶gel SÃ¼Ã

Then, in a MySQL database that uses UTF-8, create a table with a VARCHAR field and the utf8_unicode_ci collation. Now use this script:

use strict;
use warnings;
use utf8;
use DBI;
use Encode;

if (open my $fh, '<', 'test.csv') {
  my $db = DBI->connect(
    'DBI:mysql:your_db;host=127.0.0.1;mysql_compression=1', 'root', 'Yourpass',
    { PrintError => 1 }
  );
  $db->do(q{SET NAMES 'utf8'});
  # Prepare the statement once and reuse it for every line.
  my $sth = $db->prepare('INSERT IGNORE INTO testtable (testline) VALUES (?)');
  while (my $line = <$fh>) {
    chomp $line;    # do not store the trailing newline
    $sth->execute($line);
  }
  close $fh;
  $db->disconnect;
}

The exact contents of the file get written to the database. But the output I expect in the database has proper German umlauts:

Männer Schüler Vögel Süß

So, how can I convert that correctly?

1 Answer

Editorial Team · Answer 1 · 2026-06-07T00:12:59+00:00

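It sounds like something is converting the text a second time: it treats the UTF-8 bytes as a single-byte encoding such as Windows-1252 or ISO 8859-1 and converts them to UTF-8 again. You can reverse this by converting the UTF-8 text back to that single-byte encoding (or whichever encoding seems to make sense for your data) and then treating the resulting bytes as UTF-8, which is exactly what your Notepad++ steps do.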
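A minimal Perl sketch of that reversal, assuming the second conversion used Windows-1252 (cp1252 in Encode), which is typically what Notepad++ calls “ANSI” on Western European Windows systems. If your data contains characters cp1252 cannot represent, iso-8859-1 is the next candidate to try. The helper name fix_double_utf8 is hypothetical. Decoding as UTF-8 and encoding to cp1252 corresponds to “Convert to ANSI”; the bytes that come out are already correct UTF-8, which is what “switch back to UTF-8 without converting” relies on.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# Hypothetical helper: undo one extra round of "read UTF-8 bytes as cp1252,
# then encode to UTF-8 again". Takes raw bytes, returns repaired UTF-8 bytes.
sub fix_double_utf8 {
    my ($bytes) = @_;
    # Step 1: UTF-8 bytes -> characters, e.g. "Ã¤" as two characters.
    my $chars = decode('UTF-8', $bytes, FB_CROAK);
    # Step 2: characters -> cp1252 bytes, e.g. 0xC3 0xA4, the UTF-8 for "ä".
    return encode('cp1252', $chars, FB_CROAK);
}

# Double-encoded sample equivalent to "Männer Schüler Vögel Süß"
# (assuming the cp1252 route, where "ß" becomes "ÃŸ").
my $broken = "M\xC3\x83\xC2\xA4nner Sch\xC3\x83\xC2\xBCler"
           . " V\xC3\x83\xC2\xB6gel S\xC3\x83\xC2\xBC\xC3\x83\xC5\xB8";

my $fixed = fix_double_utf8($broken);
print "$fixed\n";    # raw UTF-8 bytes; a UTF-8 terminal shows the umlauts
```

In the question's script, calling $sth->execute(fix_double_utf8($line)) would send the repaired UTF-8 bytes, which matches the SET NAMES 'utf8' setting. FB_CROAK makes the helper die on input that is not valid UTF-8, or on a character cp1252 cannot represent, instead of silently inserting a substitution character.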

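As seen on http://www.fileformat.info/info/unicode/char/E4/index.htm, the bytes 0xC3 0xA4 are the valid UTF-8 encoding of ä. When viewed as ISO 8859-1 or Windows-1252 (or a number of other 8-bit encodings), they display the string Ã¤. ISO 8859-15 differs here: it maps 0xA4 to €, so the same bytes would show as Ã€, which makes ISO 8859-1 or Windows-1252 the more likely culprit for your data.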
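To check which single-byte encoding matches what you see, a short Perl sketch can decode the same two bytes under several encodings and print the resulting Unicode code points. The list of encodings is just an illustrative selection.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA4";    # UTF-8 encoding of "ä" (U+00E4)

my %code_points;
for my $enc ('UTF-8', 'iso-8859-1', 'cp1252', 'iso-8859-15') {
    my $chars = decode($enc, $bytes);
    # Record the code points this encoding assigns to the two bytes.
    $code_points{$enc} = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
    printf "%-12s %s\n", $enc, $code_points{$enc};
}
# UTF-8        U+00E4
# iso-8859-1   U+00C3 U+00A4
# cp1252       U+00C3 U+00A4
# iso-8859-15  U+00C3 U+20AC
```

U+00A4 is “¤” and U+20AC is “€”, so only ISO 8859-1 and cp1252 reproduce the “Ã¤” seen in the file.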
