In this output, why am I getting extra newlines after printing non-ASCII Unicode characters?

Editorial Team

Asked: May 20, 2026 (2026-05-20T07:49:06+00:00)

In this output, why am I getting extra newlines after printing non-ASCII Unicode characters?

The platform is Windows Vista, and the problem occurs after chcp 65001 but not after chcp 850:

C:\>chcp 850
Active code page: 850

C:\>perl unicode_bug_1.pl
Budweiser
Budweiser
Budweiser
Bud─øjovick├¢ Budvar
Bud─øjovick├¢ Budvar
Bud─øjovick├¢ Budvar

C:\>chcp 65001
Active code page: 65001

C:\>perl unicode_bug_1.pl
Budweiser
Budweiser
Budweiser
Budějovický Budvar

Budějovický Budvar

Budějovický Budvar

This output comes from the following program:

#!perl
use strict;
use warnings;

binmode (STDOUT, "encoding(UTF-8)"); # so no "Wide character in print" warning

print "Budweiser\n" for 1..3;
print "Bud\N{U+011B}jovick\N{U+00FD} Budvar\n" for 1..3;

1 Answer

Editorial Team · Answer 1 · 2026-05-20T07:49:06+00:00

This seems to be a bug in Perl. I had thought it was a Windows problem, with code page 65001 not really being supported for the console, but I finally made test programs in C and Perl, and the problem does not happen in the C version. It happens no matter where the Unicode character occurs in the line, but the line you’re printing must be wider than the console supports.

Here is my C program:

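#include "stdafx.h"   /* Visual Studio precompiled header; drop it if the project has none */
#include <stdio.h>    /* printf */
#include <tchar.h>    /* _tmain, _TCHAR */
#include <windows.h>  /* SetConsoleOutputCP */
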
int _tmain(int argc, _TCHAR* argv[])
{
    BOOL b = SetConsoleOutputCP(65001);
    printf("set console output codepage returned %d\n", b);

    printf("cαfe\n");
    printf("1234567890 café\n");
    printf("1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n");
    printf("1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n");

    return 0;
}

And here is my Perl program:

#

use utf8;

binmode STDOUT, ':utf8';

printf STDOUT "cαfe\n";
printf STDOUT "1234567890 café\n";
printf STDOUT "1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 café\n";
printf STDOUT "1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 1234567890 cαfe\n";

UPDATE

No, I was wrong. With the help of some of the guys at #perl on irc.perl.org, it turns out to be a bug in the Microsoft API. WriteFile is documented to return the number of bytes written, but it returns the number of characters written, which depends on the code page. A bug was filed in March 2010.
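
The miscount described above can be simulated without Windows. The following is only a minimal sketch, not Perl's or Microsoft's actual code. It assumes that the output layer retries whatever it believes was left unwritten and that each line ends in "\r\n". FakeConsole, buggy_write and write_all are hypothetical names invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the console: it just records every byte it receives. */
typedef struct {
    char   data[256];
    size_t used;
} FakeConsole;

/* Count UTF-8 characters: every byte that is not a 10xxxxxx continuation byte
   starts a new character. */
static size_t count_utf8_chars(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (((unsigned char)buf[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Simulated buggy write: all bytes reach the console, but the return value
   is the number of characters instead of the number of bytes. */
static size_t buggy_write(FakeConsole *con, const char *buf, size_t len)
{
    size_t room = sizeof con->data - con->used;
    if (len > room)
        len = room;                      /* never overrun the demo buffer */
    memcpy(con->data + con->used, buf, len);
    con->used += len;
    return count_utf8_chars(buf, len);
}

/* A typical "keep writing until everything is out" loop. It trusts the
   return value, so an under-reported count makes it resend the tail. */
static void write_all(FakeConsole *con, const char *buf, size_t len)
{
    while (len > 0) {
        size_t written = buggy_write(con, buf, len);
        if (written == 0)
            break;                       /* buffer full, stop */
        buf += written;
        len -= written;
    }
}
```

Passing the 22-byte UTF-8 encoding of "Budějovický Budvar\r\n" to write_all leaves 24 bytes in the buffer: the two two-byte characters make the reported count 20, so the loop sends the final "\r\n" a second time, which would show up as a blank line. A pure ASCII line such as "Budweiser\r\n" is reported correctly and is written only once.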

There is more discussion in the MSDN forums.

UPDATE 2

I posted about this problem on Michael Kaplan’s blog, “Sorting it all out”, and he responded with an article entitled “Hidden in plain site: a purloined letter kind of a bug report”. He’s a Microsoft internationalization expert, so you will surely find some insights there.
