
The Archive Base Latest Questions

Editorial Team
Asked: May 14, 2026


The printf/fprintf/sprintf family supports
a width field in its format specifier. I have a doubt
about the case of (non-wide) char array arguments:

Is the width field supposed to mean bytes or characters?

What is the correct (or de facto) behaviour if the char array
corresponds to (say) a raw UTF-8 string?
(I know that normally I should use some wide char type;
that's not the point.)

For example, in

char s[] = "ni\xc3\xb1o";  /* UTF-8 encoded "niño" */
fprintf(f, "%5s", s);

is that function supposed to output just 5 bytes
(plain C chars), leaving you responsible for misalignment
or other problems when two bytes form one textual character?

Or is it supposed to compute the length of the array in "textual
characters" (decoding it according to the current locale)?
(In the example, this would amount to finding out that the string has
4 Unicode chars, so it would add a space of padding.)

UPDATE: I agree with the answers: it is logical that the printf family doesn't
distinguish plain C chars from bytes. The problem is that my glibc does not seem
to fully respect this notion if the locale has been set previously, and if
one has the (today most used) LANG/LC_CTYPE=en_US.UTF-8.

Case in point:

#include <stdio.h>
#include <locale.h>

int main(void) {
        char *locale = setlocale(LC_ALL, ""); /* I have LC_CTYPE="en_US.UTF-8" */
        (void)locale; /* silence unused-variable warning */
        char s[] = {'n','i',(char)0xc3,(char)0xb1,'o',0}; /* "niño" in UTF-8: 5 bytes, 4 Unicode chars */
        printf("|%*s|\n", 6, s);    /* this should pad a blank - works ok */
        printf("|%.*s|\n", 4, s);   /* this should eat a char - works ok */
        char s3[] = {'A',(char)0xb1,'B',0}; /* this is not valid UTF-8 */
        printf("|%s|\n", s3);       /* print raw chars - ok */
        printf("|%.*s|\n", 15, s3); /* output stops here (why???) */
        return 0;
}

So, even when a non-POSIX/C locale has been set, printf still seems to have the right notion for counting width: bytes (plain C chars), not Unicode chars. That's fine. However, when given a char array that is not decodable in its locale, it silently fails (it aborts: nothing is printed after the first '|', and there is no error message), but only if it needs to count some width. I don't understand why it even tries to decode the string from UTF-8 when it doesn't need to. Is this a bug in glibc?

Tested with glibc 2.11.1 (Fedora 12) (also glibc 2.3.6)

Note: it's not related to terminal display issues; you can check the output by piping to od:

$ ./a.out | od -t cx1

Here's my output:

0000000   |       n   i 303 261   o   |  \n   |   n   i 303 261   |  \n
         7c  20  6e  69  c3  b1  6f  7c  0a  7c  6e  69  c3  b1  7c  0a
0000020   |   A 261   B   |  \n   |
         7c  41  b1  42  7c  0a  7c

UPDATE 2 (May 2015): This questionable behaviour has been fixed in newer versions of glibc (from 2.17, it seems). With glibc-2.17-21.fc19 it works ok for me.


1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 7:03 pm

    It will result in five bytes being output. And five chars. In ISO C, there is no distinction between chars and bytes. Bytes are not necessarily 8 bits; rather, a byte is defined as the width of a char.

    The ISO term for an 8-bit value is an octet.

    Your “niño” string is actually five characters wide in terms of the C environment (sans the null terminator, of course). If only four symbols show up on your terminal, that’s almost certainly a function of the terminal, not C’s output functions.

    I'm not saying a C implementation couldn't handle Unicode. It could quite easily do UTF-32 if CHAR_BIT were defined as 32. UTF-8 would be harder, since it's a variable-length encoding, but there are ways around almost any problem 🙂


    Based on your update, it seems like you might have a problem. However, I’m not seeing your described behaviour in my setup with the same locale settings. In my case, I’m getting the same output in those last two printf statements.

    If your setup is just stopping output after the first | (I assume that's what you mean by abort; if you meant the whole program aborts, that's much more serious), I would raise the issue with GNU (try your particular distribution's bug procedures first). You've done all the important work, such as producing a minimal test case, so someone may even be happy to run it against the latest version if your distribution doesn't quite get there (most don't).


    As an aside, I’m not sure what you meant by checking the od output. On my system, I get:

    pax> ./qq | od -t cx1
    0000000   |       n   i 303 261   o   |  \n   |   n   i 303 261   |  \n
             7c  20  6e  69  c3  b1  6f  7c  0a  7c  6e  69  c3  b1  7c  0a
    0000020   |   A 261   B   |  \n   |   A 261   B   |  \n
             7c  41  b1  42  7c  0a  7c  41  b1  42  7c  0a
    0000034
    

    so you can see the output stream contains the UTF-8, meaning that it's the terminal program that must interpret it. C/glibc isn't modifying the output at all, so maybe I just misunderstood what you were trying to say.

    Although I've just realised you may be saying that your od output also has only the starting bar on that line (unlike mine, which doesn't show the problem). That would mean something is wrong within C/glibc rather than the terminal silently dropping the characters. In all honesty, I would expect the terminal to drop either the whole line or just the offending character (i.e., output |A); the fact that you're getting just | seems to rule out a terminal problem. Please clarify.


© 2021 The Archive Base. All Rights Reserved
