I’m trying to make a Bison parser to handle UTF-8 characters. I don’t want

Question

Asked: May 11, 2026 (2026-05-11T18:58:10+00:00)

I’m trying to make a Bison parser to handle UTF-8 characters. I don’t want the parser to actually interpret the Unicode character values, but I want it to parse the UTF-8 string as a sequence of bytes.

Right now, Bison generates the following code which is problematic:

  if (yychar <= YYEOF)
    {
      yychar = yytoken = YYEOF;
      YYDPRINTF ((stderr, "Now at end of input.\n"));
    }

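The problem is that many bytes of a UTF-8 string (every byte of a multi-byte sequence, i.e. 0x80 and above) have a negative value when read through a signed char. Bison interprets any value less than or equal to YYEOF as end of input, so it stops.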

Is there a way around this?

1 Answer

Editorial Team · Answer 1 · 2026-05-11T18:58:11+00:00

Bison yes, Flex no. The one time I needed a Bison parser to work with UTF-8 encoded files, I ended up writing my own yylex function.

Edit: To help, I used a lot of the Unicode operations available in GLib (there’s a gunichar type and some file/string manipulation functions that I found useful).
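
For illustration, a hand-written yylex along these lines might look like the sketch below. It is not the code from the answer, only a minimal example under some assumptions: the token name BYTE, its value 258, the input_cursor variable, and the set_input helper are all hypothetical, and yylval is declared locally as a plain int, whereas a real Bison parser would supply its own declaration. The key point is converting each byte to unsigned char before returning it, so that bytes 0x80 and above never reach the parser as negative values.

```c
/* Hypothetical token code standing in for one that Bison would
   generate in the parser header (e.g. from "%token BYTE"). */
enum { BYTE = 258 };

/* Hypothetical input cursor; a real scanner might read from a FILE*. */
static const char *input_cursor = "";

/* Semantic value of the last token. In a real parser Bison declares
   yylval itself; it is defined here only to keep the sketch standalone. */
static int yylval;

/* Hypothetical helper: point the scanner at a NUL-terminated buffer. */
void set_input(const char *s)
{
    input_cursor = s;
}

int yylex(void)
{
    /* Convert to unsigned char so bytes >= 0x80 become 128..255
       instead of negative values, which the parser treats as EOF. */
    unsigned char c = (unsigned char)*input_cursor;

    if (c == '\0')
        return 0;            /* 0 signals end of input to the parser */

    input_cursor++;

    if (c >= 0x80) {
        /* Byte of a multi-byte UTF-8 sequence: pass it through
           opaquely, with the raw byte as the semantic value. */
        yylval = c;
        return BYTE;
    }

    return c;                /* ASCII byte, usable as a literal token */
}
```

With this arrangement the grammar can match runs of BYTE tokens without ever decoding code points, which fits the goal of treating the UTF-8 string as a plain sequence of bytes. Using a NUL byte as the end-of-input marker is a simplification that would not suit input containing embedded NULs.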
