The Archive Base

Editorial Team
Asked: May 20, 2026

I am writing a parser for Delphi's DFM files. The lexer looks like this:

EXP ([Ee][-+]?[0-9]+)

%%

("#"([0-9]{1,5}|"$"[0-9a-fA-F]{1,6})|"'"([^']|'')*"'")+ { 
                                                 return tkStringLiteral; }
"object" { return tkObjectBegin; }
"end" { return tkObjectEnd; }
"true" { /*yyval.boolean = true;*/ return tkBoolean; }
"false" { /*yyval.boolean = false;*/ return tkBoolean; }

[+.()\[\]{}<>=,:] { return yytext[0]; }

[+-]?[0-9]{1,10} { /*yyval.integer = atoi(yytext);*/ return tkInteger; }
[0-9A-F]+ { return tkHexValue; }
[+-]?[0-9]+"."[0-9]+{EXP}? { /*yyval.real = atof(yytext);*/ return tkReal; }
[a-zA-Z_][0-9A-Z_]* { return tkIdentifier; }
"$"[0-9A-F]+ { /* yyval.integer = atoi(yytext);*/ return tkHexNumber; }

[ \t\r\n] { /* ignore whitespace */ }
. { std::cerr << boost::format("Mystery character %c\n") % *yytext; }

<<EOF>> { yyterminate(); }

%%

and the bison grammar looks like

%token tkInteger
%token tkReal
%token tkIdentifier
%token tkHexValue
%token tkHexNumber
%token tkObjectBegin
%token tkObjectEnd
%token tkBoolean
%token tkStringLiteral

%%

object:
    tkObjectBegin tkIdentifier ':' tkIdentifier 
          property_assignment_list tkObjectEnd
  ;

property_assignment_list:
    property_assignment
  | property_assignment_list property_assignment
  ;

property_assignment:
    property '=' value
  | object
  ;

property:
    tkIdentifier
  | property '.' tkIdentifier
  ;

value:
    atomic_value
  | set
  | binary_data
  | strings
  | collection
  ;

atomic_value:
    tkInteger
  | tkReal
  | tkIdentifier
  | tkBoolean
  | tkHexNumber
  | long_string
  ;

long_string:
    tkStringLiteral
  | long_string '+' tkStringLiteral
  ;

atomic_value_list:
    atomic_value
  | atomic_value_list ',' atomic_value
  ;

set:
    '[' ']'
  | '[' atomic_value_list ']'
  ;

binary_data:
    '{' '}'
  | '{' hexa_lines '}'
  ;

hexa_lines:
    tkHexValue
  | hexa_lines tkHexValue
  ;

strings:
    '(' ')'
  | '(' string_list ')'
  ;

string_list:
    tkStringLiteral
  | string_list tkStringLiteral
  ;

collection:
    '<' '>'
  | '<' collection_item_list '>'
  ;

collection_item_list:
    collection_item
  | collection_item_list collection_item
  ;

collection_item:
    tkIdentifier property_assignment_list tkObjectEnd
  ;

%%

void yyerror(const char *s, ...) {...}

The problem with this grammar occurs while parsing binary data. Binary data in DFM files is nothing
but a sequence of hexadecimal digits, wrapped so that no line exceeds 80 characters. An example
is:

Picture.Data = {
      055449636F6E0000010001002020000001000800A80800001600000028000000
      2000000040000000010008000000000000000000000000000000000000000000

      ...

      FF00000000000000000000000000000000000000000000000000000000000000
      00000000FF000000FF000000FF00000000000000000000000000000000000000
      00000000}

As you can see, this element has no distinguishing markers, so its contents clash with other tokens. In the example
above, the first line returns the proper token, tkHexValue. The second, however, returns a tkInteger token
and the third a tkIdentifier token. So parsing fails with a syntax error, because
binary data is composed only of tkHexValue tokens.

My first workaround was to give integers a maximum length (which helped for all but the last line
of the binary data). The second was to move the tkHexValue rule above the tkIdentifier rule, but that means
I can no longer have identifiers like F0.
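To illustrate why both workarounds only move the problem around, here is a minimal C++ sketch (not flex itself) of the selection flex applies: the longest match wins, and a tie goes to the rule listed first. The function name `classify` and the rule table are hypothetical. The three patterns are copied from the lexer above, and `std::regex` stands in for the generated scanner.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns the token name that would be picked at the start of `text`:
// the longest match wins, and ties go to the rule listed first.
std::string classify(const std::string& text) {
    // Subset of the rules from the lexer above, in the same order.
    static const std::vector<std::pair<std::string, std::regex>> rules = {
        {"tkInteger",    std::regex("[+-]?[0-9]{1,10}")},
        {"tkHexValue",   std::regex("[0-9A-F]+")},
        {"tkIdentifier", std::regex("[a-zA-Z_][0-9A-Z_]*")},
    };
    std::string best = "none";
    std::size_t best_len = 0;
    for (const auto& rule : rules) {
        std::smatch m;
        // match_continuous anchors the match at the first character.
        if (std::regex_search(text, m, rule.second,
                              std::regex_constants::match_continuous)) {
            const std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > best_len) {  // strictly longer only, so earlier rules keep ties
                best_len = len;
                best = rule.first;
            }
        }
    }
    return best;
}
```

With this rule order, a long hex line is classified as tkHexValue because it is the longest match. A short all-digit run such as `00000000` ties and goes to tkInteger, and `F0` goes to tkHexValue instead of tkIdentifier.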

I was wondering if there is any way to fix this grammar?

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 5:53 am

    OK, I solved this one. I needed to define a start condition so that tkHexValue is only returned while reading binary data. In the definitions section of the lexer I added:

    %x BINARY
    

    and modified the following rules:

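    "{" { BEGIN BINARY; return yytext[0]; }            /* enter binary mode */
    <BINARY>"}" { BEGIN INITIAL; return yytext[0]; }   /* leave binary mode */
    <BINARY>[0-9A-F]+ { /* assumed: the existing tkHexValue rule moves into BINARY */ return tkHexValue; }
    <BINARY>[ \t\r\n] { /* ignore whitespace */ }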
    

    And that was all!
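For readers who want to see the idea outside flex, the following is a rough C++ sketch of a hand-written scanner with a mode flag that plays the role of the BINARY start condition. Everything here is hypothetical: the names `Mode` and `scan`, the simplified identifier and integer rules, and the use of token names as plain strings. It is not the generated lexer, only an illustration of why a separate state removes the ambiguity.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors flex's INITIAL and BINARY start conditions (hypothetical names).
enum class Mode { Initial, Binary };

static bool is_upper_hex(char ch) {
    return (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'F');
}

// Returns token names; single-character punctuation is returned as itself.
std::vector<std::string> scan(const std::string& src) {
    std::vector<std::string> tokens;
    Mode mode = Mode::Initial;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace is skipped in both modes
        if (mode == Mode::Binary) {
            if (c == '}') {                      // closing brace leaves binary mode
                mode = Mode::Initial;
                tokens.push_back("}");
                ++i;
                continue;
            }
            const std::size_t start = i;
            while (i < src.size() && is_upper_hex(src[i])) ++i;
            if (i == start) { tokens.push_back("error"); ++i; }
            else { tokens.push_back("tkHexValue"); }  // only token kind in this mode
            continue;
        }
        if (c == '{') {                          // opening brace enters binary mode
            mode = Mode::Binary;
            tokens.push_back("{");
            ++i;
        } else if (std::isalpha(c) || c == '_') {  // simplified identifier rule
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            tokens.push_back("tkIdentifier");
        } else if (std::isdigit(c)) {              // simplified integer rule
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back("tkInteger");
        } else {
            tokens.push_back(std::string(1, src[i]));  // punctuation
            ++i;
        }
    }
    return tokens;
}
```

Inside the braces every run of hex digits becomes tkHexValue, including all-digit runs and runs starting with a letter. Outside them a name such as `F0` is an ordinary identifier again.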

© 2021 The Archive Base. All Rights Reserved