I’m trying to parse Bibtex files using lex/yacc. Strings in the bibtex database can

Asked: May 22, 2026 (2026-05-22T20:07:04+00:00)

I’m trying to parse Bibtex files using lex/yacc. Strings in the bibtex database can be surrounded by quotes “…” or with braces – {…}

But every entry is also enclosed in braces. How do I differentiate between an entry and a string surrounded by braces?

@Book{sweig42,
  Author =   { Stefan Sweig },
  title =    { The impossible book },
  publisher =    { Dead Poet Society},
  year =     1942,
  month =        mar
}

1 Answer

Editorial Team · Answer 1 · 2026-05-22T20:07:05+00:00

You have several options:

lexer start conditions (from a Lex tutorial)

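Building on the ideas from Greg Ward, enhance your lex rules with start conditions (‘modes’, as they are called in the referenced source).

Specifically, you would have the start conditions BASIC, ENTRY and STRING. The first brace seen after the @ opens the entry; any later brace seen while still in ENTRY mode opens a string, and nested braces inside that string are only counted. The rules would look like this (example taken and slightly enhanced from here):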

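%{
#include <strings.h>            /* strcasecmp */
static char delim = '\0';       /* closing delimiter of the current entry; '\0' outside an entry */
static int blevel = 0;          /* brace nesting depth inside a string */
static int plevel = 0;          /* parenthesis nesting depth inside a string */
%}
%START BASIC ENTRY STRING
%%

/* Notation: <mode>TOKEN-NAME  pattern  { action }.
   In a real lex file, drop the token name and return the token from the action.
   The scanner has to be put into mode 1 (BEGIN BASIC) before the first token is read. */

/* Lexical grammar, mode 1: top-level */
<BASIC>AT           @ { BEGIN ENTRY; }
<BASIC>NEWLINE      \n
<BASIC>COMMENT      \%[^\n]*\n
<BASIC>WHITESPACE   [\ \r\t]+
<BASIC>JUNK         [^@\n\ \r\t]+

/* Lexical grammar, mode 2: in-entry */
<ENTRY>NEWLINE      \n
<ENTRY>COMMENT      \%[^\n]*\n
<ENTRY>WHITESPACE   [\ \r\t]+
<ENTRY>NUMBER       [0-9]+
<ENTRY>NAME         [A-Za-z0-9\!\$\&\*\+\-\.\/\:\;\<\>\?\[\]\^\_\`\|]+ { if (strcasecmp(yytext, "comment")==0) { BEGIN STRING; } }
<ENTRY>LBRACE       \{ { if (delim == '\0') { delim='}'; } else { blevel=1; BEGIN STRING; } }
<ENTRY>RBRACE       \} { delim='\0'; BEGIN BASIC; }
<ENTRY>LPAREN       \( { BEGIN STRING; delim=')'; plevel=1; }
<ENTRY>RPAREN       \)
<ENTRY>EQUALS       =
<ENTRY>HASH         \#
<ENTRY>COMMA        ,
<ENTRY>QUOTE        \" { BEGIN STRING; blevel=0; plevel=0; }

/* Lexical grammar, mode 3: strings */
<STRING>LBRACE       \{ { if (blevel>0) {blevel++;} }
<STRING>RBRACE       \} { if (blevel>0) { blevel--; if (blevel == 0) { BEGIN ENTRY; } } }
<STRING>LPAREN       \( { if (plevel>0) { plevel++;} }
<STRING>RPAREN       \) { if (plevel>0) { plevel--; if (plevel == 0) { BEGIN ENTRY; } } }
<STRING>QUOTE        \" { if (blevel == 0 && plevel == 0) { BEGIN ENTRY; } }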

Please note that the rule set is by no means complete, but it should get you started. More details can be found here.
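To make the mode switching easier to follow outside of lex, here is a minimal hand-written sketch of the same idea in plain C. It is not generated by lex and it is not btparse's code: the names scan_bibtex and struct scan_result are hypothetical, and the sketch deliberately ignores parenthesised entries, % comments and @comment. It only shows how the first brace after @ is treated as the entry delimiter while later braces open (and nest inside) strings.

```c
#include <assert.h>
#include <stddef.h>

/* The three modes, mirroring the start conditions BASIC, ENTRY and STRING. */
enum mode { BASIC, ENTRY, STRING };

/* Hypothetical result type, introduced only for this illustration. */
struct scan_result {
    int entries;   /* entries whose closing brace was seen */
    int strings;   /* brace- or quote-delimited values that were closed */
    int balanced;  /* 1 if the input ended back at top level */
};

struct scan_result scan_bibtex(const char *src)
{
    struct scan_result r = {0, 0, 0};
    enum mode mode = BASIC;
    int have_delim = 0;  /* has the entry's own opening brace been seen? */
    int blevel = 0;      /* brace depth inside the current string */
    int in_quotes = 0;   /* 1 if the current string was opened by '"' */

    assert(src != NULL);

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;
        switch (mode) {
        case BASIC:
            /* Everything outside an entry is junk until '@'. */
            if (c == '@') { mode = ENTRY; have_delim = 0; }
            break;
        case ENTRY:
            if (c == '{' && !have_delim) {
                have_delim = 1;              /* first brace: opens the entry */
            } else if (c == '{') {
                blevel = 1; in_quotes = 0;   /* later brace: opens a string */
                mode = STRING;
            } else if (c == '"') {
                blevel = 0; in_quotes = 1;   /* quote: opens a string */
                mode = STRING;
            } else if (c == '}' && have_delim) {
                r.entries++;                 /* closes the entry */
                mode = BASIC;
            }
            break;
        case STRING:
            if (c == '{') {
                blevel++;
            } else if (c == '}' && blevel > 0) {
                blevel--;
                /* A brace string ends when its own brace is closed. */
                if (!in_quotes && blevel == 0) { r.strings++; mode = ENTRY; }
            } else if (c == '"' && in_quotes && blevel == 0) {
                /* A quote only ends a quoted string at brace depth 0. */
                r.strings++;
                mode = ENTRY;
            }
            break;
        }
    }
    r.balanced = (mode == BASIC);
    return r;
}
```

Called on the @Book entry from the question, scan_bibtex reports one entry and three strings (Author, title and publisher), because year and month are not delimited. A real lexer would of course return tokens to yacc instead of counting them.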

btparse

These docs explain the intricacies of parsing the BibTeX format in fairly detailed fashion and come with a Python parser.

biblex

You might also be interested in employing the Unix toolchain of biblex and bibparse. These tools generate and parse a BibTeX token stream, respectively.

More info can be found here.

best regards, carsten
