Editorial Team
Asked: June 13, 2026 (2026-06-13T17:59:51+00:00)

I have a weird string syntax where the meaning of a delimiter depends on

I have a weird string syntax where the meaning of a
delimiter depends on context. In the following sample
input:

( (foo) (bar) )

the result is a list of two strings ["foo"; "bar"].
The outer pair of parentheses enters list mode.
Then, each inner pair of parentheses delimits a string.
Inside strings, balanced pairs of parentheses are to be
treated as part of the string.

Right now the lexer decides what to return depending
on a global variable inside.

{
  open Sample_parser
  exception Error of string
  let inside = ref false (* <= to be eliminated *)
}

The delimiters are parentheses. If the lexer hits an
opening parenthesis, then

  • If inside is false, it emits an
    Enter token and sets inside to true.
  • If inside is true, it switches to a string lexer
    which treats any properly nested pair of parentheses
    as part of the string. When the closing parenthesis at
    nesting level zero is reached, the string buffer is
    passed to the parser.

If a closing parenthesis is encountered outside a string,
a Leave token is emitted and inside is unset.
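
For illustration, the nesting rule for strings can be sketched in plain OCaml without any lexer generator. This is only a sketch: scan_string is a hypothetical helper, not part of the lexer below, and it works on a plain string and an index rather than on a Lexing.lexbuf.

```ocaml
(* Hypothetical helper, not part of the original lexer.
   [start] is the index just after the opening parenthesis of a string.
   Returns the string contents and the index just past its closing
   parenthesis. *)
let scan_string (s : string) (start : int) : string * int =
  let buf = Buffer.create 20 in
  let rec go i depth =
    if i >= String.length s then
      failwith
        (Printf.sprintf "unexpected end of input inside string, pos %d--%d"
           start i)
    else
      match s.[i] with
      (* closing parenthesis at depth zero ends the string *)
      | ')' when depth = 0 -> (Buffer.contents buf, i + 1)
      (* nested parentheses are kept as part of the string *)
      | ')' -> Buffer.add_char buf ')'; go (i + 1) (depth - 1)
      | '(' -> Buffer.add_char buf '('; go (i + 1) (depth + 1)
      | c -> Buffer.add_char buf c; go (i + 1) depth
  in
  go start 0
```

The depth counter is an ordinary function argument here, which is the same trick string_scanner uses in the lexer: state that is threaded through the recursion does not need to live in a global.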

My question is: How do I rewrite the lexer without
the global variable inside?

Fwiw I use menhir but afaict the same would be true for
ocamlyacc.
(Sorry if this sounds confused, I’m really a newbie to
the yacc/lex approach.
I can express all the above without thinking as a PEG but I
haven’t got used to mentally keeping lexer and parser
separated.
Feel free to point out other issues with the code!)

Simple example: *sample_lexer.mll*

{
  open Sample_parser
  exception Error of string
  let inside = ref false (* <= to be eliminated *)
}

let lpar  = "("
let rpar  = ")"
let ws    = [' ' '\t' '\n' '\r']

rule tokenize = parse
  | ws    { tokenize lexbuf }
  | lpar  { if not !inside then begin
              inside := true;
              Enter
            end else begin
              let buf = Buffer.create 20 in
              String (string_scanner
                        (Lexing.lexeme_start lexbuf)
                        0
                        buf
                        lexbuf)
            end }
  | rpar  { inside := false; Leave }
and string_scanner init depth buf = parse
  | rpar  { if depth = 0 then
              Buffer.contents buf
            else begin
              Buffer.add_char buf ')';
              string_scanner init (depth - 1) buf lexbuf
            end }
  | lpar  { Buffer.add_char buf '(';
            string_scanner init (depth + 1) buf lexbuf }
  | eof   { raise (Error (Printf.sprintf
                           "Unexpected end of file inside string, pos %d--%d!\n"
                           init
                           (Lexing.lexeme_start lexbuf))) }
  | _ as chr { Buffer.add_char buf chr;
               string_scanner init depth buf lexbuf }

*sample_parser.mly*:

%token <string> String
%token Enter
%token Leave

%start <string list> process

%%

process:
  | Enter lst = string_list Leave { lst }

string_list:
  | elm = element lst = string_list { elm :: lst }
  | elm = element                   { [elm]      }

element:
  | str = String { str }

*main.ml*:

open Batteries

let sample_input = "( (foo (bar) baz) (xyzzy) )"
(*                  EibssssssssssssseibssssseiL
 * where E := enter inner
 *       L := leave inner
 *       i := ignore (whitespace)
 *       b := begin string
 *       e := end string
 *       s := part of string
 *
 * desired result: [ "foo (bar) baz"; "xyzzy" ] (type string list)
 *)

let main () =
  let buf = Lexing.from_string sample_input in
  try
    List.print
      String.print stdout
      (Sample_parser.process Sample_lexer.tokenize buf);
    print_string "\n";
  with
  | Sample_lexer.Error msg   -> Printf.eprintf "%s%!" msg
  | Sample_parser.Error      -> Printf.eprintf
                                    "Invalid syntax at pos %d.\n%!"
                                    (Lexing.lexeme_start buf)

let _ = main ()
1 Answer

  1. Editorial Team, added an answer on June 13, 2026 at 5:59 pm

    You can pass the state as an argument to tokenize. It still has to be mutable, but it no longer has to be global. The string_scanner rule stays as it is.

    rule tokenize inside = parse
      | ws    { tokenize inside lexbuf }
      | lpar  { if not !inside then begin
                  inside := true;
                  Enter
                end else begin
                  let buf = Buffer.create 20 in
                  String (string_scanner
                            (Lexing.lexeme_start lexbuf)
                            0
                            buf
                            lexbuf)
                end }
      | rpar  { inside := false; Leave }
    

    Then call the parser as follows, so that one ref is created per parse and shared by every call the parser makes to the lexer:

    Sample_parser.process (Sample_lexer.tokenize (ref false)) buf
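
To see why the partial application works, here is a minimal stand-alone model in plain OCaml, runnable without ocamllex or menhir. It is a sketch under assumptions: the cursor type stands in for Lexing.lexbuf, the Eof token and the tokens_of_string driver are hypothetical additions for the demo, and the hand-written tokenize only imitates what the generated lexer does.

```ocaml
type token = Enter | Leave | String of string | Eof

(* Stand-in for Lexing.lexbuf: the input plus a mutable read position. *)
type cursor = { text : string; mutable pos : int }

(* Same shape as the generated lexer: state first, buffer second. *)
let rec tokenize (inside : bool ref) (cur : cursor) : token =
  if cur.pos >= String.length cur.text then Eof
  else begin
    let c = cur.text.[cur.pos] in
    cur.pos <- cur.pos + 1;
    match c with
    | ' ' | '\t' | '\n' | '\r' -> tokenize inside cur
    | '(' when not !inside -> inside := true; Enter
    | '(' -> String (scan cur 0 (Buffer.create 20))
    | ')' -> inside := false; Leave
    | c ->
      failwith
        (Printf.sprintf "unexpected character %C at pos %d" c (cur.pos - 1))
  end

(* Counterpart of string_scanner: nested parentheses belong to the string. *)
and scan cur depth buf =
  if cur.pos >= String.length cur.text then
    failwith "unexpected end of input inside string"
  else begin
    let c = cur.text.[cur.pos] in
    cur.pos <- cur.pos + 1;
    match c with
    | ')' when depth = 0 -> Buffer.contents buf
    | ')' -> Buffer.add_char buf c; scan cur (depth - 1) buf
    | '(' -> Buffer.add_char buf c; scan cur (depth + 1) buf
    | _ -> Buffer.add_char buf c; scan cur depth buf
  end

(* Hypothetical driver playing the role of the parser. *)
let tokens_of_string (s : string) : token list =
  (* Partial application: the ref is allocated once, and every call made
     through [next] sees the same cell. *)
  let next = tokenize (ref false) in
  let cur = { text = s; pos = 0 } in
  let rec loop acc =
    match next cur with
    | Eof -> List.rev acc
    | tok -> loop (tok :: acc)
  in
  loop []
```

Because each call to tokens_of_string allocates its own ref, two parses never share state, which is the practical gain over the global variable.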
    