The Archive Base

Editorial Team
Asked: May 17, 2026


Grammar by definition contains productions, example of very simple grammar:

E -> E + E
E -> n

I want to implement a Grammar class in C#, but I’m not sure how to store productions, for example how to distinguish terminal from non-terminal symbols.
I was thinking about:

struct Production
{
   public string Left;    // for example E
   public string Right;   // for example +
}

Left will always be a non-terminal symbol (this is about context-free grammars),
but the right side of a production can contain both terminal and non-terminal symbols.

So now I’m thinking about two ways to implement this:

  1. Non-terminal symbols will be written using brackets, for example:

    E+E will be represented as string “[E]+[E]”

  2. Create additional data structure NonTerminal

    struct NonTerminal
    {
    String Symbol;
    }

and E+E will be represented as array/list:

[new NonTerminal("E"), "+", new NonTerminal("E")]

But I think there are better ideas; it would be helpful to hear some responses.

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 8:41 pm

    I’d use

     Dictionary<NonTerminalSymbol,Set<List<Symbol>>> 
    

    enabling lookup by nonterminal of the set of production rule right-hand sides (each represented as a list of terminal/nonterminal symbols) associated with that nonterminal. (OP’s question shows that the nonterminal E might be associated with two rules, but we only need to store the right-hand sides once the left-hand side is the lookup key.)
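
    As a rough illustration of the dictionary idea, here is a minimal C# sketch, assuming C# 9 or later (for records). The type names `Symbol`, `Terminal`, `NonTerminal` and `Grammar`, and the methods `Add` and `RightHandSides`, are hypothetical and not taken from the post. Separate record types make the terminal/nonterminal distinction a matter of type, so no bracket convention is needed. Because a `HashSet<List<Symbol>>` would compare the inner lists by reference, this sketch keeps a plain list of right-hand sides and skips duplicates by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names. Records give value equality, and a Terminal("E")
// never equals a NonTerminal("E") because their runtime types differ.
public abstract record Symbol(string Name);
public sealed record Terminal(string Name) : Symbol(Name);
public sealed record NonTerminal(string Name) : Symbol(Name);

public sealed class Grammar
{
    // Left-hand side -> all right-hand sides; each right-hand side is a list of symbols.
    private readonly Dictionary<NonTerminal, List<IReadOnlyList<Symbol>>> rules = new();

    // Adds the production left -> right[0] right[1] ...
    public void Add(NonTerminal left, params Symbol[] right)
    {
        if (!rules.TryGetValue(left, out var alternatives))
        {
            alternatives = new List<IReadOnlyList<Symbol>>();
            rules[left] = alternatives;
        }
        // Skip exact duplicates so the list behaves like a set of right-hand sides.
        if (!alternatives.Any(existing => existing.SequenceEqual(right)))
        {
            alternatives.Add(right.ToArray());
        }
    }

    // Returns every right-hand side recorded for the given nonterminal (possibly none).
    public IReadOnlyList<IReadOnlyList<Symbol>> RightHandSides(NonTerminal left)
    {
        if (rules.TryGetValue(left, out var alternatives))
        {
            return alternatives;
        }
        return Array.Empty<IReadOnlyList<Symbol>>();
    }
}
```

    With this sketch, OP's two productions would be entered as `g.Add(E, E, new Terminal("+"), E)` and `g.Add(E, new Terminal("n"))`, where `E` is a `NonTerminal("E")`.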

    This representation works only for vanilla BNF grammar definitions, in which there is no syntactic sugar for common grammar-defining idioms. Such idioms typically include choice, Kleene star/plus, …, and when they are available for defining the grammar you get a so-called Extended BNF, or EBNF. If we write EBNF allowing only choice, denoted by |, the expression grammar in flat form hinted at by OP as an example is:

             E = S ;
             S = P | S + P | S - P ; 
             P = T | P * T | P / T ;
             T = T ** M | ( E ) | Number | ID ;
    

    and my first suggestion can represent this, because the alternation is only used to separate different right-hand sides of the same rule. However, it won’t represent this:

             E = S ;
             S = P A* ;
             A = + P | - P ;
             P = T M+ ; -- to be different
             M = * T | / T ;
             T = T ** M | ( E ) | Number | ID | ID ( E  ( # | C) * ) ; -- function call with skipped parameters
             C = , E ;
    

    The key problem that this additional notation introduces is the ability to compose the EBNF operators repeatedly on sub-syntax definitions, and that’s the whole point of EBNF.

    To represent EBNF, you have to store productions essentially as trees that represent the, well, expression structure of the EBNF (in fact, this is essentially the same problem as representing any expression grammar).

    To represent the EBNF (expression) tree, you need to define the tree structure of the EBNF.
    You need tree nodes for:

    • symbols (terminal or not)
    • Alternation (having a list of alternatives)
    • Kleene *
    • Kleene +
    • “Optional” ?
    • others that you decide your EBNF has as operators (e.g., comma’d lists, a way to say that one has a list of grammar elements separated by a chosen “comma” character, or ended by a chosen “semicolon” character, …)

    The easiest way to do that is to first write an EBNF grammar for the EBNF itself:

    EBNF = RULE+ ;
    RULE = LHS "=" TERM* ";" ;
    TERM = STRING | SYMBOL | TERM "*"
           | TERM "+" | ";" STRING TERM | "," TERM STRING
           | "(" TERM* ")" ;
    

    Note that I’ve added comma’d and semicolon’ed lists to the EBNF (extended, remember?).

    Now we can simply inspect the EBNF to decide what is needed.
    What you now need is a set of records (OK, classes for C#’ers) to represent each of these rules.
    So:

    • a class for EBNF that contains a set of rules
    • a class for a RULE having an LHS symbol and a list of TERMs
    • an abstract base class for TERM with several concrete variants, one for each alternative of TERM (a so-called “discriminated union” typically implemented by inheritance and instance_of checks in an OO language).

    Note that some of the concrete variants can refer to other class types in the representation, which is how you get a tree. For instance:

       class KleeneStar : TERM     // one concrete variant of the abstract TERM base class
       {
            public TERM T;         // the sub-term being repeated
       }
    

    Details left to the reader for encoding the rest.
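
    For what it's worth, one possible encoding of the rest might look like the sketch below, assuming C# 9 or later. All names here (`Term`, `Literal`, `SymbolRef`, `KleenePlus`, `Optional`, `Sequence`, `Choice`, `Rule`, `Ebnf`, `TermPrinter`) are hypothetical choices for illustration, and the comma'd and semicolon'ed list variants are left out to keep it short. The `Show` method exists only to demonstrate the type-switch ("instance_of") style of walking the tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Abstract base for every kind of node in an EBNF right-hand side.
public abstract record Term;
public sealed record Literal(string Text) : Term;                      // STRING
public sealed record SymbolRef(string Name) : Term;                    // SYMBOL
public sealed record KleeneStar(Term Inner) : Term;                    // TERM "*"
public sealed record KleenePlus(Term Inner) : Term;                    // TERM "+"
public sealed record Optional(Term Inner) : Term;                      // TERM "?"
public sealed record Sequence(IReadOnlyList<Term> Items) : Term;       // "(" TERM* ")"
public sealed record Choice(IReadOnlyList<Term> Alternatives) : Term;  // a | b | ...

// A rule pairs a left-hand-side name with the tree for its right-hand side.
public sealed record Rule(string Lhs, Term Body);

// The whole grammar is just the collection of rules.
public sealed record Ebnf(IReadOnlyList<Rule> Rules);

public static class TermPrinter
{
    // Renders a term tree back to text by switching on the concrete node type.
    public static string Show(Term term) => term switch
    {
        Literal lit => "\"" + lit.Text + "\"",
        SymbolRef sym => sym.Name,
        KleeneStar star => Show(star.Inner) + "*",
        KleenePlus plus => Show(plus.Inner) + "+",
        Optional opt => Show(opt.Inner) + "?",
        Sequence seq => "(" + string.Join(" ", seq.Items.Select(t => Show(t))) + ")",
        Choice choice => "(" + string.Join(" | ", choice.Alternatives.Select(t => Show(t))) + ")",
        _ => throw new ArgumentException("Unknown term type: " + term.GetType().Name),
    };
}
```

    Under these assumptions, the rule `S = P A* ;` becomes `new Rule("S", new Sequence(new Term[] { new SymbolRef("P"), new KleeneStar(new SymbolRef("A")) }))`, and `Show` renders its body as `(P A*)`.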

    This raises an unstated problem for the OP: how do you use this grammar representation to drive parsing of strings?

    The simple answer is to get a parser generator, which means you need to figure out what EBNF it uses. (In this case, it might simply be easier to store your EBNF as text and hand it to that parser generator, which kind of makes this whole discussion moot.)

    If you can’t get one (?), or want to build one of your own, well, now you have the representation you need to climb over to build it. One other alternative is to build a recursive descent parser driven by this representation to do your parsing. The approach to do that is too large to contain in the margin of this answer, but is straightforward for those with experience with recursion.
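
    To give a flavor of the recursive-descent idea, here is a small backtracking recognizer that walks a term tree directly. It is a sketch under several assumptions, not a full parser: the names (`Term`, `Literal`, `SymbolRef`, `KleeneStar`, `Sequence`, `Choice`, `Recognizer`, `Accepts`) are hypothetical, the input is assumed to be already split into tokens, it only answers yes or no without building a parse tree, and it will recurse forever on left-recursive rules such as `S = S + P`, so the grammar has to be written in the `S = P A*` style.

```csharp
using System;
using System.Collections.Generic;

public abstract record Term;
public sealed record Literal(string Text) : Term;
public sealed record SymbolRef(string Name) : Term;
public sealed record KleeneStar(Term Inner) : Term;
public sealed record Sequence(IReadOnlyList<Term> Items) : Term;
public sealed record Choice(IReadOnlyList<Term> Alternatives) : Term;

public sealed class Recognizer
{
    private readonly IReadOnlyDictionary<string, Term> rules;

    public Recognizer(IReadOnlyDictionary<string, Term> rules) => this.rules = rules;

    // True if the whole token list can be derived from the start symbol.
    public bool Accepts(string start, IReadOnlyList<string> tokens) =>
        Match(new SymbolRef(start), tokens, 0).Contains(tokens.Count);

    // Returns every position at which `term` can stop when matching begins at `pos`.
    private HashSet<int> Match(Term term, IReadOnlyList<string> tokens, int pos)
    {
        var ends = new HashSet<int>();
        switch (term)
        {
            case Literal lit:
                if (pos < tokens.Count && tokens[pos] == lit.Text) ends.Add(pos + 1);
                break;
            case SymbolRef sym:
                // Recurse into the rule body; this is where left recursion would loop.
                ends.UnionWith(Match(rules[sym.Name], tokens, pos));
                break;
            case Choice choice:
                foreach (var alternative in choice.Alternatives)
                    ends.UnionWith(Match(alternative, tokens, pos));
                break;
            case Sequence seq:
                // Thread the set of reachable positions through each item in turn.
                var current = new HashSet<int> { pos };
                foreach (var item in seq.Items)
                {
                    var next = new HashSet<int>();
                    foreach (var p in current) next.UnionWith(Match(item, tokens, p));
                    current = next;
                }
                ends = current;
                break;
            case KleeneStar star:
                // Zero repetitions always match; keep extending while new positions appear.
                ends.Add(pos);
                var frontier = new Queue<int>();
                frontier.Enqueue(pos);
                while (frontier.Count > 0)
                {
                    foreach (var end in Match(star.Inner, tokens, frontier.Dequeue()))
                        if (ends.Add(end)) frontier.Enqueue(end);
                }
                break;
            default:
                throw new ArgumentException("Unknown term type: " + term.GetType().Name);
        }
        return ends;
    }
}
```

    Tracking a set of end positions instead of a single one is what lets the recognizer backtrack over every alternative; the price is that it can be slow on ambiguous grammars, which is part of why engines like Earley or GLR exist.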

    EDIT 10/22: OP clarifies that he insists on parsing all context-free grammars and “especially NL”. For all context-free grammars, he will need a very strong parsing engine (Earley, GLR, full backtracking, …). For natural language, he will need parsers much stronger than those; people have been trying to build such parsers for decades with only some, and definitely not easy, success. Either of these two requirements seems to make the discussion of representing the grammar rather pointless; if he does represent a straight context-free grammar, it won’t parse natural language (proven by those guys trying for decades), and if he wants a more powerful NL parser, he’ll need to simply use what the bleeding-edge types have produced. Count me a pessimist on his probable success, unless he decides to become a real expert in the area of NL parsing.
