Can someone explain to me what a context free grammar is? After looking at the Wikipedia entry and then the Wikipedia entry on formal grammar, I am left utterly and totally befuddled. Would someone be so kind as to explain what these things are?
I am wondering this because I wish to investigate parsing, and also on the side, the limitation of a regex engine.
I’m not sure if these terms are directly programming related, or if they are related more to linguistics in general. If that is the case, I apologize, perhaps this could be moved if so?
A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.
A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols… very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.
Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:
The way to interpret this is to say that
Scan be replaced byBBB, andBcan be replaced by 0, andBcan be replaced by 1. So to construct the string 010 we can doS -> BBB -> 0BB -> 01B -> 010.A context-free grammar is simply a grammar where the thing that you’re replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can’t appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like
Are not context free since they contain rules like
AB -> 1that have more than one non-terminal symbol on the left.