Ambiguous Grammar
A grammar is ambiguous if it can derive a sentence with two different parse trees. For example, using the grammar below we can derive two different parse tree with the sentence of 1+2*3.
Ambiguous grammars are problematic for compiling since it can derive different meanings. Actually, we can usually eliminate ambiguity by transforming the grammar as shown below.
Predictive Parsing
Some grammar are easy to use a simple algorithm as recursive-descent.
A recursive-descent parser for this language has one function for each non-
terminal and one clause for each production. We illustrate this by writing a recursive-descent parser for grammar below.
enum token {IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ};
extern enum token getToken(void);
enum token tok;
void advance() {tok=getToken();}
void eat(enum token t) {if (tok==t) advance(); else error();}
void S(void) {switch(tok) {
case IF: eat(IF); E(); eat(THEN); S();
eat(ELSE); S(); break;
case BEGIN: eat(BEGIN); S(); L(); break;
case PRINT: eat(PRINT); E(); break;
default: error();
}}
void L(void) {switch(tok) {
case END: eat(END); break;
case SEMI: eat(SEMI); S(); L(); break;
default: error();
}}
void E(void) { eat(NUM); eat(EQ); eat(NUM); }
However, this method will fail if using the umbiguous grammar above.
void S(void) { E(); eat(EOF); }
void E(void) {switch (tok) {
case ?: E(); eat(PLUS); T(); break;
case ?: E(); eat(MINUS); T(); break;
case ?: T(); break;
default: error();
}}
void T(void) {switch (tok) {
case ?: T(); eat(TIMES); F(); break;
case ?: T(); eat(DIV); F(); break;
case ?: F(); break;
default: error();
}}
There is a conflict here: the E function has no way to know which production to use. That means the recursive-descent parsing works only on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use. So how can we use predictive parsing working on more complicated grammar?
This is how we do to parse the grammar below.
1. Calculate the First and Follow sets of all terminals in grammar.
FIRST(r) is the set of all terminals that can begin any string derived from r. For example, let r= T*F, thus FIRST(r) = { id , num , ( }.
Notice: If two different production X->r1 and X->r2 have the same left-hand-side symbol(X) and their right hand side have overlapping FIRST sets, then the grammar cannot be parsed using the predictive parsing. if FIRST(r1)=FIRST(r2)=I, then the X function in parser will not know what to do if the input token = I.
FOLLOW(X) is the set of terminals that can immediately follow X.
Then use the algorithm below to calculate.
for each terminal symbol Z
FIRST[Z]←{Z}
repeat
for each production X → Y1Y2 ··· Yk
for each i from 1 to k, each j from i + 1 to k,
if all the Yi are nullable
then nullable[X] ← true
if Y1 ··· Yi−1 are all nullable
then FIRST[X] ← FIRST[X] ∪ FIRST[Yi]
if Yi+1 ··· Yk are all nullable
then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FOLLOW[X]
if Yi+1 ··· Yj−1 are all nullable
then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FIRST[Yj]
until FIRST, FOLLOW, and nullable did not change in this iteration.
2. Build the Parsing table.
To construct this table, enter production X → γ in row X, column T of the table for each T ∈ FIRST(γ ). Also, if γ is nullable, enter the production
in row X, column T for each T ∈ FOLLOW[X].
The presence of duplicate entries means that predictive parsing will not work.
Also this is grammar is ambiguous and ambiguous grammar will always lead to duplicate entries in the predictive parsing table.
3. Use recursive-decent parser to implement it based on the production for each (X,T) in parsing table.
LL(1)
What we talk about above actually is the LL(1), standing for Left-to-right parse, Leftmost-derivation,1-symbol lookahead.
We can generalize the notion of FIRST sets to describe the first k tokens of a string, and to make an LL(k) parsing table whose rows columns are every sequence of k terminals. This is not efficient since the table are so large.
Eliminating Left Recursion
Suppose we build a predictive parser for that ambiguous grammar (Grammar 1)
These two productions are certain to cause duplicate entries in the LL(1) parsing table, since any token in FIRST(T ) will also be in FIRST(E + T ). The problem is that E appears as the first right-hand-side symbol in an E-production; this is called left recursion. Grammars with left recursion cannot be LL(1).
To eliminate left recursion, we will rewrite using right recursion. We introduce
a new nonterminal E', and write
This derives the same set of strings (on T and +) as the original two productions, but now there is no left recursion. So we can reconstruct the grammar and build the LL(1) parser.
Left Factoring
We have solved Left Recursive problem, but there is one another problem when two production for the same nonterminal start with the same symbols.Like
S → if E then S else S
S → if E then S
We can solve that by left factoring the grammar, like below.
S → if E then S X
X →
X → else S