In our previous article, we studied in detail about top-down parser. It has two types recursive descent parser and predictive parser.
Top-down parsing is a fundamental technique used in compiler design to analyze the syntactic structure of programming languages. It follows a predictive parsing strategy, starting from the top (the start symbol) of the grammar and recursively expanding it to generate the input program. This article provides a comprehensive overview of top-down parsing, discussing its principles, advantages, and limitations.
Types of Top Down Parsers
There are two main
types of top-down parsers:
i.
Recursive
descent parsers
ii.
LL
parsers / Predictive parsers
Recursive Descent Parsing in Compiler Construction
Recursive descent parsers are a type of top down
parsers that use recursive procedures to implement each production of the
grammar. Each procedure corresponds to a nonterminal symbol and tries to match
or predict its right-hand side. If a procedure fails, it returns an error or
backtracks to its caller. Recursive descent parsers are easy to write by hand,
but they may not handle left recursion or backtracking efficiently.
What is Recursive Descent Parsing?
Recursive descent parsing is a method of syntax analysis
that starts from the start symbol and recursively applies production rules to
generate the input string. It uses a global variable called lookahead that
holds the current input symbol, and a procedure called match that consumes the
next input symbol and advances the lookahead. It also uses one procedure for
each non-terminal in the grammar, which calls other procedures or matches
according to the productions.
For example, consider the following grammar:
E -> TE’
E’ -> +TE’ | ε T -> FT’
T’ -> *FT’ | ε
F -> (E) | id
This grammar can be parsed by a recursive descent parser
using these procedures:
E() {
T();
E'();
}
E'() {
if (lookahead ==
'+') {
match('+');
T();
E'();
}
else {
// Epsilon
case, do nothing
}
}
T() {
F();
T'();
}
T'() {
if (lookahead ==
'*') {
match('*');
F();
T'();
}
else {
// Epsilon
case, do nothing
}
}
F() {
if (lookahead ==
'(') {
match('(');
E();
match(')');
}
else if (lookahead
== 'id') {
match('id');
}
else {
// Report
error, invalid input
}
}
match(expected) {
if (lookahead ==
expected) {
lookahead =
next_input_symbol();
}
else {
// Report
error, mismatched symbol
}
}
Using these procedures, we can parse the input string id +
id * id as follows:
E() -> T() -> F() -> match(‘id’) -> T’() ->
E’() -> match(‘+’) -> T() -> F() -> match(‘id’) -> T’() ->
match(‘*’) -> F() -> match(‘id’) -> T’() -> E’()
Advantages and Disadvantages of Recursive Descent Parsing
Recursive descent parsing has some advantages and
disadvantages compared to other parsing techniques. Some of them are:
Advantages:
i.
It is easy to implement and
understand.
ii.
It can handle any LL(1)
grammar, which means that it can decide which production to use by looking at
only one symbol ahead.
iii.
It can generate parse trees
in preorder, which is useful for some applications.
Disadvantages:
i.
It may require
backtracking, which is inefficient and may cause exponential time complexity
for some grammars.
ii.
It cannot handle left
recursion, which occurs when a non-terminal derives itself on its leftmost
symbol. This causes infinite loops in recursive descent parsing.
iii.
It may require modifying
the grammar to eliminate left recursion and left factoring, which may make the
grammar less readable and intuitive.
iv.
To implement a recursive
descent parser for a given grammar, we need to understand the concepts of First
and Follow. Let us first study about First and Follow sets, then we will proceed
to the implementation of the recursive descent parser.
First and Follow Sets:
In the realm of compiler construction, the concepts of
"First" and "Follow" play crucial roles in analyzing and
parsing the grammar of a programming language. These concepts aid in
determining the potential set of terminals that can appear as the first symbol
of production or the set of terminals that can follow a non-terminal symbol.
1. First Set:
The First set of a non-terminal symbol is the collection of
terminals that can be derived as the first symbol of any string derived from
that non-terminal. In other words, it represents the possible starting
terminals for a particular production rule. To construct the First set, we
analyze the productions of the grammar and consider various possibilities.
For example, let's consider the grammar production rules:
S -> aB
B -> bC | ε
C -> c
In this case, the First set for each non-terminal symbol is
as follows:
First(S) = {a}
First(B) = {b, ε}
First(C) = {c}
Here, First(S) contains 'a' because 'a' is the starting
terminal for the production rule S -> aB. First(B) includes 'b' because of
the production B -> bC, and it also includes ε (epsilon) because of the
production B -> ε. Finally, First(C) consists of 'c' since it is the only
terminal that appears as the first symbol in the production C -> c.
2. Follow Set:
The Follow set of a non-terminal symbol represents the set
of terminals that can appear immediately after occurrences of that non-terminal
symbol. It assists in determining the potential terminals that can follow a
non-terminal in a given grammar.
Continuing from the previous example, let's consider an
extended grammar:
S -> aB
B -> bC | ε
C -> cS
In this case, the Follow set for each non-terminal symbol is
as follows:
Follow(S) = {$}
Follow(B) = {c, $}
Follow(C) = {b, $}
Here, Follow(S) contains '$' because 'S' is the start symbol, and the end-of-input symbol ('$') can follow 'S'. Follow(B) includes 'c' as 'C' appears after 'B' in the grammar, and it also includes '$' because 'B' can derive ε, making it the final symbol. Follow(C) consists of 'b' as 'C' appears before 'B' in the grammar, and it also includes '$' as 'C' can be followed by the end-of-input symbol.
The First and Follow sets are fundamental for constructing
predictive parsing tables and ensuring correct parsing of the input based on
the grammar rules. These sets assist in determining the potential starting and
following terminals, aiding in the analysis and interpretation of programming
language grammar.
How to Implement a Recursive Descent Parser?
To implement the recursive descent parser, we need to follow
these steps:
i.
If there is any left
recursion and left factoring in the grammar remove it.
ii.
Next, calculate the FIRST
and FOLLOW sets for each non-terminal in the grammar.
iii.
Write one procedure for
each non-terminal in the grammar, which calls other procedures or matches
according to the productions. Use the FIRST and FOLLOW sets to decide which
production to use based on the lookahead symbol.
iv.
Write a procedure called
match that consumes the next input symbol and advances the lookahead. Check if
the lookahead matches the expected symbol, and report an error if not.
v.
Write a main procedure that
initializes the lookahead and calls the procedure for the start symbol. Check
if the input string is completely consumed and report an error if not.
Example of Implementation of Recursive Descent Parsing
Let us apply the steps of recursive descent parsing to the
following grammar:
S -> iEtS | iEtSeS | a E -> b
First, we eliminate left recursion and left factoring in the
grammar. Since there is none, we skip this step.
Second, we compute the FIRST and FOLLOW sets for each
non-terminal in the grammar. They are:
FIRST(S) = {i, a} FIRST(E) = {b}
FOLLOW(S) = {$, e} FOLLOW(E) = {t}
Third, we write one procedure for each non-terminal in the
grammar, using the FIRST and FOLLOW sets to decide which production to use
based on the lookahead symbol.
S() {
if (lookahead ==
'i') {
match('i');
E();
match('t');
S();
if (lookahead
== 'e') {
match('e');
S();
}
}
else if (lookahead
== 'a') {
match('a');
}
else {
// Report
error, invalid input
}
}
E() {
if (lookahead ==
'b') {
match('b');
}
else {
// Report
error, invalid input
}
}
match(expected) {
if (lookahead ==
expected) {
lookahead =
next_input_symbol();
}
else {
// Report
error, mismatched symbol
}
}
main() {
lookahead =
next_input_symbol();
S();
if (lookahead ==
'$') {
// Report
success, input string accepted
}
else {
// Report
error, extra input symbols
}
}
Using these procedures, we can parse the input string
ibtibteabt$ as follows:
main() -> S() -> match(‘i’) -> E() -> match(‘b’)
-> match(‘t’) -> S() -> match(‘i’) -> E() -> match(‘b’) ->
match(‘t’) -> S() -> match(‘e’) -> S() -> match(‘a’) -> S()
-> match(‘b’) -> match(‘t’) -> S()
Recursive descent parsing is a simple and intuitive
technique of syntax analysis that builds a parse tree from the root to the
leaves, following the grammar productions. It uses a set of recursive
procedures that correspond to the non-terminals of the grammar and a global
variable called lookahead that holds the current input symbol. However, it also
faces some problems such as backtracking, left recursion, and ambiguity that
limit its applicability and efficiency. To overcome these problems, we can use
predictive parsing, eliminate left recursion, and resolve ambiguity by
modifying the grammar.
LL Parsers or Predictive parser
LL parsers are a type of top-down parsers that use a
table-driven approach to implement each production of the grammar. The table
contains entries for each combination of a nonterminal symbol and an input
symbol, indicating which production to apply or whether to report an error.
Instead of using recursion or backtracking, LL parsers use
demands the grammar to be LL(k), which means that k symbols of lookahead are
sufficient to determine the next production. LL parsers are usually generated
by tools, but they are not able to handle ambiguous or complex grammars.
How to Construct an LL Parser?
To construct an LL parser for a given grammar, we need to
follow these steps:
i.
Eliminate left recursion
and left factoring in the grammar if any.
ii.
Compute the FIRST and
FOLLOW sets for each non-terminal in the grammar.
iii.
Construct the parsing table
using this rule:
iv.
For each production A ->
α, if a is in FIRST(α), then put A -> α in M[A, a]. If ε is in FIRST(α),
then for each terminal b in FOLLOW(A), put A -> α in M[A, b]. If ε is in
FIRST(α) and (end−of−inputsymbol)isinFOLLOW(A),thenputA−>αinM[A,].
v.
Write one procedure for
each non-terminal in the grammar, which calls other procedures or matches
according to the parsing table. Use the lookahead symbol to decide which
production to use for each non-terminal.
Example of LL
Parsing
Let us apply the steps of LL parsing to the following
grammar:
S -> iEtS | iEtSeS | a
E -> b
First, we eliminate left recursion and left factoring in the
grammar. Since there is none, we skip this step.
Second, we compute the FIRST and FOLLOW sets for each
non-terminal in the grammar. They are:
FIRST(S) = {i, a} FIRST(E) = {b}
FOLLOW(S) = {$, e} FOLLOW(E) = {t}
Third, we construct the parsing table using the rule. It is:
i |
a |
b |
t |
e |
$ |
|
S |
S->iEtS |
S->a |
||||
E |
E->b |
Here's a
breakdown of the LL parse table:
·
For non-terminal S:
a.
If the input symbol is 'i',
apply the production S -> iEtS.
b.
If the input symbol is 'a',
apply the production S -> a.
c.
Otherwise, leave the entry
empty.
·
For non-terminal E:
a.
If the input symbol is 'b',
apply the production E -> b.
b.
Otherwise, leave the entry
empty.
Fourth, we write one procedure for each non-terminal in the
grammar, using the lookahead symbol to decide which production to use for each
non-terminal.
S() {
if (lookahead ==
'i') {
match('i');
E();
match('t');
S();
if (lookahead
== 'e') {
match('e');
S();
}
}
else if (lookahead
== 'a') {
match('a');
}
else {
// Report
error: invalid input
}
}
E() {
if (lookahead ==
'b') {
match('b');
}
else {
// Report
error: invalid input
}
}
match(expected) {
if (lookahead ==
expected) {
lookahead =
next_input_symbol();
}
else {
// Report
error: mismatched symbol
}
}
main() {
lookahead =
next_input_symbol();
S();
if (lookahead ==
'$') {
// Report
success: input string accepted
}
else {
// Report
error: extra input symbols
}
}
Using these procedures, we can parse the input string
ibtibteabt$ as follows:
S() -> match(‘i’) -> E() -> match(‘b’) ->
match(‘t’) -> S() -> match(‘i’) -> E() -> match(‘b’) ->
match(‘t’) -> S() -> match(‘e’) -> S() -> match(‘a’) -> S()
-> match(‘b’) -> match(‘t’) -> S()
Recursive Descent Parser vs LL Parser
Aspect |
Recursive Descent Parser |
LL Parser |
Implementation |
Implemented
using recursive procedures |
Implemented
using a parsing table |
Lookahead |
Arbitrary
lookahead |
Fixed
lookahead (typically LL(1)) |
Grammar Restrictions |
Handles any
context-free grammar |
Handles a
subset of LL(k) grammars |
Backtracking |
May involve
backtracking |
Does not
involve backtracking |
Efficiency |
Potential for
inefficiency |
Generally
more efficient due to table lookup |
MCQs related to types of top-down parsers
1.
Which type of
top-down parser uses a separate predictive parsing table?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: b) LL(1) Parser
2.
Which type of
top-down parser is based on leftmost derivation?
a) LL(1) Parser
b) LL(k) Parser
c) Recursive Descent Parser
d) LL(*) Parser
Answer: c) Recursive Descent Parser
3.
Which type of
top-down parser allows arbitrary lookahead?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL() Parser
Answer: d) LL() Parser
4.
Which type of
top-down parser uses a fixed lookahead of one symbol?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: b) LL(1) Parser
5.
Which type of
top-down parser is most commonly used for LL(1) grammars?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: a) Recursive Descent Parser
6.
Which type of
top-down parser can handle non-LL(1) grammars with arbitrary lookahead?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL() Parser
Answer: d) LL() Parser
7.
Which type -down
parser is more flexible in terms of grammar restrictions?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: c) LL(k) Parser
8.
Which type of
top-down parser is capable of handling left recursion directly?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: a) Recursive Descent Parser
9.
Which type of
top-down parser may require backtracking in case of conflicts?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: a) Recursive Descent Parser
10.
Which type of top-down parser is more
efficient for LL(1) grammars with a limited lookahead?
a) Recursive Descent Parser
b) LL(1) Parser
c) LL(k) Parser
d) LL(*) Parser
Answer: b) LL(1) Parser
Short questions related to top-down parsing
Q: What is the primary characteristic of a top-down
parser?
A: Top-down parsers start with the start symbol and try to
derive the input string by repeatedly applying production rules in a
top-to-bottom manner.
Q: What is the purpose of the predictive parsing table in
top-down parsing?
A: The predictive parsing table is used to determine the
production rule to apply based on the current non-terminal and lookahead
symbol.
Q: What is the difference between LL(1) and LL(k)
parsers?
A: LL(1) parsers have a fixed lookahead of one symbol, whereas
LL(k) parsers can have a variable lookahead of k symbols.
Q: What is left recursion, and how does it impact
top-down parsing?
A: Left recursion occurs when a non-terminal directly or
indirectly produces itself as the leftmost symbol in its production rule,
making it difficult for top-down parsers to handle.
Q: What is the purpose of the follow set in top-down
parsing?
A: The follow set helps determine the potential terminals
that can follow a non-terminal, aiding in error detection and recovery during
parsing.
Q: What is the advantage of using LL() parsers over LL(1)
parsers?
A: LL() parsers can handle non-LL(1) grammars by allowing
arbitrary lookahead, making them more powerful but potentially less efficient.
Q: What are the two main types of conflicts that can
arise in top-down parsing?
A: The two main types of conflicts are first/follow
conflicts and follow/follow conflicts, which occur when the predictive parsing
table has multiple entries for the same non-terminal and lookahead symbol
combination.
Q: What is backtracking in the context of top-down
parsing?
A: Backtracking refers to the process of undoing the choices
made during parsing and trying alternative paths when conflicts or errors are
encountered.
Q: How can left recursion be eliminated in grammar to
make it suitable for top-down parsing?
A: Left recursion can be eliminated by transforming the
production rules to use right recursion or by applying left-factoring
techniques.
Q: Can a top-down parser handle all types of grammars?
A: No, top-down parsers can handle only LL(k) grammars,
where k represents the maximum lookahead required by the parser.
In this article, we have discussed recursive descent parsers and predictive parsers which are types of top-down parsers. We have also compared these two types. Hope this article is useful for you. We love to hear from you. Happy learning :).