Recursive Descent Parser and Predictive Parser

In our previous article, we studied in detail about top-down parser. It has two types recursive descent parser and predictive parser.

Top-down parsing is a fundamental technique used in compiler design to analyze the syntactic structure of programming languages. It follows a predictive parsing strategy, starting from the top (the start symbol) of the grammar and recursively expanding it to generate the input program. This article provides a comprehensive overview of top-down parsing, discussing its principles, advantages, and limitations.

Types of Top Down Parsers

There are two main types of top-down parsers:

i. Recursive descent parsers

ii. LL parsers / Predictive parsers

Recursive Descent Parsing in Compiler Construction

Recursive descent parsers are a type of top down parsers that use recursive procedures to implement each production of the grammar. Each procedure corresponds to a nonterminal symbol and tries to match or predict its right-hand side. If a procedure fails, it returns an error or backtracks to its caller. Recursive descent parsers are easy to write by hand, but they may not handle left recursion or backtracking efficiently.

What is Recursive Descent Parsing?

Recursive descent parsing is a method of syntax analysis that starts from the start symbol and recursively applies production rules to generate the input string. It uses a global variable called lookahead that holds the current input symbol, and a procedure called match that consumes the next input symbol and advances the lookahead. It also uses one procedure for each non-terminal in the grammar, which calls other procedures or matches according to the productions.

For example, consider the following grammar:

E -> TE’

E’ -> +TE’ | ε T -> FT’

T’ -> *FT’ | ε

F -> (E) | id

This grammar can be parsed by a recursive descent parser using these procedures:

E() {

T();

E'();

}

E'() {

if (lookahead == '+') {

match('+');

T();

E'();

}

else {

// Epsilon case, do nothing

}

T() {

F();

T'();

}

T'() {

if (lookahead == '*') {

match('*');

F();

T'();

}

else {

// Epsilon case, do nothing

}

F() {

if (lookahead == '(') {

match('(');

E();

match(')');

}

else if (lookahead == 'id') {

match('id');

}

else {

// Report error, invalid input

}

match(expected) {

if (lookahead == expected) {

lookahead = next_input_symbol();

}

else {

// Report error, mismatched symbol

}

Using these procedures, we can parse the input string id + id * id as follows:

E() -> T() -> F() -> match(‘id’) -> T’() -> E’() -> match(‘+’) -> T() -> F() -> match(‘id’) -> T’() -> match(‘*’) -> F() -> match(‘id’) -> T’() -> E’()

Advantages and Disadvantages of Recursive Descent Parsing

Recursive descent parsing has some advantages and disadvantages compared to other parsing techniques. Some of them are:

Advantages:

i. It is easy to implement and understand.

ii. It can handle any LL(1) grammar, which means that it can decide which production to use by looking at only one symbol ahead.

iii. It can generate parse trees in preorder, which is useful for some applications.

Disadvantages:

i. It may require backtracking, which is inefficient and may cause exponential time complexity for some grammars.

ii. It cannot handle left recursion, which occurs when a non-terminal derives itself on its leftmost symbol. This causes infinite loops in recursive descent parsing.

iii. It may require modifying the grammar to eliminate left recursion and left factoring, which may make the grammar less readable and intuitive.

iv. To implement a recursive descent parser for a given grammar, we need to understand the concepts of First and Follow. Let us first study about First and Follow sets, then we will proceed to the implementation of the recursive descent parser.

First and Follow Sets:

In the realm of compiler construction, the concepts of "First" and "Follow" play crucial roles in analyzing and parsing the grammar of a programming language. These concepts aid in determining the potential set of terminals that can appear as the first symbol of production or the set of terminals that can follow a non-terminal symbol.

1. First Set:

The First set of a non-terminal symbol is the collection of terminals that can be derived as the first symbol of any string derived from that non-terminal. In other words, it represents the possible starting terminals for a particular production rule. To construct the First set, we analyze the productions of the grammar and consider various possibilities.

For example, let's consider the grammar production rules:

S -> aB

B -> bC | ε

C -> c

In this case, the First set for each non-terminal symbol is as follows:

First(S) = {a}

First(B) = {b, ε}

First(C) = {c}

Here, First(S) contains 'a' because 'a' is the starting terminal for the production rule S -> aB. First(B) includes 'b' because of the production B -> bC, and it also includes ε (epsilon) because of the production B -> ε. Finally, First(C) consists of 'c' since it is the only terminal that appears as the first symbol in the production C -> c.

2. Follow Set:

The Follow set of a non-terminal symbol represents the set of terminals that can appear immediately after occurrences of that non-terminal symbol. It assists in determining the potential terminals that can follow a non-terminal in a given grammar.

Continuing from the previous example, let's consider an extended grammar:

S -> aB

B -> bC | ε

C -> cS

In this case, the Follow set for each non-terminal symbol is as follows:

Follow(S) = {$}

Follow(B) = {c, $}

Follow(C) = {b, $}

Here, Follow(S) contains '$' because 'S' is the start symbol, and the end-of-input symbol ('$') can follow 'S'. Follow(B) includes 'c' as 'C' appears after 'B' in the grammar, and it also includes '$' because 'B' can derive ε, making it the final symbol. Follow(C) consists of 'b' as 'C' appears before 'B' in the grammar, and it also includes '$' as 'C' can be followed by the end-of-input symbol.

The First and Follow sets are fundamental for constructing predictive parsing tables and ensuring correct parsing of the input based on the grammar rules. These sets assist in determining the potential starting and following terminals, aiding in the analysis and interpretation of programming language grammar.

How to Implement a Recursive Descent Parser?

To implement the recursive descent parser, we need to follow these steps:

i. If there is any left recursion and left factoring in the grammar remove it.

ii. Next, calculate the FIRST and FOLLOW sets for each non-terminal in the grammar.

iii. Write one procedure for each non-terminal in the grammar, which calls other procedures or matches according to the productions. Use the FIRST and FOLLOW sets to decide which production to use based on the lookahead symbol.

iv. Write a procedure called match that consumes the next input symbol and advances the lookahead. Check if the lookahead matches the expected symbol, and report an error if not.

v. Write a main procedure that initializes the lookahead and calls the procedure for the start symbol. Check if the input string is completely consumed and report an error if not.

Example of Implementation of Recursive Descent Parsing

Let us apply the steps of recursive descent parsing to the following grammar:

S -> iEtS | iEtSeS | a E -> b

First, we eliminate left recursion and left factoring in the grammar. Since there is none, we skip this step.

Second, we compute the FIRST and FOLLOW sets for each non-terminal in the grammar. They are:

FIRST(S) = {i, a} FIRST(E) = {b}

FOLLOW(S) = {$, e} FOLLOW(E) = {t}

Third, we write one procedure for each non-terminal in the grammar, using the FIRST and FOLLOW sets to decide which production to use based on the lookahead symbol.

S() {

if (lookahead == 'i') {

match('i');

E();

match('t');

S();

if (lookahead == 'e') {

match('e');

S();

}

else if (lookahead == 'a') {

match('a');

}

else {

// Report error, invalid input

}

E() {

if (lookahead == 'b') {

match('b');

}

else {

// Report error, invalid input

}

match(expected) {

if (lookahead == expected) {

lookahead = next_input_symbol();

}

else {

// Report error, mismatched symbol

}

main() {

lookahead = next_input_symbol();

S();

if (lookahead == '$') {

// Report success, input string accepted

}

else {

// Report error, extra input symbols

}

Using these procedures, we can parse the input string ibtibteabt$ as follows:

main() -> S() -> match(‘i’) -> E() -> match(‘b’) -> match(‘t’) -> S() -> match(‘i’) -> E() -> match(‘b’) -> match(‘t’) -> S() -> match(‘e’) -> S() -> match(‘a’) -> S() -> match(‘b’) -> match(‘t’) -> S()

Recursive descent parsing is a simple and intuitive technique of syntax analysis that builds a parse tree from the root to the leaves, following the grammar productions. It uses a set of recursive procedures that correspond to the non-terminals of the grammar and a global variable called lookahead that holds the current input symbol. However, it also faces some problems such as backtracking, left recursion, and ambiguity that limit its applicability and efficiency. To overcome these problems, we can use predictive parsing, eliminate left recursion, and resolve ambiguity by modifying the grammar.

LL Parsers or Predictive parser

LL parsers are a type of top-down parsers that use a table-driven approach to implement each production of the grammar. The table contains entries for each combination of a nonterminal symbol and an input symbol, indicating which production to apply or whether to report an error.

Instead of using recursion or backtracking, LL parsers use demands the grammar to be LL(k), which means that k symbols of lookahead are sufficient to determine the next production. LL parsers are usually generated by tools, but they are not able to handle ambiguous or complex grammars.

How to Construct an LL Parser?

To construct an LL parser for a given grammar, we need to follow these steps:

i. Eliminate left recursion and left factoring in the grammar if any.

ii. Compute the FIRST and FOLLOW sets for each non-terminal in the grammar.

iii. Construct the parsing table using this rule:

iv. For each production A -> α, if a is in FIRST(α), then put A -> α in M[A, a]. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), put A -> α in M[A, b]. If ε is in FIRST(α) and (end−of−inputsymbol)isinFOLLOW(A),thenputA−>αinM[A,].

v. Write one procedure for each non-terminal in the grammar, which calls other procedures or matches according to the parsing table. Use the lookahead symbol to decide which production to use for each non-terminal.

Example of LL Parsing

Let us apply the steps of LL parsing to the following grammar:

S -> iEtS | iEtSeS | a

E -> b

First, we eliminate left recursion and left factoring in the grammar. Since there is none, we skip this step.

Second, we compute the FIRST and FOLLOW sets for each non-terminal in the grammar. They are:

FIRST(S) = {i, a} FIRST(E) = {b}

FOLLOW(S) = {$, e} FOLLOW(E) = {t}

Third, we construct the parsing table using the rule. It is:

	i	a	b	t	e	$
S	S->iEtS	S->a
E			E->b

Here's a breakdown of the LL parse table:

· For non-terminal S:

a. If the input symbol is 'i', apply the production S -> iEtS.

b. If the input symbol is 'a', apply the production S -> a.

c. Otherwise, leave the entry empty.

· For non-terminal E:

a. If the input symbol is 'b', apply the production E -> b.

b. Otherwise, leave the entry empty.

Fourth, we write one procedure for each non-terminal in the grammar, using the lookahead symbol to decide which production to use for each non-terminal.

S() {

if (lookahead == 'i') {

match('i');

E();

match('t');

S();

if (lookahead == 'e') {

match('e');

S();

}

else if (lookahead == 'a') {

match('a');

}

else {

// Report error: invalid input

}

E() {

if (lookahead == 'b') {

match('b');

}

else {

// Report error: invalid input

}

match(expected) {

if (lookahead == expected) {

lookahead = next_input_symbol();

}

else {

// Report error: mismatched symbol

}

main() {

lookahead = next_input_symbol();

S();

if (lookahead == '$') {

// Report success: input string accepted

}

else {

// Report error: extra input symbols

}

Using these procedures, we can parse the input string ibtibteabt$ as follows:

S() -> match(‘i’) -> E() -> match(‘b’) -> match(‘t’) -> S() -> match(‘i’) -> E() -> match(‘b’) -> match(‘t’) -> S() -> match(‘e’) -> S() -> match(‘a’) -> S() -> match(‘b’) -> match(‘t’) -> S()

Recursive Descent Parser vs LL Parser

Aspect	Recursive Descent Parser	LL Parser
Implementation	Implemented using recursive procedures	Implemented using a parsing table
Lookahead	Arbitrary lookahead	Fixed lookahead (typically LL(1))
Grammar Restrictions	Handles any context-free grammar	Handles a subset of LL(k) grammars
Backtracking	May involve backtracking	Does not involve backtracking
Efficiency	Potential for inefficiency	Generally more efficient due to table lookup

MCQs related to types of top-down parsers

1. Which type of top-down parser uses a separate predictive parsing table?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: b) LL(1) Parser

2. Which type of top-down parser is based on leftmost derivation?

a) LL(1) Parser

b) LL(k) Parser

c) Recursive Descent Parser

d) LL(*) Parser

Answer: c) Recursive Descent Parser

3. Which type of top-down parser allows arbitrary lookahead?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL() Parser

Answer: d) LL() Parser

4. Which type of top-down parser uses a fixed lookahead of one symbol?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: b) LL(1) Parser

5. Which type of top-down parser is most commonly used for LL(1) grammars?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: a) Recursive Descent Parser

6. Which type of top-down parser can handle non-LL(1) grammars with arbitrary lookahead?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL() Parser

Answer: d) LL() Parser

7. Which type -down parser is more flexible in terms of grammar restrictions?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: c) LL(k) Parser

8. Which type of top-down parser is capable of handling left recursion directly?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: a) Recursive Descent Parser

9. Which type of top-down parser may require backtracking in case of conflicts?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: a) Recursive Descent Parser

10. Which type of top-down parser is more efficient for LL(1) grammars with a limited lookahead?

a) Recursive Descent Parser

b) LL(1) Parser

c) LL(k) Parser

d) LL(*) Parser

Answer: b) LL(1) Parser

Short questions related to top-down parsing

Q: What is the primary characteristic of a top-down parser?

A: Top-down parsers start with the start symbol and try to derive the input string by repeatedly applying production rules in a top-to-bottom manner.

Q: What is the purpose of the predictive parsing table in top-down parsing?

A: The predictive parsing table is used to determine the production rule to apply based on the current non-terminal and lookahead symbol.

Q: What is the difference between LL(1) and LL(k) parsers?

A: LL(1) parsers have a fixed lookahead of one symbol, whereas LL(k) parsers can have a variable lookahead of k symbols.

Q: What is left recursion, and how does it impact top-down parsing?

A: Left recursion occurs when a non-terminal directly or indirectly produces itself as the leftmost symbol in its production rule, making it difficult for top-down parsers to handle.

Q: What is the purpose of the follow set in top-down parsing?

A: The follow set helps determine the potential terminals that can follow a non-terminal, aiding in error detection and recovery during parsing.

Q: What is the advantage of using LL() parsers over LL(1) parsers?

A: LL() parsers can handle non-LL(1) grammars by allowing arbitrary lookahead, making them more powerful but potentially less efficient.

Q: What are the two main types of conflicts that can arise in top-down parsing?

A: The two main types of conflicts are first/follow conflicts and follow/follow conflicts, which occur when the predictive parsing table has multiple entries for the same non-terminal and lookahead symbol combination.

Q: What is backtracking in the context of top-down parsing?

A: Backtracking refers to the process of undoing the choices made during parsing and trying alternative paths when conflicts or errors are encountered.

Q: How can left recursion be eliminated in grammar to make it suitable for top-down parsing?

A: Left recursion can be eliminated by transforming the production rules to use right recursion or by applying left-factoring techniques.

Q: Can a top-down parser handle all types of grammars?

A: No, top-down parsers can handle only LL(k) grammars, where k represents the maximum lookahead required by the parser.

In this article, we have discussed recursive descent parsers and predictive parsers which are types of top-down parsers. We have also compared these two types. Hope this article is useful for you. We love to hear from you. Happy learning :).

Recursive Descent Parser and Predictive Parser | Types of Top Down Parser

Types of Top Down Parsers

Recursive Descent Parsing in Compiler Construction

What is Recursive Descent Parsing?

Advantages and Disadvantages of Recursive Descent Parsing

First and Follow Sets:

1. First Set:

2. Follow Set:

How to Implement a Recursive Descent Parser?

Example of Implementation of Recursive Descent Parsing

LL Parsers or Predictive parser

How to Construct an LL Parser?

Example of LL Parsing

Recursive Descent Parser vs LL Parser

MCQs related to types of top-down parsers

Short questions related to top-down parsing

Post a Comment

Social Plugin

Popular Posts