We have already covered the introduction to compiler construction. A compiler converts human-readable source code into machine-executable instructions through a series of phases. Understanding these phases helps developers and programmers reason about how their code is translated and optimized. This article explores each phase of a compiler and its contribution to the overall compilation process. The image below shows all the phases of compilation.
Table of Contents
- Lexical Analysis
- Syntax Analysis
- Semantic Analysis
- Intermediate Code Generation
- Code Optimization
- Code Generation
- Symbol Table Management
Lexical Analysis
Lexical analysis is the first phase of the compilation process. It takes source code as input and breaks it into a stream of tokens, which are the basic units of the programming language. The tokens are then passed on to the next phase for further processing.
The main functions of lexical analysis are:
- Identify the lexical units in the source code
- Classify lexical units into classes such as constants, reserved words, identifiers, and operators, and enter them in the appropriate tables
- Ignore comments, whitespace, and other irrelevant characters in the source code
- Identify character sequences that are not valid tokens of the language and report errors
For example, consider the following C program:
// This is a comment
int x = 10;
The lexical analyzer scans the source left to right, character by character, and groups the characters into lexemes as follows:
- // : Comment start
- This is a comment : Comment text
- \n : Newline
- int : Keyword token
- x : Identifier token
- = : Assignment operator token
- 10 : Integer constant token
- ; : Semicolon token
The comment and the newline are recognized but then discarded, so only the last five tokens are passed on to the syntax analyzer.
Example: the statement x = y + 100 generates the following tokens:
<x, id1> <=> <y, id2> <+> <100>
This corresponds to id1 = id2 + 100, with the following symbol table entries:
Name | Type | Value
id1 | int | 100
id2 | int | -
printf | function | -
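The tokenizing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token class names (`KEYWORD`, `ID`, `NUMBER`, and so on), the tiny keyword list, and the `tokenize` function are hypothetical and not taken from any particular compiler.

```python
import re

# Token classes for a tiny C-like subset. Order matters: earlier patterns win.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("KEYWORD", r"\b(?:int|float|return)\b"),
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[=+\-*/]"),
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),
    ("ERROR",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbols): the token stream and a name -> idN mapping."""
    tokens, symbols = [], {}
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SKIP"):
            continue  # recognized, then discarded
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "ID":
            # The first occurrence of a name gets the next free id1, id2, ... slot.
            symbols.setdefault(text, f"id{len(symbols) + 1}")
            tokens.append((kind, text, symbols[text]))
        else:
            tokens.append((kind, text))
    return tokens, symbols


tokens, symbols = tokenize("x = y + 100")
print(tokens)
print(symbols)
```

For `x = y + 100` the sketch maps `x` to `id1` and `y` to `id2`, mirroring the token stream shown above. A production lexer would also record line numbers and types in the symbol table.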
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes the stream of tokens generated by the lexical analysis phase and checks whether they conform to the grammar of the programming language. The output of this phase is usually an Abstract Syntax Tree (AST), which is a hierarchical representation of the syntactic structure of the source code.
The main functions of syntax analysis are:
- Obtain tokens from the lexical analyzer
- Check whether the expression is syntactically correct
- Report all syntax errors
- Construct an AST from the tokens
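For example, consider the following expression:
x = y + 100
<x, id1> <=> <y, id2> <+> <100>
id1 = id2 + 100
This step generates a parse tree whose root is the assignment operator =, with id1 as its left child and a + node as its right child. The + node in turn has id2 and 100 as its children.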
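One common way to build such a tree is a recursive-descent parser. The sketch below is a minimal illustration, assuming a toy grammar of the form `identifier = operand (+|-) operand ...`; the function name `parse_assignment` and the nested-tuple tree shape are hypothetical choices made for this example.

```python
import re


def parse_assignment(source):
    """Parse `name = operand (+|-) operand ...` into a nested-tuple AST."""
    # Simplified scanning: characters outside these patterns are silently skipped.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[=+\-]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def operand():
        token = advance()
        if token is None or token in {"=", "+", "-"}:
            raise SyntaxError(f"expected identifier or number, got {token!r}")
        return ("num", token) if token[0].isdigit() else ("id", token)

    def expression():
        node = operand()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, operand())  # left-associative
        return node

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left side of an assignment must be an identifier")
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expression())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment("x = y + 100"))
# ('=', ('id', 'x'), ('+', ('id', 'y'), ('num', '100')))
```

The returned tuple has the same shape as the tree described above: `=` at the root, the target on the left, and the `+` subtree on the right. Input such as `x = + 100` raises a `SyntaxError`, which corresponds to the "report all syntax errors" function.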
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the code is semantically correct, i.e., whether it conforms to the language’s type system and other semantic rules. It also performs type checking, type conversion, scope resolution, etc.
The main functions of semantic analysis are:
- Obtain the AST from the syntax analyzer
- Check whether the expression is semantically correct
- Report all semantic errors
- Annotate the AST with type information
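For example, consider the following expression:
x = y + 100
Assuming x and y are declared as floating-point variables, type checking finds that the integer constant 100 is being added to a floating-point operand, so the constant is converted to a floating-point value:
id1 = id2 + 100.0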
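The type check and implicit conversion can be sketched as a walk over the tree. This is a simplified illustration: the nested-tuple AST, the `inttofloat` node, and the `check` function are assumptions made for this example, and only `int` and `float` are handled.

```python
def check(node, symbols):
    """Return (typed_node, type), inserting int-to-float conversions where needed."""
    kind = node[0]
    if kind == "num":
        return node, ("float" if "." in node[1] else "int")
    if kind == "id":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return node, symbols[node[1]]
    if kind == "+":
        left, left_type = check(node[1], symbols)
        right, right_type = check(node[2], symbols)
        if left_type != right_type:
            # Mixed int/float operands: widen the int side to float.
            if left_type == "int":
                left = ("inttofloat", left)
            else:
                right = ("inttofloat", right)
            return ("+", left, right), "float"
        return ("+", left, right), left_type
    if kind == "=":
        target, target_type = check(node[1], symbols)
        value, value_type = check(node[2], symbols)
        if target_type == "float" and value_type == "int":
            value = ("inttofloat", value)
        elif target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return ("=", target, value), target_type
    raise ValueError(f"unknown node kind {kind!r}")


tree = ("=", ("id", "x"), ("+", ("id", "y"), ("num", "100")))
typed_tree, result_type = check(tree, {"x": "float", "y": "float"})
print(typed_tree)
# ('=', ('id', 'x'), ('+', ('id', 'y'), ('inttofloat', ('num', '100'))))
```

With both variables declared as `float`, the constant is wrapped in a conversion node, which matches the rewritten expression above. Declaring `x` as `int` instead makes the sketch reject the assignment with a `TypeError`; a real C compiler would apply a narrowing conversion there, so that strictness is a design choice of the sketch.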
Intermediate Code Generation
Intermediate code generation is the fourth phase of a compiler. It generates an intermediate representation of the source code that can be easily translated into machine code. The intermediate code can take various forms, such as three-address code, quadruples, or triples.
The main functions of intermediate code generation are:
- Obtain the typed AST from the semantic analyzer
- Generate intermediate code from the AST
- Optimize intermediate code for better performance
For example, consider the following expression:
id1 = id2 + 100.0
The intermediate code generator produces the following three-address code from the typed AST:
t1 = inttofloat(100)
t2 = id2 + t1
id1 = t2
The intermediate code shows how the expression is evaluated using temporary variables.
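A post-order walk of the typed tree is enough to produce this kind of three-address code. The sketch below is illustrative only; the tuple-based tree, the `inttofloat` node, and the `generate_tac` name are assumptions for this example.

```python
def generate_tac(node):
    """Translate a typed assignment AST into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(n):
        kind = n[0]
        if kind in ("id", "num"):
            return n[1]  # leaves are used directly as operands
        if kind == "inttofloat":
            operand = walk(n[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        if kind == "+":
            left = walk(n[1])
            right = walk(n[2])
            temp = new_temp()
            code.append(f"{temp} = {left} + {right}")
            return temp
        raise ValueError(f"unknown node kind {kind!r}")

    target = node[1][1]
    value = walk(node[2])
    code.append(f"{target} = {value}")
    return code


typed_tree = ("=", ("id", "id1"), ("+", ("id", "id2"), ("inttofloat", ("num", "100"))))
for line in generate_tac(typed_tree):
    print(line)
# t1 = inttofloat(100)
# t2 = id2 + t1
# id1 = t2
```

Each interior node of the tree gets its own temporary, which is why the output contains `t1` and `t2` even though the later optimization phase can remove one of them.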
Code Optimization
Code optimization is the fifth phase of a compiler. It applies various optimization techniques to the intermediate code to improve the performance of the generated machine code. Optimization can be done at various levels, such as local, global, and loop level.
The main functions of code optimization are:
- Obtain intermediate code from the intermediate code generator
- Apply optimization techniques to the intermediate code
- Generate optimized intermediate code
For example, consider the expression id1 = id2 + 100. Its intermediate code can be optimized to:
t1 = id2 + 100.0
id1 = t1
The optimized intermediate code performs the int-to-float conversion of the constant at compile time, which eliminates an unnecessary computation and a temporary variable.
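The simplification above is an instance of constant folding followed by propagation. A minimal sketch, assuming instructions are stored as hypothetical `(dest, op, arg1, arg2)` tuples and that each temporary is assigned only once:

```python
def optimize(code):
    """Fold inttofloat on integer literals and propagate the result into later uses."""
    constants = {}  # temporary name -> folded literal
    result = []
    for dest, op, arg1, arg2 in code:
        # Replace operands that are known folded constants.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttofloat" and arg1.isdigit():
            constants[dest] = str(float(int(arg1)))  # "100" becomes "100.0"
            continue  # the conversion now happens at compile time
        result.append((dest, op, arg1, arg2))
    return result


code = [
    ("t1", "inttofloat", "100", None),
    ("t2", "+", "id2", "t1"),
    ("id1", "copy", "t2", None),
]
for instruction in optimize(code):
    print(instruction)
# ('t2', '+', 'id2', '100.0', ) is printed as ('t2', '+', 'id2', '100.0')
# ('id1', 'copy', 't2', None)
```

The sketch keeps the original temporary names, so its `t2` plays the role of `t1` in the optimized code shown above. Renumbering temporaries and removing the final copy would be further passes.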
Code Generation
Code generation is the final phase of a compiler. It takes the optimized intermediate code and generates the actual machine code that can be executed by the target hardware. The output can take various forms, such as assembly code, object code, or an executable binary.
The main functions of code generation are:
- Obtain optimized intermediate code from the code optimizer
- Generate machine code from the optimized intermediate code
- Allocate registers and memory for variables and instructions
Continuing the previous example, the following target code is generated from the optimized intermediate code:
ADDI R3, R2, 100
MOV R1, R3
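The mapping from intermediate code to instructions like these can be sketched as follows. The instruction set (`ADD`, `ADDI`, `MOV`), the tuple format, and the naive register assignment are all assumptions made for illustration; the sketch assumes the incoming register names are `R1..Rn` and never spills to memory.

```python
def generate_target(code, registers):
    """Emit assembly-like text for a hypothetical register machine."""
    registers = dict(registers)  # copy so the caller's mapping is not modified

    def reg(name):
        # Names without a register yet get the next unused one.
        if name not in registers:
            registers[name] = f"R{len(registers) + 1}"
        return registers[name]

    output = []
    for dest, op, arg1, arg2 in code:
        if op == "copy":
            output.append(f"MOV {reg(dest)}, {reg(arg1)}")
        elif op == "+":
            if arg1.isdigit():
                arg1, arg2 = arg2, arg1  # addition commutes; keep the literal last
            if arg1.isdigit():
                raise ValueError("both operands are literals; fold them first")
            if arg2.isdigit():
                output.append(f"ADDI {reg(dest)}, {reg(arg1)}, {arg2}")
            else:
                output.append(f"ADD {reg(dest)}, {reg(arg1)}, {reg(arg2)}")
        else:
            raise ValueError(f"unsupported operation {op!r}")
    return output


code = [("t1", "+", "id2", "100"), ("id1", "copy", "t1", None)]
for line in generate_target(code, {"id1": "R1", "id2": "R2"}):
    print(line)
# ADDI R3, R2, 100
# MOV R1, R3
```

Here `id1` and `id2` are assumed to live in `R1` and `R2`, and the temporary `t1` receives `R3`, which reproduces the two instructions shown above. A real code generator would choose registers with liveness information and handle floating-point operands separately.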
Example:
Apply all compiler phases to the source code given below:
x = y + 100
Symbol Table Management
The symbol table is a data structure that stores information about identifiers, such as their names, types, values, and scopes. It helps the compiler work efficiently by making identifier lookups fast. The symbol table is created and updated by various phases of the compiler, such as lexical analysis and semantic analysis.
The main functions of symbol table management are:
- Create and maintain a symbol table for each scope
- Insert and retrieve information about identifiers from the symbol table
- Resolve name conflicts and scope issues
For example, consider the following C program:
int x; // Global variable x
void foo(int y) // Function foo with parameter y
{
    int z; // Local variable z
    x = y + z; // Assign y + z to x
}
void bar(int x) // Function bar with parameter x (different from global x)
{
    int y; // Local variable y
    y = x * 2; // Assign x * 2 to y
}
Symbol table management creates and maintains a symbol table for each scope. The symbol table for the above code is as follows:
Scope | Name | Type | Value
Global | x | int | -
foo | y | int | -
foo | z | int | -
bar | x | int | -
bar | y | int | -
The scope column indicates the visibility of the name.
The value column indicates the initial or assigned value of the name. The dash
(-) means that the value is unknown or not assigned.
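Per-scope tables with lookup through enclosing scopes can be sketched as a chain of dictionaries. The `SymbolTable` class and its method names below are hypothetical and chosen only for this illustration.

```python
class SymbolTable:
    """One table per scope, linked to the enclosing scope through `parent`."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.entries = {}

    def insert(self, name, type_, value=None):
        if name in self.entries:
            raise KeyError(f"{name!r} is already declared in scope {self.name!r}")
        self.entries[name] = {"type": type_, "value": value}

    def lookup(self, name):
        """Search this scope first, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.name, scope.entries[name]
            scope = scope.parent
        return None


global_scope = SymbolTable("Global")
global_scope.insert("x", "int")

foo = SymbolTable("foo", parent=global_scope)
foo.insert("y", "int")
foo.insert("z", "int")

bar = SymbolTable("bar", parent=global_scope)
bar.insert("x", "int")
bar.insert("y", "int")

print(foo.lookup("x"))  # resolves to the global x
print(bar.lookup("x"))  # resolves to bar's own parameter x
print(bar.lookup("z"))  # None: z is local to foo
```

Inside `foo`, the name `x` is found in the global table, while inside `bar` the parameter `x` shadows it. Declaring the same name twice in one scope raises an error, which is one simple way to handle name conflicts.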
Error Handling
Error handling is an important aspect of the compilation process. It deals with detecting and reporting errors that occur during the various phases of a compiler. The errors can be broadly classified into two types: syntactic errors and semantic errors.
Syntactic errors violate the grammar or syntax rules of the programming language, for example a missing semicolon or mismatched parentheses. Syntactic errors are detected and reported by the lexical analysis and syntax analysis phases.
Semantic errors violate the rules of meaning of the programming language, for example assigning a string to an integer variable, dividing by a constant zero, or using an undeclared or out-of-scope variable. Semantic errors are detected and reported by the semantic analysis and intermediate code generation phases.
The main functions of error handling are:
- Detect and report errors during various phases of compiler
- Recover from errors and continue the compilation process
- Provide meaningful and helpful error messages to the user
For example, consider the following C program:
int x = 10;
int y = 0;
int z = x / y; // Semantic error: division by zero
printf("%d\n", z);
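If the compiler can determine at compile time that the divisor is zero, for example by tracking the constant value assigned to y, error handling can report the problem as follows:
Error: Division by zero at line 3
The error message indicates the type, location, and cause of the error. When the divisor is only known at run time, the compiler cannot diagnose the division and the error shows up during execution instead.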
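A rough sketch of how such a check might work is shown below. It tracks variables that hold known integer constants and reports divisions whose divisor is known to be zero. The statement format `(line, target, left, op, right)` and the `check_division` function are hypothetical simplifications, not how any specific compiler represents code.

```python
def check_division(statements):
    """Collect division-by-zero diagnostics for divisors known at compile time.

    Operands are either int literals or variable names; `op` is None for a
    plain initialization such as `int x = 10;`.
    """
    known = {}        # variable name -> constant value known at compile time
    diagnostics = []

    def value(operand):
        if isinstance(operand, int):
            return operand
        return known.get(operand)  # None when the value is not known

    for line, target, left, op, right in statements:
        if op == "/" and value(right) == 0:
            diagnostics.append(f"Error: Division by zero at line {line}")
            known.pop(target, None)  # recover: treat the result as unknown
            continue                 # and keep checking later statements
        if op is None and value(left) is not None:
            known[target] = value(left)
        else:
            known.pop(target, None)  # results of other operations are not tracked
    return diagnostics


program = [
    (1, "x", 10, None, None),   # int x = 10;
    (2, "y", 0, None, None),    # int y = 0;
    (3, "z", "x", "/", "y"),    # int z = x / y;
]
print(check_division(program))
# ['Error: Division by zero at line 3']
```

The loop does not stop at the first error: it records the diagnostic and continues, which illustrates the "recover and continue" function listed above.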
Short Questions on phases of compiler:
1. What are the main phases of a compiler?
The main phases of a compiler are lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. Symbol table management and error handling support all of these phases.
2. Why is code optimization important in the compilation
process?
Code optimization improves the
performance and efficiency of the compiled code by reducing execution time and
resource usage. It ensures that the generated code runs as efficiently as
possible.
3. What is the role of the linker in the compilation process?
The linker resolves references to
external libraries and combines object code from multiple source files into a
single executable program. It ensures that all dependencies are resolved and
prepares the final program for execution.
4. How does the compiler handle errors during the compilation process?
The compiler detects and reports
various types of errors, such as lexical, syntax, and semantic errors. It
provides error messages that help identify and fix issues in the code.
5. What are some future trends in compiler technology?
Future trends in compiler
technology include the development of domain-specific languages and compilers,
the integration of machine learning for code optimization, and exploring
parallelism and concurrency for enhanced performance.
Understanding the phases of a compiler is crucial for
developers and programmers. Each phase plays a significant role in transforming
source code into efficient and executable programs. By grasping the intricacies
of these phases, developers can optimize their code, improve performance, and
ensure compatibility with different target architectures. Compiler construction
is an exciting field that continues to shape the way we write and execute code.