Parser Generators - 编程开发

4.9 Parser Generators

This section shows how a parser generator can be used to facilitate the construction of the front end of a compiler. We shall use the LALR parser generator Yacc as the basis of our discussion, since it implements many of the concepts discussed in the previous two sections and it is widely available. Yacc stands for "yet another compiler-compiler," reflecting the popularity of parser generators in the early 1970s when the first version of Yacc was created by S. C. Johnson. Yacc is available as a command on the UNIX system, and has been used to help implement many production compilers.

4.9.1 The Parser Generator Yacc

A translator can be constructed using Yacc in the manner illustrated in Fig. 4.57. First, a file, say translate.y, containing a Yacc specification of the translator is prepared. The UNIX system command

yacc translate.y

transforms the file translate.y into a C program called y.tab.c using the LALR method outlined in Algorithm 4.63. The program y.tab.c is a representation of an LALR parser written in C, along with other C routines that the user may have prepared. The LALR parsing table is compacted as described in Section 4.7. By compiling y.tab.c along with the ly library that contains the LR parsing program using the command

cc y.tab.c –ly

we obtain the desired object program a.out that performs the translation specified by the original Yacc program. If other procedures are needed, they can be compiled or loaded with y.tab.c, just as with any C program.

Figure 4.57: Creating an input/output translator with Yacc

A Yacc source program has three parts:

declarations %% translation rules %% supporting C routines Example 4.69

To illustrate how to prepare a Yacc source program, let us construct a simple desk calculator that reads an arithmetic expression, evaluates it, and then prints its numeric value. We shall build the desk calculator starting with the with the following grammar for arithmetic expressions:

E -> E + T | T T -> T * F | F F -> ( E ) | digit

The token digit is a single digit between 0 and 9. A Yacc desk calculator program derived from this grammar is shown in Fig. 4.58.

Figure 4.58: Yacc specification of a simple desk calculator%{ #include <ctype.h> %} %token DIGIT%%line : expr ‘\n'{ printf("%d\n", $1); };expr : expr ‘+’ term { $$ = $1 + $3; }| term;term : term ‘*’ factor { $$ = $1 * $3; }| factor;factor : ‘(‘ expr ‘)’ { $$ = $2; }| DIGIT;%%yylex() { int c; c = getchar(); if(isdigit(c)) {yylval = c – ‘O’;return DIGIT; } return c; }The Declarations Part

There are two sections in the declarations part of a Yacc program; both are optional. In the first section, we put ordinary C declarations, delimited by %{ and %}. Here we place declarations of any temporaries used by the translation rules or procedures of the second and third sections. In Fig. 4.58, this section contains only the include-statement

#include<ctype.h>

that causes the C preprocessor to include the standard header file <ctype.h> that contains the predicate isdigit.

Also in the declarations part are declarations of grammar tokens. In Fig. 4.58, the statement

%token DIGIT

declares DIGIT to be a token. Tokens declared in this section can then be used in the second and third parts of the Yacc specification. If Lex is used to create the lexical analyzer that passes token to the Yacc parser, then these token declarations are also made available to the analyzer generated by Lex, as discussed in Section 3.5.2.

The Translation Rules Part

In the part of the Yacc specification after the first %% pair, we put the translation rules. Each rule consists of a grammar production and the associated semantic action. A set of productions that we have been writing:

<head> -> <body>1 | <body>2 | … | <body>n

would be written in Yacc as

<head> : <body>1 {<semantic action>1}| <body>2 {<semantic action>2}…| <body>n {<semantic action>n};

In a Yacc production, unquoted strings of letters and digits hot declared to be tokens are taken to be nontermirials. A quoted single character, e.g. ‘t’, is taken to be the terminal symbol c, as well as the integer code for the token represented by that character (i.e., Lex would return the character code for ‘c’ to the parser, as an integer). Alternative bodies can be separated by a vertical bar, and a semicolon follows each head with its alternatives and their semantic actions. The first head is taken to be the start symbol.

A Yacc semantic action is a sequence of C statements. In a semantic action, the symbol $$ refers to the attribute value associated with the nonterminal of the head, while $i refers to the value associated with the i-th grammar symbol (terminal or nonterminal) of the body. The semantic action is performed whenever we reduce by the associated production, so normally the semantic action computes a value for $$ in terms of the $i’s. In the Yact specification, we have written the two E-productions

E –> E + T | T

and their associated semantic actions as:

expr : expr ‘+’ term { $$ = $1 + $3; }| term;他们不计后果的彼此拥抱，握紧双手，怕天会亮，怕爱会走。

相关文章：

你感兴趣的文章：

标签云：