实验细节与要求¶

词法分析器¶

现在我们开始编写 Cminusf 的词法解析器。首先，让我们了解一下 Cminusf 的词法规则，随后将介绍实验要求和细节。

Cminusf 词法¶

Cminus 是 C 语言的一个子集，该语言的语法在《编译原理与实践》第九章附录中有详细的介绍。而 Cminusf 则是在 Cminus 上追加了浮点操作。

关键字
```
else if int return void while float
```

专用符号

+ - * / < <= > >= == != = ; , ( ) [ ] { } /* */

标识符 ID 和数值，通过下列正则表达式定义：

letter = a|...|z|A|...|Z
digit = 0|...|9
ID = letter+
INTEGER = digit+
FLOAT = (digit+. | digit*.digit+)

注释用 /*...*/ 表示，可以超过一行。注释不能嵌套。
```
/*...*/
```

请认真阅读 Cminusf 的词法。其词法特性相比 C 语言做了大量简化，比如标识符 student_id 在 C 语言中是合法的，但是在 Cminusf 中是不合法的。

实验内容¶

本部分需要各位同学在 src/parser/lexical_analyzer.l 文件中根据 Cminusf 的词法规则完成词法分析器。在 lexical_analyzer.l 文件中，你只需在规则部分补全模式和动作即可，能够输出识别出的 token，text，line(刚出现的行数)，pos_start(该行开始位置)，pos_end(结束的位置，不包含)。比如：

文本输入：

void main(void) { return; }

则识别结果应该如下：

Token         Text      Line    Column (Start,End)
282           void      1       (1,5)
284           main      1       (6,10)
272              (      1       (10,11)
282           void      1       (11,15)
273              )      1       (15,16)
276              {      1       (17,18)
281         return      1       (19,25)
270              ;      1       (25,26)
277              }      1       (27,28)

必须维护正确的：token、text

选择维护的（方便 debug，测试）：line、pos_start、pos_end

请注意以下几点：

在补全 lexical_analyzer.l 前，你需要在 src/parser/syntax_analyzer.y 中补全 %union 的相关内容；
在补全 lexical_analyzer.l 前，需要首先修改 src/parser/syntax_analyzer.y 文件对 token 进行定义，定义方式形如 %token <node> ERROR；
token 编号是程序自动生成的，根据 token 定义顺序不同，输出的 token 编号也可能不同，是正常现象；
对于部分 token，只需被识别，但不应该被输出到分析结果中，因为这些 token 对程序运行没有作用。

语法分析器¶

Cminusf 语法¶

本小节将给出 Cminusf 的语法，我们将 Cminusf 的所有规则分为五类。

字面量、关键字、运算符与标识符
- type-specifier
- relop
- addop
- mulop
声明
- declaration-list
- declaration
- var-declaration
- fun-declaration
- local-declarations
语句
- compound-stmt
- statement-list
- statement
- expression-stmt
- iteration-stmt
- selection-stmt
- return-stmt
表达式
- expression
- simple-expression
- var
- additive-expression
- term
- factor
- integer
- float
- call
其他
- params
- param-list
- param
- args
- arg-list

起始符号是 program。文法中用到的 token 均以下划线和粗体标出。

$\text{program} \rightarrow \text{declaration-list}$
$\text{declaration-list} \rightarrow \text{declaration-list}\ \text{declaration}\ |\ \text{declaration}$
$\text{declaration} \rightarrow \text{var-declaration}\ |\ \text{fun-declaration}$
$\text{var-declaration}\ \rightarrow \text{type-specifier}\ \underline{\textbf{ID}}\ \underline{\textbf{;}}\ |\ \text{type-specifier}\ \underline{\textbf{ID}}\ \underline{\textbf{[}}\ \underline{\textbf{INTEGER}}\ \underline{\textbf{]}}\ \underline{\textbf{;}}$
$\text{type-specifier} \rightarrow \underline{\textbf{int}}\ |\ \underline{\textbf{float}}\ |\ \underline{\textbf{void}}$
$\text{fun-declaration} \rightarrow \text{type-specifier}\ \underline{\textbf{ID}}\ \underline{\textbf{(}}\ \text{params}\ \underline{\textbf{)}}\ \text{compound-stmt}$
$\text{params} \rightarrow \text{param-list}\ |\ \underline{\textbf{void}}$
$\text{param-list} \rightarrow \text{param-list}\ \underline{\textbf{,}}\ \text{param}\ |\ \text{param}$
$\text{param} \rightarrow \text{type-specifier}\ \underline{\textbf{ID}}\ |\ \text{type-specifier}\ \underline{\textbf{ID}}\ \underline{\textbf{[}}\ \underline{\textbf{]}}$
$\text{compound-stmt} \rightarrow \underline{\textbf{\{}}\ \text{local-declarations}\ \text{statement-list}\ \underline{\textbf{\}}}$
$\text{local-declarations} \rightarrow \text{local-declarations var-declaration}\ |\ \text{empty}$
$\text{statement-list} \rightarrow \text{statement-list}\ \text{statement}\ |\ \text{empty}$
$\begin{aligned}\text{statement} \rightarrow\ &\text{expression-stmt}\\ &|\ \text{compound-stmt}\\ &|\ \text{selection-stmt}\\ &|\ \text{iteration-stmt}\\ &|\ \text{return-stmt}\end{aligned}$
$\text{expression-stmt} \rightarrow \text{expression}\ \underline{\textbf{;}}\ |\ \underline{\textbf{;}}$
$\begin{aligned}\text{selection-stmt} \rightarrow\ &\underline{\textbf{if}}\ \underline{\textbf{(}}\ \text{expression}\ \underline{\textbf{)}}\ \text{statement}\\ &|\ \underline{\textbf{if}}\ \underline{\textbf{(}}\ \text{expression}\ \underline{\textbf{)}}\ \text{statement}\ \underline{\textbf{else}}\ \text{statement}\end{aligned}$
$\text{iteration-stmt} \rightarrow \underline{\textbf{while}}\ \underline{\textbf{(}}\ \text{expression}\ \underline{\textbf{)}}\ \text{statement}$
$\text{return-stmt} \rightarrow \underline{\textbf{return}}\ \underline{\textbf{;}}\ |\ \underline{\textbf{return}}\ \text{expression}\ \underline{\textbf{;}}$
$\text{expression} \rightarrow \text{var}\ \underline{\textbf{=}}\ \text{expression}\ |\ \text{simple-expression}$
$\text{var} \rightarrow \underline{\textbf{ID}}\ |\ \underline{\textbf{ID}}\ \underline{\textbf{[}}\ \text{expression} \underline{\textbf{]}}$
$\text{simple-expression} \rightarrow \text{additive-expression}\ \text{relop}\ \text{additive-expression}\ |\ \text{additive-expression}$
$\text{relop}\ \rightarrow \underline{\textbf{<=}}\ |\ \underline{\textbf{<}}\ |\ \underline{\textbf{>}}\ |\ \underline{\textbf{>=}}\ |\ \underline{\textbf{==}}\ |\ \underline{\textbf{!=}}$
$\text{additive-expression} \rightarrow \text{additive-expression}\ \text{addop}\ \text{term}\ |\ \text{term}$
$\text{addop} \rightarrow \underline{\textbf{+}}\ |\ \underline{\textbf{-}}$
$\text{term} \rightarrow \text{term}\ \text{mulop}\ \text{factor}\ |\ \text{factor}$
$\text{mulop} \rightarrow \underline{\textbf{*}}\ |\ \underline{\textbf{/}}$
$\text{factor} \rightarrow \underline{\textbf{(}}\ \text{expression}\ \underline{\textbf{)}}\ |\ \text{var}\ |\ \text{call}\ |\ \text{integer}\ |\ \text{float}$
$\text{integer} \rightarrow \underline{\textbf{INTEGER}}$
$\text{float} \rightarrow \underline{\textbf{FLOATPOINT}}$
$\text{call} \rightarrow \underline{\textbf{ID}}\ \underline{\textbf{(}}\ \text{args} \underline{\textbf{)}}$
$\text{args} \rightarrow \text{arg-list}\ |\ \text{empty}$
$\text{arg-list} \rightarrow \text{arg-list}\ \underline{\textbf{,}}\ \text{expression}\ |\ \text{expression}$

实验内容¶

本部分需要同学们完成 src/parser/syntax_analyzer.y。与词法分析器相同，你只需要补全代码中规则部分即可。

如果实现正确，该语法分析器可以从 Cminusf 代码分析得到一颗语法树。例如输入

int main(void) { return 0; }

可以得到如下语法树

>--+ program
|  >--+ declaration-list
|  |  >--+ declaration
|  |  |  >--+ fun-declaration
|  |  |  |  >--+ type-specifier
|  |  |  |  |  >--* int
|  |  |  |  >--* main
|  |  |  |  >--* (
|  |  |  |  >--+ params
|  |  |  |  |  >--* void
|  |  |  |  >--* )
|  |  |  |  >--+ compound-stmt
|  |  |  |  |  >--* {
|  |  |  |  |  >--+ local-declarations
|  |  |  |  |  |  >--* epsilon
|  |  |  |  |  >--+ statement-list
|  |  |  |  |  |  >--+ statement-list
|  |  |  |  |  |  |  >--* epsilon
|  |  |  |  |  |  >--+ statement
|  |  |  |  |  |  |  >--+ return-stmt
|  |  |  |  |  |  |  |  >--* return
|  |  |  |  |  |  |  |  >--+ expression
|  |  |  |  |  |  |  |  |  >--+ simple-expression
|  |  |  |  |  |  |  |  |  |  >--+ additive-expression
|  |  |  |  |  |  |  |  |  |  |  >--+ term
|  |  |  |  |  |  |  |  |  |  |  |  >--+ factor
|  |  |  |  |  |  |  |  |  |  |  |  |  >--+ integer
|  |  |  |  |  |  |  |  |  |  |  |  |  |  >--* 0
|  |  |  |  |  |  |  |  >--* ;
|  |  |  |  |  >--* }

这一部分必须严格遵守我们给出的语法，输出必须与标准程序输出完全一致。

实验要求¶

仓库目录结构¶

与本次实验有关的文件如下。

|-- CMakeLists.txt
|-- build # 在编译过程中产生，不需要通过 git add 添加到仓库中
|-- src
|   |-- CMakeLists.txt
|   |-- common
|   `-- parser
|       |-- CMakeLists.txt
|       |-- lexer.c
|       |-- lexical_analyzer.l # 你需要修改本文件
|       |-- parser.c
|       `-- syntax_analyzer.y # 你需要修改本文件
`-- tests
    |-- 1-parser
    |   |-- input # 针对 Lab1 的测试样例
    |   |-- output_standard # 助教提供的标准参考结果
    |   |-- output_student # 测试脚本产生的你解析后的结果
    |   |-- cleanup.sh
    |   `-- eval_lab1.sh # 测试用的脚本
    `-- testcases_general # 整个课程所用到的测试样例

编译、运行和评测¶

首先将你的实验仓库克隆的本地虚拟机中。要编译和运行词法分析器，请按照以下步骤在本地虚拟机中进行操作：

编译

$ cd 2023ustc-jianmu-compiler
$ mkdir build
$ cd build
# 使用 cmake 生成 makefile 等文件
$ cmake ..
# 使用 make 进行编译
$ make

如果构建成功，你会在 build 文件夹下找到 lexer 和 parser 可执行文件，用于对 Cminusf 文件进行词法和语法分析。

$ ls lexer parser
lexer parser

$ ./lexer
usage: lexer input_file

运行

我们在 tests/testcases_general 文件夹中准备了一些通用案例。

# 返回 2023ustc-jianmu-compiler 的根目录
$ cd ${WORKSPACE}

# 运行 lexer，进行词法分析
$ ./build/lexer tests/testcases_general/1-return.cminus
Token         Text      Line    Column (Start,End)
282           void      1       (1,5)
284           main      1       (6,10)
272              (      1       (10,11)
282           void      1       (11,15)
273              )      1       (15,16)
276              {      1       (17,18)
281         return      1       (19,25)
270              ;      1       (25,26)
277              }      1       (27,28)

# 运行 parser，解析 1-return.cminus 输出语法树
$ ./build/parser ./tests/testcases_general/1-return.cminus
>--+ program
|  >--+ declaration-list
|  |  >--+ declaration
|  |  |  >--+ fun-declaration
|  |  |  |  >--+ type-specifier
|  |  |  |  |  >--* void
|  |  |  |  >--* main
|  |  |  |  >--* (
|  |  |  |  >--+ params
|  |  |  |  |  >--* void
|  |  |  |  >--* )
|  |  |  |  >--+ compound-stmt
|  |  |  |  |  >--* {
|  |  |  |  |  >--+ local-declarations
|  |  |  |  |  |  >--* epsilon
|  |  |  |  |  >--+ statement-list
|  |  |  |  |  |  >--+ statement-list
|  |  |  |  |  |  |  >--* epsilon
|  |  |  |  |  |  >--+ statement
|  |  |  |  |  |  |  >--+ return-stmt
|  |  |  |  |  |  |  |  >--* return
|  |  |  |  |  |  |  |  >--* ;
|  |  |  |  |  >--* }

测试

测试样例分为两个部分，分别是 tests/testcases_general 和 lab1 限定的 tests/1-parser。

我们重点使用 tests/1-parser 考察语法分析器的正确性。其结构如下：

.
|-- input # 输入目录，包含 xxx.cminus 文件
|   |-- easy
|   |-- hard
|   `-- normal
|-- output_standard # 助教标准输出目录，包含 xxx.syntax_tree 文件
|   |-- easy
|   |-- normal
|   |-- hard
|   `-- testcases_general
|-- output_student # 学生输出目录，测试过程中产生 xxx.syntax_tree 文件
|   |-- easy
|   |-- hard
|   |-- normal
|   `-- testcases_general
|-- eval_lab1.sh # 测试脚本
`-- cleanup.sh

我们使用 diff 命令进行结果比较：

$ cd 2023ustc-jianmu-compiler
$ export PATH="$(realpath ./build):$PATH"
$ cd tests/1-parser
$ parser input/normal/local-decl.cminus > output_student/normal/local-decl.syntax_tree
$ diff output_student/normal/local-decl.syntax_tree output_standard/normal/local-decl.syntax_tree
[输出为空，代表没有区别，该测试通过]

我们提供了 eval_lab1.sh 脚本进行快速批量测试。该脚本的第一个参数可以是 easy、 normal、 hard 以及 testcases_general，并且有第二个可选参数，用于批量 diff 和助教提供的标准参考结果进行比较。脚本运行后会将生成结果放在 tests/1-parser/output_student 文件夹里，而助教的参考输出则在 tests/1-parser/output_standard 中。

$ ./eval_lab1.sh easy
[info] Analyzing FAIL_comment.cminus
error at line 1 column 4: syntax error
...
[info] Analyzing id.cminus

$ ./eval_lab1.sh easy yes
...
[info] Analyzing FAIL_comment.cminus
error at line 1 column 4: syntax error
...
[info] Comparing...
[info] No difference! Congratulations!