-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathgrammar.txt
More file actions
409 lines (325 loc) · 17 KB
/
grammar.txt
File metadata and controls
409 lines (325 loc) · 17 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
About the grammar for micro-C * sestoft@itu.dk * 2009-09-29
-----------------------------------------------------------
Micro-C is a sublanguage of C. By design, a syntactically well-formed
micro-C program is also a syntactically well-formed C program, and the
intention is that the meaning of the program should be the same in
both languages (except that micro-C is untyped and permits assignment
to array variables, which C does not).
Micro-C是C的一个子语言。根据设计,语法结构良好的Micro-C程序也是语法结构良好
的C程序,其目的是程序在两种语言中的含义应该相同(除了Micro-C是非类型化的,
并且允许分配给数组变量,而C不是)。
Many simplifications have been made compared to real C:
* datatypes: only int and char variables, arrays, and pointers
* no structs, unions, doubles, function pointers, ...
* no initializers in variable declarations
* functions can return only int, char, void
与实际C相比,进行了许多简化:
*数据类型:仅int和char变量、数组和指针
*没有结构、并集、双精度、函数指针...
*变量声明中没有初始值设定项
*int只能返回char,void函数
First attempt at a grammar
第一次尝试语法
--------------------------
This first version of the grammar shows what we want the language to
include. However, it is ambiguous and will have to be rewritten for
use in a parser specification (see below). Some grammar rules are
already complex enough to deserve an explanation:
语法的第一个版本显示了我们希望语言包含的内容。然而,它是不明确的,必须重写
才能在解析器规范中使用(见下文)。有些语法规则已经足够复杂,值得解释:
* The variable description vardesc reflects the variable declaration
syntax of C and C++, where the type specification surrounds the
variable; unlike in Standard ML, F#, Java and C#, the type cannot
be isolated syntactically from the variable name.
* 变量描述vardesc反映了C和C++的变量声明语法,其中类型规范围绕着变量;
与标准ML、F#、Java和C#不同,类型在语法上不能与变量名分离。
* The definition of a comma-separated list (of variables or argument
expressions) must be split into three cases:
* 逗号分隔列表(变量或参数表达式)的定义必须分为三种情况:
empty list 空列表
one element (and no comma) 一个元素(并且不是逗号)
more than one element (separated by commas) 多个元素(用逗号分隔)
This is best achieved using two nonterminal symbols, as
illustrated by comma-separated argument expression lists:
这最好使用两个非终结符符号来实现,如逗号分隔的参数表达式列表所示:
exprs ::=
<empty>
expr1
expr1 ::=
expr
expr , expr1
The non-recursive nonterminal exprs distinguishes the empty case
from the non-empty case. The recursive nonterminal expr1
distinguishes the one-element case from the multi-element case.
非递归非终结表达式区分空大小写。递归非终结符expr1区分了单元素情况和多元素情况。
* We distinguish between access expressions (those that have an
lvalue), and general expressions. The address operator (&) can be
applied only to access expressions. In an array element access
e1[e2], expression e1 must be an access expression.
* 我们区分访问表达式(具有左值的表达式)和一般表达式。地址运算符(&)只能
应用于access表达式。在数组元素访问e1[e2]中,表达式e1必须是访问表达式。
main ::= program
topdecs EOF
topdecs ::= top-level declarations
<empty>
topdec topdecs
topdec ::= top-level declaration
vardec ; global variable declaration
fundec function declaration
vardec ::= variable or parameter declaration
typ vardesc
vardesc ::= variable description
NAME variable
* vardesc pointer to
( vardesc ) parenthesized variable description
vardesc [ ] array of
vardesc [ int ] array of given size
fundec ::= function declaration
void NAME ( paramdecs ) block
typ NAME ( paramdecs ) block
paramdecs ::= comma-separated parameter list
<empty>
paramdecs1
paramdecs1 ::= non-empty parameter declaration list
vardec
vardec , paramdecs1
stmt ::=
if (expr) stmt if-statement
if (expr) stmt else stmt if-else statement
while (expr) stmt while-loop
expr ; expression as statement
return ; return
return expr ; return
block block statement
block ::= block statement
{ stmtordecseq }
stmtordecseq ::= statements and declarations
<empty> empty sequence
stmt stmtordecseq statement
vardec ; stmtordecseq local variable declaration
expr ::=
access access
access = expr assignment
const constant literal
NAME ( exprs ) function call
( expr ) parenthesized expression
& access address of
! expr logical negation
print expr print integer expression
expr + expr plus
expr - expr minus
expr * expr times
expr / expr quotient
expr % expr remainder
expr == expr equal to
expr != expr not equal to
expr > expr greater than
expr < expr less than
expr >= expr greater than or equal to
expr <= expr less than or equal to
expr && expr sequential and
expr || expr sequential or
access ::=
NAME local or global variable
* expr pointer dereferencing
access [ expr ] array indexing
exprs ::=
<empty> empty list of expressions
exprs1 non-empty list of expressions
exprs1 ::=
expr list with one expression
expr , exprs1 list with more than one expr
const ::= constant literals
CSTINT integer literal
- CSTINT negative integer
NULL the NULL literal
typ ::=
int integer type
char character type
Second attempt at a grammar, more suitable for a parser specification
第二次尝试语法,更适合解析器规范
---------------------------------------------------------------------
Important changes:
重要的变化:
* We split statements into two kinds: stmtm and stmtu. In the
former, there are no if-statements at top-level without a matching
else-branch. In the latter there maybe if-statements without a
matching else-branch. This `dangling else' problem is discussed in
Mogensen's book. The implication is that
* 我们将语句分为两类:stmtm和stmtu。在前者中,如果没有匹配的else分支,
顶层就没有if语句。在后者中,可能有if语句,但没有匹配的else分支。
在Mogensen书中讨论了这个“悬而未决的问题”。这意味着
if (e1) if (e2) s1 else s2
is parsed the same way as
和下面的解析方式是相同的
if (e1) { if (e2) s1 else s2 }
not as
但不是
if (e1) { if (e2) s1 } else s2
* The grammar would still be ambiguous and cause shift/reduce
conflicts, if we did not use precedence declarations on the
operators that can appear in an expression:
* 如果我们没有对表达式中可能出现的运算符使用优先级声明,语法仍然
是不明确的,并且会导致shift/reduce冲突:
right = /* lowest precedence */
nonassoc PRINT
left ||
left &&
left == !=
nonassoc > < >= <=
left + -
left * / %
nonassoc ! &
nonassoc [ /* highest precedence */
Most of these are quite obvious and can be taken straight from a
textbook on C or Java. For instance, the assignment operator (=)
must bind less strongly than all of the logical and arithmetic
operators, the logical connectives must bind less strongly than the
comparisons, logical or (||) binds less strongly than logical and
(&&), the <, >, <=, >= comparisons must be non-associative, lowest
precedence, and so on.
其中大多数都是显而易见的,可以直接从C或Java教科书中学习。例如,
赋值运算符(=)的绑定强度必须小于所有逻辑和算术运算符,逻辑连接词的绑定强度
必须小于比较,逻辑or(||)的绑定强度必须小于逻辑and(&&),比较必须是
非关联的、优先级最低的,等等。
The high precedence given to the left bracket ([) is necessary to
avoid ambiguity and parse conflicts in expressions and variable
declarations. For expressions it implies that
左括号([)的高优先级是避免表达式和变量声明中的歧义和解析冲突所必需的
* the parsing of &a[2] is &(a[2]), not (&a)[2]
* the parsing of *a[2] is *(a[2]), not (*a)[2]
For variable declarations, it implies that
对于变量声明,它意味着
* the parsing of int *a[10] is int *(a[10]), not int (*a)[10]
The low precedence given to PRINT is necessary to avoid ambiguity
and parse conflicts in expressions with two-argument operators.
It implies that
在使用两个参数运算符的表达式中,为避免歧义和解析冲突,必须为PRINT指定较低
的优先级。这意味着
* the parsing of print 2 + 5 is print (2 + 5), not (print 2) + 5
By introducing extra nonterminals and grammar rules, one can live
without the precedence declarations. Mogensen's book describes how
to do that too.
通过引入额外的非终结符和语法规则,人们可以不用优先级声明。Mogensen的书也描述
了如何做到这一点。
* Including access expressions into general expressions leads to
ambiguity and conflicts. We add a new nonterminal exprnotaccess to
explicitly represent non-access expressions.
* 将访问表达式包含到一般表达式中会导致歧义和冲突。我们添加了一个新的非终端
exprnotaccess来显式表示非访问表达式。
* For non-access expressions, we distinguish atomic expressions (a
variable, a constant, or an expression within parentheses) from
non-atomic ones. This avoids ambiguity in pointer dereferencing
expressions, so that
* 对于非访问表达式,我们区分原子表达式(变量、常量 或 括号内的表达式)和
非原子表达式。这避免了指针解引用表达式中的歧义,因此
* the parsing of *x*2 is (*x)*2, not *(x*2)
main ::=
topdecs EOF program
topdecs ::= top-level declarations
<empty>
topdec topdecs
topdec ::= top-level declaration
vardec ;
fundec
vardec ::= variable or parameter declaration
typ vardesc
vardesc ::= variable description
NAME variable
* vardesc pointer to
( vardesc ) parenthesized variable description
vardesc [ ] array of
vardesc [ int ] array of (with allocation)
fundec ::= function declaration
void NAME ( paramdecs ) block
typ NAME ( paramdecs ) block
paramdecs ::= comma-separated parameter list
<empty>
paramdecs1
paramdecs1 ::= non-empty parameter declaration list
vardec
vardec , paramdecs1
stmt ::= statement
stmtm without unmatched trailing if-else
stmtu with unmatched trailing if-else
stmtm ::= no unmatched trailing if-else
expr ; expression statement
return ; return
return expr ; return
block block statement
if (expr) stmtm else stmtm if-else statement
while (expr) stmtm while-statement
stmtu ::=
if (expr) stmtm else stmtu unmatched trailing if
if (expr) stmt unmatched if
while (expr) stmtu unmatched
block ::= block statement
{ stmtordecseq }
stmtordecseq ::=
<empty> empty sequence
stmt stmtordecseq statement
vardec ; stmtordecseq local variable declaration
expr ::=
access access expression
exprnotaccess other expression types
exprnotaccess ::=
atexprnotaccess atomic expression, not access
access = expr assignment
NAME ( exprs ) function call
! expr logical negation
print expr print expr's value
println print value and newline
expr + expr plus
expr - expr minus
expr * expr times
expr / expr quotient
expr % expr remainder
expr == expr equal to
expr != expr not equal to
expr > expr greater than
expr < expr less than
expr >= expr greater than or equal to
expr <= expr less than or equal to
expr && expr sequential and
expr || expr sequential or
atexprnotaccess ::=
const constant
( exprnotaccess ) parenthesis
& access address operator
access ::=
NAME variable
( access ) parenthesis
* access pointer dereferencing
* atexprnotaccess pointer dereferencing
access [ expr ] array element access
exprs ::=
<empty> empty list of expressions
exprs1 non-empty list of expressions
exprs1 ::=
expr list with one expression
expr , exprs1 list with more than one expr
const ::=
CSTINT non-negative integer constant
- CSTINT negative integer constant
null null constant
type ::=
int the int type
char the char type
Lexical matters: tokens and comments
词汇问题:标记和注释
------------------------------------
NAME: [`a`-`z``A`-`Z`][`a`-`z``A`-`Z``0`-`9`]*
except for the keywords, which are:
除了关键字,它们是:
char else false if int null print println return true void while
CSTINT: [0-9]+
CSTBOOL: false | true
CSTSTRING: `"`...`"`
with C string escapes \a \b \t \n \v \f \r \" \\ \ddd \uxxxx
OP: + - * / % = == != < > <= >= && ||
There are two kinds of comments (not inside strings, not nested):
有两种注释(不在字符串内,也不嵌套):
// comment extends to end of line
注释延伸到行尾
/* ... */ delimited comment
分隔注释