Grammar file format
The grammar file contains a sequence of rules. Each rule can be
preceded by a rule-name.
The pound sign (#) begins a comment; all characters remaining on the
line are ignored. Except when terminating a comment, the newline
character is a whitespace character.
Rule format
The rule syntax is simple BNF. An example rule is:
expr -> expr PLUS term
A rule is composed of a left-hand side symbol followed zero or more
right-hand side symbols, separated by an arrow. Identifiers in upper
case are tokens. Identifiers not in upper case are nonterminals. An
identifier must begin with a letter; the characters - and
_, and the digits 0 to 9 can be used after the first letter.
Any number of attributes can follow a symbol. The attributes following
a symbol are said to be attached to the symbol.
The following
are the attributes.
| Attribute | Description |
| > | Shift priority |
| + | Reduce priority |
| ^ | First priority |
| ! | Exclusive modifier on priority |
| < | Sticky follow |
| * | Accept |
| NUMBER | Lex state |
| (attribs) NUMBER | Repeat attribs NUMBER of times |
Symbol attributes are described further in the next section.
The file must contain at least one start rule. A rule with
the left-hand side symbol start indicates such a rule. For
example, the rule below is a start rule.
start -> expr SEMI
Parsing can begin at any start rule.
Rule-name format
A rule-name must have one of the following forms. Let R be the
rule it precedes.
-
[ClassName]
-
Defines a node named ClassName. This node will represent
R in the AST. basil generates the ClassName
definition in the output nonterminal lzz file. The class will contain
member functions that return the child nodes of R (the
functions will be named after the right-hand side symbols).
ClassName must be an acceptable C++ identifier.
-
[$ ClassName]
-
Like above, except ClassName is considered slippery. A
slippery visitor that does not visit ClassName will
automatically visit the child nonterminal nodes of R, from left
to right. A visitor is slippery if true is passed as the
slippery argument to the base visitor constructor. A visitor is by
default slippery. A bottom-up visitor (a visitor that visits the AST
as its created) is the only visitor you would likely not make
slippery.
Slippery nodes are specially useful for sequence rules:
id-list -> id
[$ SeqNode] id-list -> id-list COMMA id
[IdNode] id -> IDENT
A visitor that collects the identifiers here would only have to
visit IdNode:
// get identifer list
struct GetIdentList (IdentVector & ident_set)
: NodeVisitor (true)
{
// id -> IDENT
void visit (IdNode & node) const
{
ident_set.push_back (node.getIDENT ().getLexeme ());
}
}
(This is lzz code. You can also write visitors directly in C++.)
-
[(ClassName)]
-
Names a shared node. ClassName will also represent R in
the AST. ClassName must be defined elsewhere.
Use of shared nodes can reduce the amount of semantic code. Sequence
rules are often candidates for shared nodes:
expr-list -> expr
[(SeqNode)] expr-list -> expr-list COMMA expr
-
[ ]
-
An empty rule-name. This form prevents single-rules from being
bypassed (if the bypass option is enabled). If R is not a
single rule, then this means nothing.
If a rule is preceded by an empty rule-name, or no rule-name at all,
then:
-
If the rule is an epsilon rule (i.e.: foo ->), then an unset node
represents the rule in the AST. Token and nonterminal nodes contain a
member function to test if the node is set.
-
If the rule has one right-hand side symbol, then the child node
represents the rule in the AST.
-
Otherwise, the child nodes are discarded, and an unset node represents
the rule in the AST. In this manner, the AST can be pruned.
|