Grammar file format
The grammar file contains a sequence of rules. Each rule can be
preceded by a rule-name.
The pound sign (#) begins a comment; all characters remaining on the
line are ignored. Except when terminating a comment, the newline
character is a whitespace character.
The rule syntax is simple BNF. An example rule is:
A rule is composed of a left-hand side symbol followed zero or more
right-hand side symbols, separated by an arrow. Identifiers in upper
case are tokens. Identifiers not in upper case are nonterminals. An
identifier must begin with a letter; the characters - and
_, and the digits 0 to 9 can be used after the first letter.
expr -> expr PLUS term
Any number of attributes can follow a symbol. The attributes following
a symbol are said to be attached to the symbol.
are the attributes.
|> ||Shift priority|
|+ ||Reduce priority|
|^ ||First priority |
|! ||Exclusive modifier on priority|
|< ||Sticky follow|
|NUMBER ||Lex state|
|(attribs) NUMBER ||Repeat attribs NUMBER of times|
Symbol attributes are described further in the next section.
The file must contain at least one start rule. A rule with
the left-hand side symbol start indicates such a rule. For
example, the rule below is a start rule.
Parsing can begin at any start rule.
start -> expr SEMI
A rule-name must have one of the following forms. Let R be the
rule it precedes.
If a rule is preceded by an empty rule-name, or no rule-name at all,
Defines a node named ClassName. This node will represent
R in the AST. basil generates the ClassName
definition in the output nonterminal lzz file. The class will contain
member functions that return the child nodes of R (the
functions will be named after the right-hand side symbols).
ClassName must be an acceptable C++ identifier.
Like above, except ClassName is considered slippery. A
slippery visitor that does not visit ClassName will
automatically visit the child nonterminal nodes of R, from left
to right. A visitor is slippery if true is passed as the
slippery argument to the base visitor constructor. A visitor is by
default slippery. A bottom-up visitor (a visitor that visits the AST
as its created) is the only visitor you would likely not make
Slippery nodes are specially useful for sequence rules:
A visitor that collects the identifiers here would only have to
id-list -> id
[$ SeqNode] id-list -> id-list COMMA id
[IdNode] id -> IDENT
(This is lzz code. You can also write visitors directly in C++.)
// get identifer list
struct GetIdentList (IdentVector & ident_set)
: NodeVisitor (true)
// id -> IDENT
void visit (IdNode & node) const
ident_set.push_back (node.getIDENT ().getLexeme ());
Names a shared node. ClassName will also represent R in
the AST. ClassName must be defined elsewhere.
Use of shared nodes can reduce the amount of semantic code. Sequence
rules are often candidates for shared nodes:
expr-list -> expr
[(SeqNode)] expr-list -> expr-list COMMA expr
An empty rule-name. This form prevents single-rules from being
bypassed (if the bypass option is enabled). If R is not a
single rule, then this means nothing.
If the rule is an epsilon rule (i.e.: foo ->), then an unset node
represents the rule in the AST. Token and nonterminal nodes contain a
member function to test if the node is set.
If the rule has one right-hand side symbol, then the child node
represents the rule in the AST.
Otherwise, the child nodes are discarded, and an unset node represents
the rule in the AST. In this manner, the AST can be pruned.