Symbol attributes
Symbol attributes can be attached to a left- or right-hand side
symbol. The priority attributes (> + ^) and the
exclusive attribute (!) can be repeated. The first
exclusive attribute must follow a priority attribute.
Lexical state
The lexical state attribute sets the lexical state of the symbol if
follows. The lexical state is a number, zero or greater.
A lexical state is calculated at each parsing state. When the parser
requires the next token, the lexer is passed the lexical state of the
current parsing state. The lexical state can be used by the lexer to
tokenize context sensitive tokens.
If a token T has as lexical state L, then the lexer is passed L when T
could be the next token. If a right-hand side nonterminal N has a
lexical state L, then the lexer is passed L when any token in FIRST(N)
could be the next token. If the left-hand side nonterminal N of a
rule R has a lexical state L, then the lexer is passed L when any
token in FIRST(R) could be the next token, where FIRST(R) is the set
of tokens that can start R. The lexer is passed the lexical state 0
if none of these conditions are true. It is an error if at any time
there could be two or more possible lexical states.
Consider the grammar below.
start -> a B 1 a B 2
a -> A
a ->
The lexer is passed the lexical state 1 when the first B
could be the next token, and the lexical state 2 when the second
B could be the next token.
The lexical state of a parsing state, if non-zero, is shown in the log
file following the state number in parenthesis.
Priorities
basil generates a list of actions for each lookahead token in a
parsing state. When the parser reaches a point where there are
multiple actions for the next token it tries them in order. If a try
leads to a syntax error, possibly caused by a semantic predicate, then
the parser backtracks and tries the next one. If a syntax error
occurs and no pending actions remain, then the parser reports a syntax
error and possibly recovers.
The priority attributes are used to order the actions in a list. The
exclusive attribute is used to exclude one or more actions from a
list.
An action has two properties: a priority and an exclusive priority.
These properties are computed from the attributes attached to the
symbols (you will see how they are computed below). The actions in a
list are ordered with respect to their priority; the action with the
highest priority is first. Once the actions are ordered, an action
and all remaining actions are discarded if the action has a lower
exclusive priority than the one preceding it.
Here is an example. Suppose for some lookahead token T
three actions are possible: ACTION_A,
ACTION_B and ACTION_C. Let the priority and
exclusive priority of these actions be the following:
| Action | Priority | Exclusive Priority |
ACTION_A | 1 | 1 |
ACTION_B | 2 | 1 |
ACTION_C | 0 | 0 |
The action list on token T would be:
T - ACTION_B, ACTION_A
Before we can see how the priority of an action is computed, we need
to define a few terms:
- propagated shift priority
-
This is a property of a rule. It is computed for nonkernel rules
only. This property stays with the rule as the dot moves to the right
(as the rule becomes a kernel rule). A start rule has no propagated
shift priority.
The propagated shift priority of a rule is the maximum effective shift
priority of the left-hand side symbol, considering all contexts. For
example, considering only the rules below, the propagated shift
priority of the third rule is 2.
a -> d.e > f
b > -> d.e >
>> e ->.E
The effective shift priority of e in the first and second
rule is 1 and 2 respectively. The propagated shift priority of the
third rule is 2, the maximum value. The propagated shift priority is
shown preceding the rule.
If the command line option -ss is specified, then the
propagated shift priorities are visible in the log file.
- effective shift priority
-
This is a property of a right-hand side symbol. It is computed for
the symbol following the dot in kernel and nonkernel rules. The
effective shift priority of a symbol is computed using this procedure:
- Let
R be a rule.
- Let
L be the left-hand side symbol of R.
- Let
N be the symbol after the dot in R.
The effective shift priority of N is the sum of:
- The propagated shift priority of
R.
- The shift priority of
L.
- The shift priority of
N.
For example, the effective shift priority of A in the
rule below is 3.
> a > ->.A >
- effective first priority
-
This is a property of a token. It is computed for all tokens that can
follow the symbol after the dot in kernel and nonkernel rules where
the symbol after the dot is a nonterminal. For example,
E is such a token in the rule below if
FIRST(d) contains epsilon.
a -> b.c d E
The effective first priority of a token is computed using this
procedure:
- Let
R be a rule.
- Let
L be the left-hand side symbol of R.
- Let
N be the symbol after the dot in R, and let N be a nonterminal.
- Let
T be a token that can follow N.
The effective first priority of T is the sum of:
- The propagated shift priority of
R.
- The shift priority of
L.
- The reduce priority of
N.
- The first priority of
T.
For example, the effective first priority of E in
the rule below is 4 if FIRST(d) contains epsilon.
> a > -> b.c + d E ^
- propagated first priority
-
This is a property of a follow item. A follow item is an item in the
follow set of a rule. A follow item is composed of two symbols: a
token and a nonterminal. It is printed in this form:
T.n
If T.n is an item in the follow set of a rule, then
it indicates there is a reduction on T when the dot is
after the last symbol in the rule. Furthermore, the nonterminal n
is followed (via a GOTO) after the reduction. The nonterminal n
is called the shortcut (n is not necessarily the
left-hand side symbol of the rule).
The propagated first priority of a follow item is the maximum
effective first priority of the token, considering all contexts. For
example, considering only the rules below, the propagated first
priority of E.d in the follow set of the third rule
is 3.
> a -> b.d + E ^
> c -> b.d E ^
d -> D [E.d^^^]
The propagated first priority is shown following the follow item.
If the command line option -sf is specified, then the
follow sets are visible in the log file.
Using these terms we can now see how the priority of an action is
computed.
A SHIFT action on lookahead token T is present in a state
when T follows the dot in some rule. The
priority of the action is the maximum effective shift priority of
T, considering all rules where T follows the
dot.
A SHIFT action on lookahead token T is shown in the log
file in the form
T - SHIFT S
where S is the state number.
A REDUCE action on lookahead token T is present in a
state when T is the token in a follow item in the
follow set of a rule, and the dot is after the last symbol on
the right-hand side of the rule. The priority of the action
is the reduce priority of the left-hand side symbol
of the rule plus the propagated first priority of the follow item.
A REDUCE action on lookahead token T is shown in the
log file in the form
T - REDUCE R n
where R is the number of the rule to reduce and
n is the shortcut.
The exclusive priority of an action is computed in the same manner
with the following understanding:
- An exclusive attribute following a shift priority attribute is a shift exclusive priority.
- An exclusive attribute following a reduce priority attribute is a reduce exclusive priority.
- An exclusive attribute following a first priority attribute is a first exclusive priority.
Sticky follow
The sticky follow attribute sets the sticky property of the symbol it
follows. If a symbol is sticky, then any REDUCE action with the
symbol as the shortcut will be not be a candidate when computing the
default action list in a state.
To put it another way, the parser follows a sticky symbol (via a GOTO)
only when a valid token is next.
Consider this example:
start -> a b
a -> R
a -> S
b -> T
b -> U
As we can see in parsing states 1 and 2, the tokens R and S
are reduced to a on any lookahead token:
1
-----
a -> R. (1) [T.a U.a]
* - REDUCE 1 a
2
-----
a -> S. (2) [T.a U.a]
* - REDUCE 2 a
However, by making a sticky,
start -> a < b
a -> R
a -> S
b -> T
b -> U
R and S are reduced to a only
when a valid token is next:
1
-----
a -> R. (1) [T.a< U.a<]
T - REDUCE 1 a
U - REDUCE 1 a
2
-----
a -> S. (2) [T.a< U.a<]
T - REDUCE 2 a
U - REDUCE 2 a
The sticky attribute on a follow item indicates that the shortcut is
sticky. The follow item EOT.start in a start rule is
always sticky:
start ->.a b [EOT.start<]
Hence a start rule is reduced only when EOT (the end token) is next.
If the sticky attribute is attached to the left-hand side symbol in a
rule, then the rule is reduced only when a valid token is next. For
example, in the grammar below, R, but not S,
is reduced to a only when a valid token is next:
start -> a b
a < -> R
a -> S
b -> T
b -> U
States 1 and 2 are:
1
-----
a < -> R. (1) [T.a U.a]
T - REDUCE 1 a
U - REDUCE 1 a
2
-----
a -> S. (2) [T.a U.a]
* - REDUCE 2 a
Accept
The accept attribute sets the accept property of the symbol if
follows. If a rule is reduced and an accepting symbol is followed
(via a GOTO), then all pending parsers formed during the parsing of
the rule are discarded. A pending parser is formed before an action
is executed if one or more alternate actions are possible. If the
parser encounters a syntax error, possibly caused by a semantic
predicate, the parser will backtrack to the last-formed pending
parser.
An ACCEPT action indicates to the parser that an accepting symbol is
being followed. In all other ways it is similar to a REDUCE action.
An ACCEPT action on lookahead token T is shown in the log
file in the form
T - ACCEPT R n
where R is the number of the rule to reduce and
n is the shortcut.
If an accept action discards all pending parsers, then the event is
called a full accept. This is important during syntax error
recovery. Syntax recovery begins from the point of the last full
accept. Since the accept action serves two purposes, a grammar will
have accepting symbols even if it does not require backtracking.
The accept attribute if often used with the sticky follow attribute.
The rule below, for example, will be reduced only if a valid
token is next. The parser will also discard all pending parsers
formed during the parsing of the rule.
a <* -> b c d
|