Syntax Analyser - A Tcl Parser Generator

written May 2006

3.0.1   Analyser        

[analyse] generates two tables: the first for looking ahead one token, the second for looking ahead two tokens. The scanner/parser uses these two tables as required to make its parsing decisions. If the input grammar is simple enough not to require the second table, a pruned form of the parser can be used that consults only the first table. This ability to minimise the size of the parser is useful compared with the "fixed" results produced by the standard LALR(1) compiler generators commonly available.
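To give a concrete picture, here is a hypothetical sketch in Tcl of what such tables might look like; the keys, actions and rule names are invented for illustration and are not the analyser's actual output format:

# Hypothetical shape of the two lookahead tables (illustrative only).
# Each entry maps a parse context to the reduction that applies there.
array set table1 {
    "item,+"    {reduce add_head}
    "item,*"    {reduce mult_head}
    "INTEGER,"  {reduce item}
}
array set table2 {
    "item,-,INTEGER"  {reduce sub_head}
    "item,/,("        {reduce div_head}
}
# A grammar that never needs an entry in table2 can be parsed with
# the pruned form of the parser, which carries table1 alone.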

3.0.1.1   The Parser        

This parser acts like an interpreter, because it is executed under the aegis of Tcl, which is itself an interpreter. It does not make much sense to use an interpretive environment to write a compiler (that is, a program that reads the whole program before executing any statements). In any case our application only requires the parser to deal with the expressions found within a more general Tcl command.

The parser requires certain tables to be provided to do its job. We work backwards from the parser to determine what the analyser must generate, so first we must describe the parser, whose tables are effectively the output of the analyser.

The parser parses a sentence and either returns the goal symbol or reports that the sentence does not conform to the grammar. Since every sentence has a unique canonical parse (there is a mathematical theorem about this), the canonical parse can be broken into a succession of parse steps by applying the canonical parse step function. Thus we have the following structure:

	set text to the input sentence
	while {text is not the goal symbol} {
		apply the canonical parse step function
		if {no valid reduction} { return error }
	}
	return goal
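A minimal, runnable sketch of this loop in Tcl follows. The list-based table format and the parseStep/parse names are assumptions for illustration, and the two-token right-context lookup is omitted for brevity:

# Toy canonical parse loop.  The table maps a stack suffix to the
# nonterminal that replaces it; the real parser also consults the
# right context (next terminal) where the suffix alone is ambiguous.
proc parseStep {stackVar token table} {
    upvar 1 $stackVar stack
    lappend stack $token
    # Reduce the left string as much as possible.
    set reduced 1
    while {$reduced} {
        set reduced 0
        dict for {suffix lhs} $table {
            set n [llength $suffix]
            if {[lrange $stack end-[expr {$n - 1}] end] eq $suffix} {
                set stack [lreplace $stack end-[expr {$n - 1}] end $lhs]
                set reduced 1
                break
            }
        }
    }
}

proc parse {tokens table goal} {
    set stack {}
    foreach token $tokens {
        parseStep stack $token $table
    }
    if {$stack ne [list $goal]} {
        error "no valid reduction: parse stuck at \"$stack\""
    }
    return $goal
}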
The parse step function steps through the text, left to right, stacking tokens as it goes. It processes the accrued left string (the parse stack), up to and including the current terminal, by reducing the left string as much as possible. Where necessary it uses the context of the next terminal in the text (the right context) to decide.

There are three decisions here: when to shift the next terminal onto the parse stack, whether a reduction applies, and which rule to reduce by.

These three decisions can be made in the context of the substrings rather than the whole parse tree or the whole of the remaining text.

To do this job the parser/scanner is provided with two tables which describe the valid contexts of any token and the reduction rule that applies within each context. This means that the parser is a fixed program, rather than one generated by the analyser. That is to say, the same parser program is used to parse any grammar; the only difference is in the tables which the parser uses to make its decisions.

This makes things quite convenient: you do not need different versions of the parser program for different grammars, just different input in the form of the tables.

We now take a leap forward and assume that the integer grammar has been processed by the analyser to produce the tables. A detailed look at the workings of the analyser is yet to be written up here.

We have a scanner that can look at an integer expression and verify that it conforms to the grammar, and thus that it has a particular value. But we have not told the scanner how to produce that value. We need to supply rules for this: the semantic rules for the grammar.

As the parser recognises that a rule in the grammar can be reduced, we need to tell it how to construct the value of the reduced nonterminal from its component values. We do this by attaching a Tcl procedure to each rule which defines this process. Here is the resulting grammar:

# Syntax for integer procedure
analyze {
rule	integer	  expr 			{ return $0 }
rule 	expr 	{ add_list  - sub_list }	{ return [expr {$0 - $2}] }	
rule 	expr 	  add_list 			{ return $0}	
rule 	expr 	{ item - sub_list }		{ return [expr {$0 - $2}] }
rule 	expr 	{ mult_list / div_list }		{ return [expr {$0 / $2}] }
rule 	expr 	{ mult_list }		{ return $0 }
rule 	expr 	{ item / div_list }		{ return [expr {$0 / $2 } ] }
rule 	expr 	{ item % item }		{ return [expr {$0 % $2 } ] }
rule 	expr 	  item 			{ return $0 }
rule 	add_list  	{ add_head item } 		{ return [expr { $0 + $1 } ] }
rule 	add_head { add_head item + }	{ return [expr {$0 + $1}] }
rule 	add_head { item + }			{ return $0 }
rule 	sub_list 	{ sub_head item }		{ return [expr { $0 + $1 } ] }
rule 	sub_list 	 item 			{ return $0 }
rule 	sub_head { sub_head item - }		{ return [expr {$0 + $1 } ] }
rule 	sub_head { item - }			{ return $0 }
rule 	mult_list 	{ mult_head item }		{ return [expr {$0 * $1 } ] }
rule 	mult_head  { mult_head item * }	{ return [expr {$0 * $1 }]}
rule 	mult_head  { item * }			{ return $0 }
rule 	div_list 	{ div_head item }		{ return [expr {$0 * $1 } ] }
rule 	div_list 	{ item }			{ return $0 }
rule 	div_head { div_head item / }		{ return [expr {$0 * $1}] }
rule 	div_head { item / }			{ return $0 }
rule 	item 	INTEGER		{ return $0 }
rule 	item 	{ ( expr ) }			{ return $1 }
}
The rule procedure now takes three arguments: the left-hand nonterminal, a list of words for the right-hand side, and a body of commands with parameters which refer to the positions of words in the right-hand side. As you can see, each semantic rule is a fairly literal implementation of the intended meaning of the rule. But we do not want to use the [expr] command to calculate the values; we want to use our bignum integer library procedures. So instead the semantic rules look like this:
# Syntax for integer procedure
analyze {
rule	integer	  expr 			{ return $0 }
rule 	expr 	{ add_list  - sub_list }	{ return [::integer::binop $0 - $2 ] }	
rule 	expr 	  add_list 			{ return $0}	
rule 	expr 	{ item - sub_list }		{ return [::integer::binop $0 - $2 ] }
rule 	expr 	{ mult_list / div_list }		{ return [::integer::binop $0 / $2 ] }
rule 	expr 	{ mult_list }		{ return $0 }
rule 	expr 	{ item / div_list }		{ return [::integer::binop $0 / $2 ] }
rule 	expr 	{ item % item }		{ return [::integer::binop $0 % $2 ] }
rule 	expr 	  item 			{ return $0 }
rule 	add_list  	{ add_head item } 		{ return [::integer::binop $0 + $1 ] }
rule 	add_head { add_head item + }		{ return [::integer::binop $0 + $1 ] }
rule 	add_head { item + }			{ return $0 }
rule 	sub_list 	{ sub_head item }		{ return [::integer::binop  $0 + $1 ] }
rule 	sub_list 	 item 			{ return $0 }
rule 	sub_head { sub_head item - }		{ return [::integer::binop $0 + $1 ] }
rule 	sub_head { item - }			{ return $0 }
rule 	mult_list 	{ mult_head item }		{ return [::integer::binop $0 * $1 ] }
rule 	mult_head  { mult_head item * }	{ return [::integer::binop $0 * $1 ] }
rule 	mult_head  { item * }		{ return $0 }
rule 	div_list 	{ div_head item }		{ return [::integer::binop $0 * $1 ] }
rule 	div_list 	{ item }			{ return $0 }
rule 	div_head { div_head item / }		{ return [::integer::binop $0 * $1 ] }
rule 	div_head { item / }			{ return $0 }
rule 	item 	INTEGER		{ return $0 }
rule 	item 	{ ( expr ) }			{ return $1 }
}
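How might [rule] and the positional $0, $1 ... parameters work? The following is one plausible sketch, an assumption rather than the library's actual code: the rule command records each rule, and at reduction time the body is evaluated as an anonymous function whose arguments are named 0, 1, 2 and bound to the component values:

# Hypothetical machinery behind the rule command (illustrative only).
proc rule {lhs rhs body} {
    global rules
    lappend rules [list $lhs $rhs $body]
}

# Evaluate a semantic body when its rule is reduced.  The lambda's
# arguments are literally named 0, 1, 2 ... so that $0, $1, $2 in
# the body refer to the values of the right-hand-side words.
proc applySemantics {rhs body values} {
    set params {}
    for {set i 0} {$i < [llength $rhs]} {incr i} {
        lappend params $i
    }
    return [apply [list $params $body] {*}$values]
}

# For example, reducing { item / div_list } with the values 6 / 2:
#   applySemantics {item / div_list} {return [expr {$0 / $2}]} {6 / 2}
# evaluates expr {6 / 2} and returns 3.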
You can now see how this same grammar can be used for a variety of expression types, simply by changing the namespace ::integer to the namespace of your new object, be it bignum quotient, polynomial or whatever. The one grammar can be used; all that changes is the library of procedures that represent the semantics of the object.
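As a sketch of the idea (the body here is a stand-in; the real ::integer::binop would delegate to the bignum library procedures), every such namespace only needs to supply a binop with the same signature:

# Stand-in semantics for the grammar.  A polynomial package would
# supply ::poly::binop with the same signature, and the grammar
# itself would be reused unchanged.
namespace eval ::integer {
    proc binop {a op b} {
        switch -- $op {
            + { return [expr {$a + $b}] }
            - { return [expr {$a - $b}] }
            * { return [expr {$a * $b}] }
            / { return [expr {$a / $b}] }
            % { return [expr {$a % $b}] }
            default { error "unknown operator \"$op\"" }
        }
    }
}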

This concludes the discussion of this topic, which at this point is rather incomplete.
