[squeak-dev] Xtreams up to date

mkobetic at gmail.com mkobetic at gmail.com
Sun Jan 23 01:22:08 UTC 2011


"Hannes Hirzel"<hannes.hirzel at gmail.com> wrote:
> If I explore the resulting object I got what is shown in the screen
> shot. I am not sure how to interpret the result.

Yes, it's not the best possible representation, but it does show the full parse tree. The tree can be rather complicated depending on how complicated is the grammar. That's why I suggested playing with simpler grammars. But the reason why I even mentioned the TreeBuilder is that it's the simplest possible Actor, it implements single method that packages the result of each successful production as it's own node. Maybe using an array instead of associations would create a slightly more "tree like" result.

> "Regarding the wiki example: the class PEGWikiGenerator is not included"
> 
>         wikiParser parse: 'Page' stream: input actor: PEGWikiGenerator new.
> 
> Where can I get this class?

I've attached a fileout of the class that you can file into Squeak, except it references VW XML classes, so it won't work. If you fix it up to use the Squeak XML stuff you should be able to get it to work. All you really need is XML.Element and XML.Tag equivalent. The second fileout shows how the class variables are initialized in VW. You might need to turn the initializers into equivalent class side initialize methods.

> As class methods of PEGParser there are 8 grammars included - CSS3,
> HttpUrl, JSON, JavaScript, PEG, Smalltalk, Wiki, XML.
> 
> I think a simple complete example which actually does something would
> be helpful for me. TreeBuilder  seems to do nothing and I do not see the tree.

OK, let's try something really simple. The wikipedia page shows a simple arithmetic expression grammar. Here it is transcribed for Xtreams:

	grammar := '
		Number <- [0-9]+
		Parenthesised <- "(" Expr ")"
		Value <- Number / Parenthesised
		Product	<- Value (("*" / "/") Value)*
		Sum <- Product (("+" / "-") Product)*
		Expr <- Sum'.

I expanded the Value from the wiki grammar into separate Number and Parethesised expression productions. It'll hopefully be clear why shortly. You already know how to get a parser from grammar. That's always the same and for anything more complex you'll probably want to cache the parser instead of recreating it every time, but this will do fine for now:

	parser := PEG.Parser parserPEG parse: 'Grammar' stream: grammar actor: PEG.ParserParser new.

With this you can already parse expressions (note that the grammar as it is written cannot handle whitespace)

	parser parse: 'Expr' stream: '3+4*(10-4)' reading actor: nil

you'll get this (roughly formatted to provide at least some resemblance of the expression tree):

	#(	#(OrderedCollection ($3 "16r0033") OrderedCollection ())
		OrderedCollection (#
			('+' #(OrderedCollection ($4 "16r0034")
			OrderedCollection (#
				('*' #(
					'(' #(	#(OrderedCollection ($1 "16r0031" $0 "16r0030") OrderedCollection ())
							OrderedCollection (#
							('-' #(OrderedCollection ($4 "16r0034") OrderedCollection ())))) ')')))))))

That's obviously not very easy to take apart to work with, that's why you want an Actor that will receive rule specific callbacks and do whatever. Let's say we want to evaluate the expression and return the result. We'll make a subclass of Actor called Evaluator. As I mentioned earlier, the Actor class provides pragma based mechanism for expressing rule specific callbacks so you simply write a method for each rule for which you want to do something and annotate it with a pragma. Let's first try to turn the digit collections into numbers:

Evaluator>>Number: digits
	<action: 'Number'>
	^digits inject: 0 into: [ :total :digit | total * 10 + ('0123456789' indexOf: digit) - 1 ]


With this you can parse numbers

	parser parse: 'Expr' stream: '3456' reading actor: Evaluator new

Now let's try to add. We'll want a method for the Sum production but there's a slightly tricky part there. Note that the rule is defined as 

	Sum <- Product (("+" / "-") Product)*

which means that you get a term (Product) and then zero or more pairs of + or - and another term. What the method receives will be collection of the first term and then pairs of the operator with its term. So parsing this:

	parser parse: 'Expr' stream: '3+4+5+6+7' reading actor: Evaluator new

will give you

	#(3 OrderedCollection (#('+' 4) #('+' 5) #('+' 6) #('+' 7)))

So if you add a method for Sum, that's what it will need to deal with:

Sum: terms
	<action: 'Sum'>
	^terms last inject: terms first into: [ :total :next |
		(next first = '+') ifTrue: [ total + next last ] ifFalse: [ total - next last ] ]

The Actor pragmas provide another convenience for taking more complex input apart. You can write the above as

Sum: first rest: pairs
	<action: 'Sum' arguments: #(1 2)>
	^pairs inject: first into: [ :total :pair |
		(pair first = '+') ifTrue: [ total + pair last ] ifFalse: [ total - pair last ] ]

The arguments: keyword simply says invoke this method with elements 1 and 2 of the rule input (terms) as arguments. Obviously the method must take the corresponding number of arguments as well. So with the above rules the addition expression should give you 25. The Product rule handling is analogous:

Product: first rest: pairs
	<action: 'Product' arguments: #(1 2)>
	^pairs inject: first into: [ :total :pair |
		(pair first = '*') ifTrue: [ total * pair last ] ifFalse: [ total / pair last ] ]

Finally the parenthesised expression is rather simple:

Parenthesised: expression
	<action: 'Parenthesised' arguments: #(2)>
	^expression

It receives a three element array: the left bracket, the expression inside and the right bracket. All we really need to do is return the expression in the middle. The above is using the arguments: pragma again to achieve that, the following would be equivalent:

Parenthesised: expression
	<action: 'Parenthesised'>
	^expression at: 2

I hope by now it's also clearer why it was useful to factor the Value definition out into separate Number and Parenthesised rule. Otherwise we'd have to handle both cases in a single Value: method, which would make the code messier.

> For example besides the classes PEGParserPEGTest and
> PEGParserSmalltalkTest (6 test cases) a class PEGParserHttpUrlTest or
> PEGParserJavaScriptTest would be helpful.

Yup, the tests are seriously wanting as well. Hopefully we'll get there soon. BTW, the parsing stuff doesn't work for me out of the box in either Squeak or Pharo in the images that I use (which should be quite clean). The problem I'm seeing is that the Dictionary>>addAll: in ParserParser>>Grammar: doesn't work as expected. It's trying to add collection of associations to a dictionary but ends up adding them as values indexed by integers. I can fix it but I wonder what do people have loaded that makes it work for them. 

Anyway, hopefully the above can get you going.

Cheers,

Martin



More information about the Squeak-dev mailing list