[Newcompiler] Properties for AST

Mon Apr 9 08:23:34 UTC 2007

On 09.04.2007, at 09:00, Stéphane Ducasse wrote:

> Hi
>
> I have a question: is the information retained useful for other tools?
>

In the case of the Tokens, I think we don't need them later. The whole
idea of providing a high-level representation is to be able to throw
away the low-level one. What we need from the Tokens is metadata:
comments and positions in the source. We can then flush the position
data if it is identical to what the pretty printer would re-create.
This way, we don't need any position/whitespace data on the AST of any
method in the system, other than newly created ones written by people
who don't use the pretty printer.
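A minimal sketch of this flushing rule, written in Python rather than
Smalltalk. `Node`, `compact` and the `pretty_positions` callback (which
stands in for the pretty printer) are hypothetical names, not the
NewCompiler API. Tokens are dropped, comments survive, and a node keeps
its source positions only when they differ from what the pretty printer
would produce.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical AST node carrying only what this sketch needs."""
    kind: str
    children: list = field(default_factory=list)
    tokens: Optional[list] = None                 # low-level scanner tokens
    comments: list = field(default_factory=list)  # metadata worth keeping
    positions: Optional[tuple] = None             # (start, stop) in the source


def compact(node: Node, pretty_positions: Callable[[Node], tuple]) -> Node:
    """Drop the tokens and keep source positions only where they differ
    from what the (hypothetical) pretty printer would regenerate."""
    node.tokens = None
    if node.positions == pretty_positions(node):
        node.positions = None  # can be recreated by pretty-printing
    for child in node.children:
        compact(child, pretty_positions)
    return node


# Usage: the pretty printer would place the method at (0, 20) and the
# return statement at (10, 20); the hand-formatted return sits elsewhere.
layout = {"method": (0, 20), "return": (10, 20)}
tree = Node("method", tokens=["foo", "^", "1"], positions=(0, 20),
            children=[Node("return", comments=["answer one"],
                           positions=(12, 22))])
compact(tree, lambda n: layout[n.kind])
print(tree.positions, tree.children[0].positions)  # None (12, 22)
```

Only the hand-formatted return statement keeps its positions; the
method node can be laid out again by the pretty printer.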

On the other hand, information from semantic analysis is indeed useful
later (e.g. which variables with the same name actually access the
same variable, taking name resolution (shadowing) into account).

One example: before throwing away the semantic analysis data, I change
the class of RBVariableNodes to RBInstVariableNode, RBTempVariableNode,
..., so that I can later know what kind of variable I am dealing with.
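The reclassification step can be sketched in Python, whose `__class__`
assignment loosely mirrors changing an object's class in Smalltalk. The
class names echo RBInstVariableNode and RBTempVariableNode, but the
`binding` attribute, the `NODE_CLASS_FOR` table and `specialize` are
assumptions for illustration only.

```python
class VariableNode:
    """Generic variable reference as produced by the parser (sketch)."""

    def __init__(self, name):
        self.name = name
        self.binding = None  # set by semantic analysis, e.g. "inst" or "temp"


class InstVariableNode(VariableNode):
    pass


class TempVariableNode(VariableNode):
    pass


# Hypothetical mapping from an analysis result to a specialized node class.
NODE_CLASS_FOR = {"inst": InstVariableNode, "temp": TempVariableNode}


def specialize(node):
    """Encode the analysis result in the node's class, then drop it."""
    node.__class__ = NODE_CLASS_FOR[node.binding]
    node.binding = None  # analysis data can now be thrown away
    return node


# Usage: pretend semantic analysis resolved `count` to a temporary.
n = VariableNode("count")
n.binding = "temp"
specialize(n)
print(type(n).__name__, n.binding)  # TempVariableNode None
```

The kind of variable survives as the node's class, so later tools can
dispatch on it without keeping the analysis data around.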

> I was wondering if it would not make sense to have multiple  
> representations of the tree.

In the end, this is what properties now provide... but properties are
not that nice, as they make a lot of things implicit that should be
explicit, e.g. there is no way to know which properties exist, and
which user of the tree adds which properties.

It's very easy to completely mess up the design of software with
properties. Morphic is a good example; BookMorph in particular abuses
properties to such an extent that nobody understands the code. One of
the problems with properties is that they are added by evaluating code
at runtime: as with dynamically scoped variables, there is no way to
know statically whether a variable is declared or not.

So I think we should experiment a bit. Having properties is the right
thing for now, but I don't plan to change the AST representation too
much; instead, I will do an additional pass over it that transforms it
into the representation I need (e.g. no Tokens).

> Then I was wondering if this is not a clear indication that we
> would need hashBasedSubclass: in addition to the one we have
>

Indeed. This would solve one problem: iVars taking space. What it does
not solve is that, this way, all possible properties of all clients
(e.g. compiler phases) would need to be declared in the AST classes.
Here I like Wide Classes, where a compiler phase could declare "I need
to extend the node object with this state".

In addition, properties remain useful for annotating data structures:
annotations should be possible even if the framework author does not
anticipate the need for them, and they have an extent that is possibly
unlimited in time, so wide classes are not a solution for attaching
metadata.

	Marcus

> Stef
>
>> Hello,
>>
>> Yesterday I merged some work done earlier into the current version  
>> of the AST package.
>>
>> The problem with the RB AST is its size: just using the AST and
>> NewCompiler as is, the AST is huge. An image that keeps the complete
>> AST of all methods is ca. 800MB in size.
>>
>> The reason for that is that far too much information is retained:
>> the AST holds on to the original text, all the scanner tokens, the
>> intermediate representation, information from semantic analysis...
>> and all this information is saved in instance variables of the Node
>> objects, which in turn are nil most of the time. (Once you have 1.5
>> million objects of something, an instance variable that is nil does
>> indeed cost memory ;-))
>>
>> Over the next weeks/months I will slowly work on making the  
>> representation far more compact.
>>
>> As a first step, there is now a property interface on
>> RBProgramNode. This allows us to reduce the number of instance
>> variables: the idea is to have the AST encode directly just what
>> really is the AST (including names of variables), keeping all the
>> transient data (e.g. semantic analysis data of the compiler) or
>> metadata (formatting, comments) as properties.
>>
>> The next step will be to get rid of the Scanner Tokens.
>>
>> 	Marcus
>
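A minimal sketch of such a property interface, in Python with
hypothetical names (`ProgramNode`, `property_at`, `property_at_put`,
`has_property`, `remove_property`); it is not the actual RBProgramNode
protocol. Each node reserves a single slot that stays empty until a
property is stored, so transient data and metadata cost nothing on the
many nodes that never use them. For scale, assuming 4-byte slots as in
a 32-bit image, one instance variable that is nil in 1.5 million nodes
still takes 1,500,000 x 4 bytes = 6 MB.

```python
class ProgramNode:
    """Sketch of an AST node whose optional data lives in properties."""

    __slots__ = ("parent", "_properties")

    def __init__(self, parent=None):
        self.parent = parent
        self._properties = None  # no dictionary until a property is set

    def property_at(self, key, default=None):
        if self._properties is None:
            return default
        return self._properties.get(key, default)

    def property_at_put(self, key, value):
        if self._properties is None:
            self._properties = {}
        self._properties[key] = value
        return value

    def has_property(self, key):
        return self._properties is not None and key in self._properties

    def remove_property(self, key):
        if self._properties is None:
            return
        self._properties.pop(key, None)
        if not self._properties:
            self._properties = None  # drop the empty dictionary again


# Usage: the compiler attaches transient analysis data, then discards it.
node = ProgramNode()
node.property_at_put("scope", "method")
print(node.property_at("scope"), node.has_property("comments"))  # method False
node.remove_property("scope")
print(node.has_property("scope"))  # False
```

The trade-off is the one raised in the reply: the keys are not declared
anywhere, so nothing tells a reader which properties a node may carry.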
