[Newcompiler] Properties for AST

Stéphane Ducasse stephane.ducasse at univ-savoie.fr
Mon Apr 9 13:42:58 UTC 2007


On 9 Apr 07, at 10:23, Marcus Denker wrote:

>
> On 09.04.2007, at 09:00, Stéphane Ducasse wrote:
>
>> Hi
>>
>> I have a question: is the retained information useful for other tools?
>>
>
> In case of the Tokens, I think we don't need them later. The whole  
> idea of providing a high-level
> representation is to be able to throw away the low level one. What  
> we need from the Tokens
> is meta data: comments and positions in source. The position data  
> we then can flush if it is identical
> to the one the pretty printer would re-create. This way, we don't  
> need any position/whitespace data
> on the AST of any methods of the system other than newly created  
> ones by people not using the
> pretty printer.
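
A minimal sketch of the flushing idea above, in Python rather than Smalltalk and with made-up names (Node, pretty_print, flush_redundant_positions, position_of are all hypothetical, not NewCompiler code). It assumes a toy pretty printer that can report the position of every node; stored positions are dropped only when they match what the printer would produce, and are recomputed on demand afterwards.

```python
class Node:
    """Hypothetical AST node: a name, children, and an optional source position."""

    def __init__(self, name, children=(), position=None):
        self.name = name
        self.children = list(children)
        self.position = position  # (start, stop) in the source, or None once flushed


def pretty_print(node, offset=0, positions=None):
    """Return the canonical text of node and a table id(node) -> (start, stop)."""
    positions = {} if positions is None else positions
    text = node.name
    if node.children:
        text += "("
        for index, child in enumerate(node.children):
            if index > 0:
                text += ", "
            child_text, _ = pretty_print(child, offset + len(text), positions)
            text += child_text
        text += ")"
    positions[id(node)] = (offset, offset + len(text))
    return text, positions


def all_nodes(node):
    """Yield node and every descendant."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def flush_redundant_positions(root):
    """Drop stored positions if the pretty printer would re-create exactly them."""
    _, canonical = pretty_print(root)
    nodes = list(all_nodes(root))
    if all(n.position == canonical[id(n)] for n in nodes):
        for n in nodes:
            n.position = None
        return True
    return False  # hand-formatted source: keep the stored positions


def position_of(root, node):
    """Stored position if present, otherwise the one the pretty printer yields."""
    if node.position is not None:
        return node.position
    _, canonical = pretty_print(root)
    return canonical[id(node)]
```

With something like position_of, a tool that needs positions can still ask for them after the flush, at the price of one pretty-printing pass over the method.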

OK, but doesn't the debugger, for example, use such information to  
highlight the messages?
>
> On the other hand, information from semantic analysis is indeed  
> useful later (e.g. which variables with
> the same name are indeed accessing the same variable, taking name  
> resolution (shadowing) into account).


>
> One example for this is that before throwing out the semantic  
> analysis data, I change the class of RBVariableNodes
> to RBInstVariableNode, RBTempVariableNode, .... so I can later know  
> what kind of variable I deal with.

Yes, and the visitor can be much better using these classes.
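
To illustrate the class change and what the visitor gains from it, here is a rough sketch in Python, not Smalltalk. All names are hypothetical stand-ins for RBVariableNode and its subclasses, and resolve is only a placeholder for the real semantic analysis; it assumes temporaries shadow instance variables.

```python
class VariableNode:
    """Generic variable reference, before name resolution."""

    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        return visitor.visit_variable(self)


class InstVariableNode(VariableNode):
    def accept(self, visitor):
        return visitor.visit_inst_variable(self)


class TempVariableNode(VariableNode):
    def accept(self, visitor):
        return visitor.visit_temp_variable(self)


def resolve(node, temps, inst_vars):
    """Placeholder for semantic analysis: temps shadow instance variables.

    The result is recorded by changing the class of the node in place,
    so the scope data itself can be thrown away afterwards.
    """
    if node.name in temps:
        node.__class__ = TempVariableNode
    elif node.name in inst_vars:
        node.__class__ = InstVariableNode
    return node


class KindVisitor:
    """Visitor that dispatches on the node class instead of testing flags."""

    def visit_variable(self, node):
        return "unresolved " + node.name

    def visit_inst_variable(self, node):
        return "inst var " + node.name

    def visit_temp_variable(self, node):
        return "temp " + node.name
```

The visitor needs no conditionals on the kind of variable: double dispatch on the new class does the work.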

>> I was wondering if it would not make sense to have multiple  
>> representations of the tree.
>
> In the end, this is what properties now provide... but properties  
> are not that nice, as they
> make a lot of things implicit that should be explicit, e.g. there  
> is no way to know which properties
> exist, and which user of the tree adds which properties.

Exactly.
>
> It's very easy to completely mess up the design of software with  
> properties. Morphic is a good
> example, especially the BookMorph abuses properties to an extent  
> that nobody understands the
> code. One of the problems of properties is that they are added by  
> evaluating code at runtime: as with
> dynamically scoped variables, there is no way to statically know if  
> a variable is declared or not.

Exactly, this is why I asked :)

> So I think we should experiment a bit... having properties is the  
> right thing for now, but I think
> I will not change the AST representation too much, but instead do  
> an additional pass over it
> that transforms it into the representation I need (e.g. no Tokens).
>
>> Then I was wondering if this is not a clear indication that we  
>> would need
>> hashBasedSubclass: in addition to the one we have
>>
>
> Indeed. This would solve one problem: iVars taking space. What it  
> does not solve
> is that this way, all possible properties of all clients (e.g.  
> compiler phases) would need
> to be declared in the AST classes. Here I like Wide Classes...  
> where a Compiler Phase could
> declare "I need to extend the node object with this state".
>
> And in addition, properties remain useful for annotating data  
> structures: Annotations should be
> possible even if the framework author does not anticipate the need  
> for the annotation, and they
> have an extent that is possibly unlimited in time, so wide classes  
> are not a solution for attaching
> meta data.

True.
I hate the term "wide class" when it is the objects that are changing  
class :)

> 	Marcus
> 	
>
>> Stef
>>
>>> Hello,
>>>
>>> Yesterday I merged some work done earlier into the current  
>>> version of the AST package.
>>>
>>> The problem with the RB AST is size: Just using the AST and  
>>> NewCompiler as is, the AST
>>> is quite huge. An image that keeps the complete AST of all  
>>> methods is ca. 800MB in size.
>>>
>>> The reason for that is that far too much information is retained:  
>>> the AST holds on to the original text,
>>> all the scanner tokens, the Intermediate representation,  
>>> information from semantic analysis... and
>>> all this information is saved in instance variables of the Node  
>>> objects, which in turn are nil most
>>> of the time. (once you have 1.5 Million objects of something, an  
>>> instance variable that is nil
>>> does indeed cost memory ;-))
>>>
>>> Over the next weeks/months I will slowly work on making the  
>>> representation far more compact.
>>>
>>> As a first step, there is now a property interface on  
>>> RBProgramNode. This allows us to reduce the number
>>> of instance variables: The idea is to have the AST encode just  
>>> what really is the AST directly (including
>>> names of variables), keeping all the transient data (e.g.  
>>> semantic analysis data of the compiler) or
>>> meta-data (formatting, comments) as properties.
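
As an illustration of such a property interface, here is a minimal sketch in Python rather than Smalltalk. The class and method names are hypothetical and are not claimed to match the selectors on RBProgramNode. The assumption is that every node pays for a single extra slot, and that the property table is only allocated once some client actually stores something.

```python
class ProgramNode:
    """Hypothetical node: fixed slots for the real AST state, one for properties."""

    __slots__ = ("parent", "_properties")

    def __init__(self, parent=None):
        self.parent = parent
        self._properties = None  # table is allocated on first write only

    def property_at(self, key, default=None):
        """Return the property stored under key, or default if there is none."""
        if self._properties is None:
            return default
        return self._properties.get(key, default)

    def property_at_put(self, key, value):
        """Store a property, creating the table lazily."""
        if self._properties is None:
            self._properties = {}
        self._properties[key] = value
        return value

    def has_property(self, key):
        return self._properties is not None and key in self._properties

    def remove_property(self, key):
        """Remove a property and release the table once it is empty."""
        if self._properties is not None:
            self._properties.pop(key, None)
            if not self._properties:
                self._properties = None
```

A compiler phase could store its transient data with property_at_put and remove it with remove_property when it is done, so nodes that carry no such data cost only the one empty slot.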
>>>
>>> The next step will be to get rid of the Scanner Tokens.
>>>
>>> 	Marcus
>>
>


