[squeak-dev] Changing an object's class-balanced tre.

Levente Uzonyi leves at elte.hu
Sun May 1 23:13:24 UTC 2011


On Sun, 1 May 2011, Frank Shearar wrote:

> On 2011/05/01 18:57, Levente Uzonyi wrote:
>> On Sun, 1 May 2011, Frank Shearar wrote:
>> 
>>> Ah, that might be exactly the hammer I'm looking for. Thanks!
>>> 
>>> I don't want to add the balancing protocol to ZTree because then I'm
>>> allowing the mutation of a ZTree, which means that clients of ZTree
>>> can't assume that it's immutable.
>>> 
>>> Well. To be picky, ZTree _does_ have the balancing protocol, in the
>>> sense that it responds to #balance. The implementation of #balance,
>>> however, makes a MutableZTree copy, balances that, and returns an
>>> immutable version of the newly balanced tree. ZTree just doesn't
>>> permit the necessary mutating like setting a subtree and the like.
>>> ("Necessary" because the DSW algorithm makes extensive use of mutation.)
>> 
>> You shouldn't be that strict IMHO. You can mutate the state of the nodes
>> as long as it has no side effects. So if you copy all nodes, then you
>> can freely mutate the state of them before giving any of them to the user.
>
> Well, in the context of a zipper zipping over a tree you do need to be that 
> strict, surely? The copying that I'm doing here is "special case" - when 
> simply zipping over a tree it's easy enough to not make copies of things. 
> This particular experiment's a bit of a "big bang" mutation.

I'm not familiar with zippers, but the only case when you can save copying 
is when the subtree is already balanced and it fits the new balanced data 
structure without any modifications.

>
> Certainly creating a whole new tree, mutating it and letting it "set"/turn 
> immutable's a whole lot cheaper than folding the tree into a list and then 
> unfolding it recursively. (*)

I guess with DSW you create two copies from each node, while with DSW2 you 
only create one and change the class of it. You may find other primitive 
based methods, like #copyFrom: useful if you're not using it already. 
Something like

ZTree >> #mutableCopy

 	^MutableZTree basicNew copyFrom: self

may be faster than directly changing the instance variables.

Btw, if timeToAdd1000 means what I think it does, then you have plenty of 
room for improvement. Adding 100000 integers to 
LLRBConcurrentDictionaryTree (which is a persistent, concurrent 
(lock-free), binary tree based dictionary) takes 250-350ms (depending 
on the order of them) using Cog.


Levente

>
>> If you want to create efficient immutable data structures in smalltalk
>> then you should use different techniques than with modern functional
>> programming languages. For example smalltalk VMs won't detect that you
>> created a copy of something that won't be used anymore, therefore you
>> should avoid creating such copies.
>
> Indeed. And not mutating lets me avoid copying! But I'll freely admit that 
> I'm new to all this, and I'm more than happy to hear what things work or 
> don't work.
>
>> If you don't want to expose state mutating methods to the user, then you
>> can use separate tree and node objects, like I did with LLRBT, which is
>> a persistent balanced binary tree implementation. An old version is
>> available at http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz . It has
>> LLRBTree which is the class for the tree object and there are two
>> classes for nodes, LLRBTreeNode and LLRBExtremalTreeNode.
>> Users only have direct access to tree objects in this case.
>
> I'll certainly take a look at LLRBT; thanks!
>
> frank
>
> (*) Here's the data. The times are in milliseconds. "recursive" flattens the 
> tree into a list and then rebuilds the tree recursively, DSW mutates a copy, 
> and DSW mutates a copy with the adoptInstance: optimisation.
>
> #numNodes, timeToAdd1000, recursive, DSW, DSW2
> 1000, 1012, 84, 53, 37
> 2000, 1853, 155, 108, 66
> 3000, 2242, 253, 142, 108
> 4000, 2985, 401, 217, 136
> 5000, 3612, 495, 417, 234
> 6000, 4200, 728, 323, 220
> 7000, 4900, 760, 337, 357
> 8000, 5596, 893, 390, 301
> 9000, 6530, 1115, 656, 352
> 10000, 7038, 1298, 593, 344
> 11000, 7867, 1538, 594, 583
> 12000, 8707, 1666, 632, 414
> 13000, 9347, 2169, 704, 607
> 14000, 10251, 2031, 706, 651
> 15000, 11134, 2459, 968, 554
> 16000, 11636, 2826, 794, 623
> 17000, 12410, 3077, 970, 717
> 18000, 12876, 3497, 1138, 690
> 19000, 13740, 3865, 1222, 865
> 20000, 14078, 4168, 1207, 934
>
>
> frank
>
>



More information about the Squeak-dev mailing list