On Sun, 1 May 2011, Frank Shearar wrote:
On 2011/05/01 18:57, Levente Uzonyi wrote:
On Sun, 1 May 2011, Frank Shearar wrote:
Ah, that might be exactly the hammer I'm looking for. Thanks!
I don't want to add the balancing protocol to ZTree because then I'm allowing the mutation of a ZTree, which means that clients of ZTree can't assume that it's immutable.
Well. To be picky, ZTree _does_ have the balancing protocol, in the sense that it responds to #balance. The implementation of #balance, however, makes a MutableZTree copy, balances that, and returns an immutable version of the newly balanced tree. ZTree just doesn't permit the necessary mutating like setting a subtree and the like. ("Necessary" because the DSW algorithm makes extensive use of mutation.)
You shouldn't be that strict IMHO. You can mutate the state of the nodes as long as it has no side effects. So if you copy all nodes, then you can freely mutate the state of them before giving any of them to the user.
Well, in the context of a zipper zipping over a tree you do need to be that strict, surely? The copying that I'm doing here is "special case" - when simply zipping over a tree it's easy enough to not make copies of things. This particular experiment's a bit of a "big bang" mutation.
I'm not familiar with zippers, but the only case when you can save copying is when the subtree is already balanced and it fits the new balanced data structure without any modifications.
Certainly creating a whole new tree, mutating it and letting it "set"/turn immutable's a whole lot cheaper than folding the tree into a list and then unfolding it recursively. (*)
I guess with DSW you create two copies from each node, while with DSW2 you only create one and change the class of it. You may find other primitive based methods, like #copyFrom: useful if you're not using it already. Something like
ZTree >> #mutableCopy
^MutableZTree basicNew copyFrom: self
may be faster than directly changing the instance variables.
Btw, if timeToAdd1000 means what I think it does, then you have plenty of room for improvement. Adding 100000 integers to LLRBConcurrentDictionaryTree (which is a persistent, concurrent (lock-free), binary tree based dictionary) takes 250-350ms (depending on the order of them) using Cog.
Levente
If you want to create efficient immutable data structures in smalltalk then you should use different techniques than with modern functional programming languages. For example smalltalk VMs won't detect that you created a copy of something that won't be used anymore, therefore you should avoid creating such copies.
Indeed. And not mutating lets me avoid copying! But I'll freely admit that I'm new to all this, and I'm more than happy to hear what things work or don't work.
If you don't want to expose state mutating methods to the user, then you can use separate tree and node objects, like I did with LLRBT, which is a persistent balanced binary tree implementation. An old version is available at http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz . It has LLRBTree which is the class for the tree object and there are two classes for nodes, LLRBTreeNode and LLRBExtremalTreeNode. Users only have direct access to tree objects in this case.
I'll certainly take a look at LLRBT; thanks!
frank
(*) Here's the data. The times are in milliseconds. "recursive" flattens the tree into a list and then rebuilds the tree recursively, DSW mutates a copy, and DSW mutates a copy with the adoptInstance: optimisation.
#numNodes, timeToAdd1000, recursive, DSW, DSW2 1000, 1012, 84, 53, 37 2000, 1853, 155, 108, 66 3000, 2242, 253, 142, 108 4000, 2985, 401, 217, 136 5000, 3612, 495, 417, 234 6000, 4200, 728, 323, 220 7000, 4900, 760, 337, 357 8000, 5596, 893, 390, 301 9000, 6530, 1115, 656, 352 10000, 7038, 1298, 593, 344 11000, 7867, 1538, 594, 583 12000, 8707, 1666, 632, 414 13000, 9347, 2169, 704, 607 14000, 10251, 2031, 706, 651 15000, 11134, 2459, 968, 554 16000, 11636, 2826, 794, 623 17000, 12410, 3077, 970, 717 18000, 12876, 3497, 1138, 690 19000, 13740, 3865, 1222, 865 20000, 14078, 4168, 1207, 934
frank