Hi,
I've been playing around with immutable data structures (at least, as immutable as one gets).
I came across an interesting algorithm for rebalancing a binary tree, invented by Day, Stout and Warren. It's an in-place pointer-swizzler that, with constant additional space, rebalances a tree in linear time. Of course, it operates on a mutable structure.
So I had an idea: I could "ringfence" the mutation: from my immutable structure, create a mutable copy, tear it to bits and then recreate a new, immutable, balanced, tree.
So I wrote, in addition to my ZTree node class, a MutableZTree subclass. It has the same shape as its parent (no new instvars), and a bunch of extra mutating methods.
There are some costs involved in this, of course: I create two entire copies of the tree. One, making the mutable copy, seems unavoidable: you cannot simply un-immutable the tree, because you have no idea what other objects might have a reference to parts of the tree, and they all assume the tree's immutable. The other, however... well, we don't care about references to the mutable structure because there should be no references to it. It's purely an internal, interim structure.
So I thought, well, wouldn't it be nice if one could _change the class_ of the MutableZTrees so they all magically turn into nice immutable ZTrees.
I'm sure there's some trickery one could do, but quite what that trickery would be, I'm not sure. Any ideas? (Or, just as importantly, good reasons why the above is completely insane?) It would be nice to turn the O(3n) space requirement into O(2n)...
frank
"Frank" == Frank Shearar frank.shearar@angband.za.org writes:
Frank> There are some costs involved in this, of course: I create two Frank> entire copies of the tree. One, making the mutable copy, seems Frank> unavoidable: you cannot simply un-immutable the tree, because you Frank> have no idea what other objects might have a reference to parts Frank> of the tree, and they all assume the tree's immutable.
You appear not to be aware of #become:, or have already ruled it out for reasons you haven't made clear. Which is it?
On Sat, 30 Apr 2011, Frank Shearar wrote:
Hi,
I've been playing around with immutable data structures (at least, as immutable as one gets).
I came across an interesting algorithm for rebalancing a binary tree, invented by Day, Stout and Warren. It's an in-place pointer-swizzler that, with constant additional space, rebalances a tree in linear time. Of course, it operates on a mutable structure.
So I had an idea: I could "ringfence" the mutation: from my immutable structure, create a mutable copy, tear it to bits and then recreate a new, immutable, balanced, tree.
So I wrote, in addition to my ZTree node class, a MutableZTree subclass. It has the same shape as its parent (no new instvars), and a bunch of extra mutating methods.
There are some costs involved in this, of course: I create two entire copies of the tree. One, making the mutable copy, seems unavoidable: you cannot simply un-immutable the tree, because you have no idea what other objects might have a reference to parts of the tree, and they all assume the tree's immutable. The other, however... well, we don't care about references to the mutable structure because there should be no references to it. It's purely an internal, interim structure.
So I thought, well, wouldn't it be nice if one could _change the class_ of the MutableZTrees so they all magically turn into nice immutable ZTrees.
I'm sure there's some trickery one could do, but quite what that trickery would be, I'm not sure. Any ideas? (Or, just as importantly, good reasons why the above is completely insane?) It would be nice to turn the O(3n) space requirement into O(2n)...
You're looking for Behavior >> #adoptInstance:, but why would you bother with changing the class instead of adding the balancing protocol to ZTree?
Levente
frank
On 2011/05/01 01:19, Levente Uzonyi wrote:
On Sat, 30 Apr 2011, Frank Shearar wrote:
Hi,
I've been playing around with immutable data structures (at least, as immutable as one gets).
I came across an interesting algorithm for rebalancing a binary tree, invented by Day, Stout and Warren. It's an in-place pointer-swizzler that, with constant additional space, rebalances a tree in linear time. Of course, it operates on a mutable structure.
So I had an idea: I could "ringfence" the mutation: from my immutable structure, create a mutable copy, tear it to bits and then recreate a new, immutable, balanced, tree.
So I wrote, in addition to my ZTree node class, a MutableZTree subclass. It has the same shape as its parent (no new instvars), and a bunch of extra mutating methods.
There are some costs involved in this, of course: I create two entire copies of the tree. One, making the mutable copy, seems unavoidable: you cannot simply un-immutable the tree, because you have no idea what other objects might have a reference to parts of the tree, and they all assume the tree's immutable. The other, however... well, we don't care about references to the mutable structure because there should be no references to it. It's purely an internal, interim structure.
So I thought, well, wouldn't it be nice if one could _change the class_ of the MutableZTrees so they all magically turn into nice immutable ZTrees.
I'm sure there's some trickery one could do, but quite what that trickery would be, I'm not sure. Any ideas? (Or, just as importantly, good reasons why the above is completely insane?) It would be nice to turn the O(3n) space requirement into O(2n)...
You're looking for Behavior >> #adoptInstance:, but why would you bother with changing the class instead of adding the balancing protocol to ZTree?
Ah, that might be exactly the hammer I'm looking for. Thanks!
I don't want to add the balancing protocol to ZTree because then I'm allowing the mutation of a ZTree, which means that clients of ZTree can't assume that it's immutable.
Well. To be picky, ZTree _does_ have the balancing protocol, in the sense that it responds to #balance. The implementation of #balance, however, makes a MutableZTree copy, balances that, and returns an immutable version of the newly balanced tree. ZTree just doesn't permit the necessary mutating like setting a subtree and the like. ("Necessary" because the DSW algorithm makes extensive use of mutation.)
frank
On Sun, 1 May 2011, Frank Shearar wrote:
Ah, that might be exactly the hammer I'm looking for. Thanks!
I don't want to add the balancing protocol to ZTree because then I'm allowing the mutation of a ZTree, which means that clients of ZTree can't assume that it's immutable.
Well. To be picky, ZTree _does_ have the balancing protocol, in the sense that it responds to #balance. The implementation of #balance, however, makes a MutableZTree copy, balances that, and returns an immutable version of the newly balanced tree. ZTree just doesn't permit the necessary mutating like setting a subtree and the like. ("Necessary" because the DSW algorithm makes extensive use of mutation.)
You shouldn't be that strict IMHO. You can mutate the state of the nodes as long as it has no side effects. So if you copy all nodes, then you can freely mutate the state of them before giving any of them to the user.
If you want to create efficient immutable data structures in smalltalk then you should use different techniques than with modern functional programming languages. For example smalltalk VMs won't detect that you created a copy of something that won't be used anymore, therefore you should avoid creating such copies.
If you don't want to expose state mutating methods to the user, then you can use separate tree and node objects, like I did with LLRBT, which is a persistent balanced binary tree implementation. An old version is available at http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz . It has LLRBTree which is the class for the tree object and there are two classes for nodes, LLRBTreeNode and LLRBExtremalTreeNode. Users only have direct access to tree objects in this case.
Levente
frank
On 2011/05/01 18:57, Levente Uzonyi wrote:
On Sun, 1 May 2011, Frank Shearar wrote:
Ah, that might be exactly the hammer I'm looking for. Thanks!
I don't want to add the balancing protocol to ZTree because then I'm allowing the mutation of a ZTree, which means that clients of ZTree can't assume that it's immutable.
Well. To be picky, ZTree _does_ have the balancing protocol, in the sense that it responds to #balance. The implementation of #balance, however, makes a MutableZTree copy, balances that, and returns an immutable version of the newly balanced tree. ZTree just doesn't permit the necessary mutating like setting a subtree and the like. ("Necessary" because the DSW algorithm makes extensive use of mutation.)
You shouldn't be that strict IMHO. You can mutate the state of the nodes as long as it has no side effects. So if you copy all nodes, then you can freely mutate the state of them before giving any of them to the user.
Well, in the context of a zipper zipping over a tree you do need to be that strict, surely? The copying that I'm doing here is "special case" - when simply zipping over a tree it's easy enough to not make copies of things. This particular experiment's a bit of a "big bang" mutation.
Certainly creating a whole new tree, mutating it and letting it "set"/turn immutable's a whole lot cheaper than folding the tree into a list and then unfolding it recursively. (*)
If you want to create efficient immutable data structures in smalltalk then you should use different techniques than with modern functional programming languages. For example smalltalk VMs won't detect that you created a copy of something that won't be used anymore, therefore you should avoid creating such copies.
Indeed. And not mutating lets me avoid copying! But I'll freely admit that I'm new to all this, and I'm more than happy to hear what things work or don't work.
If you don't want to expose state mutating methods to the user, then you can use separate tree and node objects, like I did with LLRBT, which is a persistent balanced binary tree implementation. An old version is available at http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz . It has LLRBTree which is the class for the tree object and there are two classes for nodes, LLRBTreeNode and LLRBExtremalTreeNode. Users only have direct access to tree objects in this case.
I'll certainly take a look at LLRBT; thanks!
frank
(*) Here's the data. The times are in milliseconds. "recursive" flattens the tree into a list and then rebuilds the tree recursively, DSW mutates a copy, and DSW mutates a copy with the adoptInstance: optimisation.
#numNodes, timeToAdd1000, recursive, DSW, DSW2 1000, 1012, 84, 53, 37 2000, 1853, 155, 108, 66 3000, 2242, 253, 142, 108 4000, 2985, 401, 217, 136 5000, 3612, 495, 417, 234 6000, 4200, 728, 323, 220 7000, 4900, 760, 337, 357 8000, 5596, 893, 390, 301 9000, 6530, 1115, 656, 352 10000, 7038, 1298, 593, 344 11000, 7867, 1538, 594, 583 12000, 8707, 1666, 632, 414 13000, 9347, 2169, 704, 607 14000, 10251, 2031, 706, 651 15000, 11134, 2459, 968, 554 16000, 11636, 2826, 794, 623 17000, 12410, 3077, 970, 717 18000, 12876, 3497, 1138, 690 19000, 13740, 3865, 1222, 865 20000, 14078, 4168, 1207, 934
frank
On Sun, 1 May 2011, Frank Shearar wrote:
On 2011/05/01 18:57, Levente Uzonyi wrote:
On Sun, 1 May 2011, Frank Shearar wrote:
Ah, that might be exactly the hammer I'm looking for. Thanks!
I don't want to add the balancing protocol to ZTree because then I'm allowing the mutation of a ZTree, which means that clients of ZTree can't assume that it's immutable.
Well. To be picky, ZTree _does_ have the balancing protocol, in the sense that it responds to #balance. The implementation of #balance, however, makes a MutableZTree copy, balances that, and returns an immutable version of the newly balanced tree. ZTree just doesn't permit the necessary mutating like setting a subtree and the like. ("Necessary" because the DSW algorithm makes extensive use of mutation.)
You shouldn't be that strict IMHO. You can mutate the state of the nodes as long as it has no side effects. So if you copy all nodes, then you can freely mutate the state of them before giving any of them to the user.
Well, in the context of a zipper zipping over a tree you do need to be that strict, surely? The copying that I'm doing here is "special case" - when simply zipping over a tree it's easy enough to not make copies of things. This particular experiment's a bit of a "big bang" mutation.
I'm not familiar with zippers, but the only case when you can save copying is when the subtree is already balanced and it fits the new balanced data structure without any modifications.
Certainly creating a whole new tree, mutating it and letting it "set"/turn immutable's a whole lot cheaper than folding the tree into a list and then unfolding it recursively. (*)
I guess with DSW you create two copies from each node, while with DSW2 you only create one and change the class of it. You may find other primitive based methods, like #copyFrom: useful if you're not using it already. Something like
ZTree >> #mutableCopy
^MutableZTree basicNew copyFrom: self
may be faster than directly changing the instance variables.
Btw, if timeToAdd1000 means what I think it does, then you have plenty of room for improvement. Adding 100000 integers to LLRBConcurrentDictionaryTree (which is a persistent, concurrent (lock-free), binary tree based dictionary) takes 250-350ms (depending on the order of them) using Cog.
Levente
If you want to create efficient immutable data structures in smalltalk then you should use different techniques than with modern functional programming languages. For example smalltalk VMs won't detect that you created a copy of something that won't be used anymore, therefore you should avoid creating such copies.
Indeed. And not mutating lets me avoid copying! But I'll freely admit that I'm new to all this, and I'm more than happy to hear what things work or don't work.
If you don't want to expose state mutating methods to the user, then you can use separate tree and node objects, like I did with LLRBT, which is a persistent balanced binary tree implementation. An old version is available at http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz . It has LLRBTree which is the class for the tree object and there are two classes for nodes, LLRBTreeNode and LLRBExtremalTreeNode. Users only have direct access to tree objects in this case.
I'll certainly take a look at LLRBT; thanks!
frank
(*) Here's the data. The times are in milliseconds. "recursive" flattens the tree into a list and then rebuilds the tree recursively, DSW mutates a copy, and DSW mutates a copy with the adoptInstance: optimisation.
#numNodes, timeToAdd1000, recursive, DSW, DSW2 1000, 1012, 84, 53, 37 2000, 1853, 155, 108, 66 3000, 2242, 253, 142, 108 4000, 2985, 401, 217, 136 5000, 3612, 495, 417, 234 6000, 4200, 728, 323, 220 7000, 4900, 760, 337, 357 8000, 5596, 893, 390, 301 9000, 6530, 1115, 656, 352 10000, 7038, 1298, 593, 344 11000, 7867, 1538, 594, 583 12000, 8707, 1666, 632, 414 13000, 9347, 2169, 704, 607 14000, 10251, 2031, 706, 651 15000, 11134, 2459, 968, 554 16000, 11636, 2826, 794, 623 17000, 12410, 3077, 970, 717 18000, 12876, 3497, 1138, 690 19000, 13740, 3865, 1222, 865 20000, 14078, 4168, 1207, 934
frank
On Sun, 1 May 2011, Chris Muller wrote:
So I thought, well, wouldn't it be nice if one could _change the class_ of the MutableZTrees so they all magically turn into nice immutable ZTrees.
Object>>#primitiveChangeClassTo:
It doesn't work in cases like:
#[65 66 67] primitiveChangeClassTo: ByteString. "==> primitiveFailed"
| b | b := #[65 66 67]. ByteString adoptInstance: b. b "==> 'ABC'"
And it's probably broken too:
Object new primitiveChangeClassTo: ProtoObject. "==> primitiveFailed"
Levente
Ah, I never saw #adoptInstance: before, thanks.
On Sun, May 1, 2011 at 7:04 PM, Levente Uzonyi leves@elte.hu wrote:
On Sun, 1 May 2011, Chris Muller wrote:
So I thought, well, wouldn't it be nice if one could _change the class_ of the MutableZTrees so they all magically turn into nice immutable ZTrees.
Object>>#primitiveChangeClassTo:
It doesn't work in cases like:
#[65 66 67] primitiveChangeClassTo: ByteString. "==> primitiveFailed"
| b | b := #[65 66 67]. ByteString adoptInstance: b. b "==> 'ABC'"
And it's probably broken too:
Object new primitiveChangeClassTo: ProtoObject. "==> primitiveFailed"
Levente
On 5/2/2011 2:04, Levente Uzonyi wrote:
On Sun, 1 May 2011, Chris Muller wrote:
So I thought, well, wouldn't it be nice if one could _change the class_ of the MutableZTrees so they all magically turn into nice immutable ZTrees.
Object>>#primitiveChangeClassTo:
It doesn't work in cases like:
Indeed. It's all in the method comment:
primitiveChangeClassTo: anObject "Primitive. Change the class of the receiver into the class of the argument given that the format of the receiver matches the format of the argument's class. Fail if receiver or argument are SmallIntegers, or the receiver is an instance of a compact class and the argument isn't, or when the argument's class is compact and the receiver isn't, or when the format of the receiver is different from the format of the argument's class, or when the arguments class is fixed and the receiver's size differs from the size that an instance of the argument's class should have. Note: The primitive will fail in most cases that you think might work. This is mostly because of a) the difference between compact and non-compact classes, and b) because of differences in the format. As an example, '(Array new: 3) primitiveChangeClassTo: Morph basicNew' would fail for three of the reasons mentioned above. Array is compact, Morph is not (failure #1). Array is variable and Morph is fixed (different format - failure #2). Morph is a fixed-field-only object and the array is too short (failure #3). The facility is really provided for certain, very specific applications (mostly related to classes changing shape) and not for casual use."
Cheers, - Andreas
#[65 66 67] primitiveChangeClassTo: ByteString. "==> primitiveFailed"
| b | b := #[65 66 67]. ByteString adoptInstance: b. b "==> 'ABC'"
And it's probably broken too:
Object new primitiveChangeClassTo: ProtoObject. "==> primitiveFailed"
Levente
squeak-dev@lists.squeakfoundation.org