HashBits, a lazy way

Daniel Vainsencher danielv at netvision.net.il
Fri Jul 11 07:59:45 UTC 2003


It might be simpler to always compute real hashes for everything just
before the image is saved. Consequences -
1. Temporary objects never get them - common-case performance is
improved.
2. No VM/image incompatibilities - the image save format is preserved.
3. Image saving is slowed down somewhat.

This doesn't solve how to mark an object as "unhashed" in memory. Using a
hash value of 0 as the marker is cheating. How do we ensure that it
doesn't bite us? It must not affect hash function implementations, which
seems non-trivial. Do we happen to have an object header bit free (or
free at the beginning of the object's life)? I like Lex's suggestion.
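
To make the idea concrete, here is a minimal sketch, in Python rather
than Slang. Everything in it is hypothetical: ObjectMemory,
prepare_for_save and the counter-based hash generator are stand-ins made
up for illustration, not the VM's actual code. It assumes 0 is the
"unhashed" marker (the very cheat in question) and a 12-bit hash field,
matching the 4096 values mentioned in the quoted message.

```python
HASH_BITS = 12   # 4096 possible values, as discussed in the quoted message
UNHASHED = 0     # assumption: 0 in the header field means "no hash yet"


class ObjectMemory:
    """Toy stand-in for the VM's object memory (hypothetical names)."""

    def __init__(self):
        self.headers = []          # one hash field per object, indexed by oop
        self.hashes_generated = 0
        self._counter = 0

    def allocate(self):
        """Create an object without paying for a hash."""
        self.headers.append(UNHASHED)
        return len(self.headers) - 1

    def _next_hash(self):
        # Cycle through 1..4095 so a real hash never equals the marker.
        # (A counter is used only to keep the sketch deterministic.)
        self._counter = self._counter % ((1 << HASH_BITS) - 1) + 1
        self.hashes_generated += 1
        return self._counter

    def identity_hash(self, oop):
        """Assign the hash on first request and keep it stable afterwards."""
        if self.headers[oop] == UNHASHED:
            self.headers[oop] = self._next_hash()
        return self.headers[oop]

    def prepare_for_save(self):
        """Give every remaining object a real hash so no marker is written."""
        for oop in range(len(self.headers)):
            self.identity_hash(oop)


mem = ObjectMemory()
floats = [mem.allocate() for _ in range(5)]   # the five Floats of (1.0+2.0)*3.0
hashes_after_arithmetic = mem.hashes_generated  # still zero at this point
mem.prepare_for_save()                        # the toy has no GC, so all five survive
```

In a real VM the temporaries would normally be garbage before the save
and never be hashed at all. The sketch does nothing about an old image
that already holds genuine 0 hashes.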

Daniel

John M McIntosh <johnmci at mac.com> wrote:
> >
> > A question aside: Are 4096 possible hash values enough? (I remember
> > vaguely that this has been discussed on the list but I do not remember
> > the conclusion reached if any.)
> >
> 
> Well, that depends on whether you want to store fewer than 4096
> objects in a set.
> The image has 45 implementers of hash, for example
> Point>>hash
> 	^(x hash hashMultiply + y hash) hashMultiply
> 
> which, as you can see, is complicated, and attempts to distribute lots
> of instances of Point across a set. Collections are another issue: some
> implementations have elaborate Collection hashing algorithms to ensure
> that you won't have a problem once you exceed hundreds of thousands of
> objects. So hashing becomes an important tuning issue once you start
> talking about thousands of objects in a Set.
> 
> 
> > I do not understand why (1.0+2.0)*3.0 generates hashes on all the
> > objects. What does this expression mean in this context?
> 
> Tim already answered this, but I'll point out that 1.0, 2.0 and 3.0
> become floating point objects, so that is 3 hashes; 1.0+2.0 creates the
> 3.0 object, which is another hash; and 3.0*3.0 creates the answer
> object 9.0, yet another hash. So at least 5 hashes here just to do one
> floating point add and one multiply. This makes floating point math
> slow, and the hashes are certainly not needed.
> 
> >
> > Isn't this a one time action within each image?
> > How is it tied to a VM change?
> 
> If I set the hash value to zero at creation, and it is still zero when
> I later go to use it, that means I should generate the hash.
> 
> Issues.
>   hashlessOften.image with hashlessOftenVM.exe builds lots of zero-hash
> objects, then is saved.
>   hashlessOften.image with oldVM.exe will use identityhash 0 as the
> hash, which could cause a performance problem.
> 
> or
> hashlessOften.image with oldVM.exe, save, then use hashlessOftenVM.exe:
> now there are identityhash 0 objects in a Set, which must be rehashed,
> because identityhash 0 implies generating a valid non-zero identityhash.
> d) oldImage.image with hashlessOftenVM.exe: a hash of 0 might already
> be in a Set.
> >
> > > So I'm looking for a brilliant idea of course to automate that.
> >
> > To automate what? Rehashing all the sets?
> 
> See issues above.
> 
> In some respects it might require changing the image version number...
> Unless someone can think of a clever answer...
> 
> Actually I can tag an image (see writeImageFileIO: imageBytes) to say
> that it was saved using a lazyhash VM, by setting one of the 7 extra
> fields in the image header to 1, versus zero. That would then allow me
> to decide whether I need to rehash all the sets. Technically I could
> also ask whether the image has been damaged by being saved with a
> non-lazy-hash VM, which might insert an identityhash of zero.
> 
> Not sure what to do about the case of using an older VM to fiddle with
> an image which contains a bunch of zero identityhashes on purpose.
> Mmm, maybe if I look at vmVersion I could post a dialog warning you
> that this combination is not a good thing...
> 
> Mmm these seem workable... Now for some testing to show this would be a  
> good thing.
> --
> ===========================================================================
> John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
> Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
> ===========================================================================
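
The rehash problem in John's list of issues can be sketched as follows.
This is a toy in Python, not Squeak code: IdentitySet, needs_rehash and
LAZY_HASH_FLAG are hypothetical names, and the flag is assumed to be the
spare image-header field John mentions, with 1 meaning "last saved by a
lazy-hash VM".

```python
LAZY_HASH_FLAG = 1  # assumption: value of the spare header field after a lazy-hash save


def needs_rehash(header_flag):
    """A lazy-hash VM should rehash unless a lazy-hash VM did the last save."""
    return header_flag != LAZY_HASH_FLAG


class IdentitySet:
    """Toy bucketed set that looks elements up by a caller-supplied hash."""

    def __init__(self, hash_of, size=8):
        self.hash_of = hash_of
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, obj):
        return self.buckets[self.hash_of(obj) % len(self.buckets)]

    def add(self, obj):
        bucket = self._bucket(obj)
        if obj not in bucket:
            bucket.append(obj)

    def __contains__(self, obj):
        return obj in self._bucket(obj)

    def rehash(self):
        """Re-insert every element under its current hash."""
        members = [obj for bucket in self.buckets for obj in bucket]
        self.buckets = [[] for _ in self.buckets]
        for obj in members:
            self.add(obj)


# An old VM stored "a" under its genuine hash of 0 ...
hashes = {"a": 0, "b": 5}
s = IdentitySet(lambda obj: hashes[obj])
s.add("a")
s.add("b")

# ... then a lazy-hash VM reads 0 as "unhashed" and hands out a new hash.
hashes["a"] = 3
found_before = "a" in s   # lookup probes bucket 3, but "a" still sits in bucket 0
s.rehash()
found_after = "a" in s    # the element is back where lookups expect it
```

Until the rehash runs, the set silently loses the element, which is why
the header tag (or some other way to detect the last-saving VM) matters.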



More information about the Squeak-dev mailing list