HashBits, a lazy way
Daniel Vainsencher
danielv at netvision.net.il
Fri Jul 11 07:59:45 UTC 2003
It might be simpler to always compute real hashes for every object just
before saving the image. Consequences:
1. Temporary objects never get hashes, so performance in the common case
improves.
2. No VM/image compatibility issues: the image save format is preserved.
3. Image saving is slowed down somewhat.
This doesn't solve how to mark an object as "unhashed" in memory. Using a
hash value of 0 as the marker is cheating: how do we ensure it doesn't
bite us? It must not affect hash function implementations, which seems
non-trivial. Do we happen to have a free object header bit (or one that
is free at the beginning of the object's life)? I like Lex's suggestion.
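A minimal Python sketch of the scheme discussed above, assuming a hypothetical header layout (a 12-bit hash field plus one spare "hashed" flag bit; this is not Squeak's real layout). Hashes are generated only on first request, and a hypothetical `prepare_for_snapshot` pass gives every remaining object a real hash before the image is written. Because the flag, not the value 0, marks "unhashed", 0 stays a legal hash value.

```python
import random

# Hypothetical header layout, for illustration only (not Squeak's real one):
#   bits 0-11  identity hash (12 bits, 4096 possible values)
#   bit  12    "hash assigned" flag, so that 0 remains a legal hash value
HASH_BITS = 12
HASH_MASK = (1 << HASH_BITS) - 1
HASHED_FLAG = 1 << HASH_BITS


class Obj:
    """Stand-in for a heap object; only its header word matters here."""

    def __init__(self):
        self.header = 0  # allocated without a hash: no work at creation time


class LazyHashHeap:
    def __init__(self, seed=0):
        self.objects = []
        self.rng = random.Random(seed)
        self.hashes_generated = 0

    def allocate(self):
        obj = Obj()
        self.objects.append(obj)
        return obj

    def identity_hash(self, obj):
        # Generate the hash on first request only; the flag bit, not the
        # hash value, records whether one has been assigned.
        if not obj.header & HASHED_FLAG:
            h = self.rng.randrange(1 << HASH_BITS)  # 0 is a valid result
            obj.header |= HASHED_FLAG | h
            self.hashes_generated += 1
        return obj.header & HASH_MASK

    def prepare_for_snapshot(self):
        # Hypothetical pre-save pass: every object that is still unhashed
        # gets a real hash, so none is written to the image without one.
        for obj in self.objects:
            self.identity_hash(obj)
```

Objects that die before a save never pay for a hash; the cost moves into the pre-save pass, which is consequence 3 above (slower image saves).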
Daniel
John M McIntosh <johnmci at mac.com> wrote:
> >
> > A question aside: Are 4096 possible hash values enough? (I remember
> > vaguely that this has been discussed on the list but I do not remember
> > the conclusion reached if any.)
> >
>
> Well, that depends on whether you want to store fewer than 4096 objects
> in a set. The image has 45 implementers of hash, for example:
>
> Point>>hash
>     ^(x hash hashMultiply + y hash) hashMultiply
>
> which, as you can see, is complicated: it attempts to distribute lots of
> instances of Point across a set. Collections are another issue; some
> implementations use hashing algorithms designed for large collections,
> to ensure that you won't have a problem once you exceed hundreds of
> thousands of objects. So hashing is an important tuning issue once you
> start to talk about thousands of objects in a Set.
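To make the 4096-value limit in John's answer concrete, here is a small Python sketch. `min_shared` applies the pigeonhole principle to a 12-bit hash; `point_hash` mirrors `Point>>hash` from the message, under two assumptions that are modelled here rather than taken from the message: `hashMultiply` multiplies by 1664525 and keeps the low 28 bits, and a SmallInteger's `hash` is the integer itself.

```python
IDENTITY_HASH_BITS = 12  # 4096 possible identity-hash values


def min_shared(n_objects, hash_bits=IDENTITY_HASH_BITS):
    """Pigeonhole bound: with n objects and 2**hash_bits hash values,
    at least this many objects share the most crowded value."""
    values = 1 << hash_bits
    return -(-n_objects // values)  # ceiling division


def hash_multiply(n):
    # Assumed model of Squeak's hashMultiply: n * 1664525, low 28 bits kept.
    return (n * 1664525) & 0x0FFFFFFF


def point_hash(x, y):
    # Mirrors Point>>hash: ^(x hash hashMultiply + y hash) hashMultiply,
    # assuming integer coordinates whose hash is the integer itself.
    return hash_multiply(hash_multiply(x) + y)
```

`min_shared(100_000)` is 25: a Set of 100,000 objects keyed only by a 12-bit identity hash must put at least 25 of them on one hash value. By contrast, `point_hash` over a 100 by 100 grid of integer points yields 10,000 distinct values, because a value-based hash like this one is not confined to 12 bits.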
>
>
> > I do not understand why (1.0+2.0)*3.0 generates hashes on all the
> > objects. What does this expression mean in this context?
>
> Tim already answered this, but I'll point out that 1.0, 2.0 and 3.0
> become floating-point objects, so that is 3 hashes. 1.0+2.0 creates the
> 3.0 object, which is another hash, and 3.0*3.0 creates the answer
> object 9.0, yet another hash. So there are at least 5 hashes here just
> to do one floating-point add and a multiply. This makes floating-point
> math slow, and the hashes are certainly not needed.
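A toy Python model of the count above, assuming every Float is a freshly boxed heap object and that an eager VM computes an identity hash for each allocation. The `ToyHeap` class and its 12-bit counter used as a hash generator are hypothetical; the allocations follow the message's own tally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoxedFloat:
    value: float
    hash: Optional[int] = None  # None: no identity hash assigned yet


class ToyHeap:
    def __init__(self, eager):
        self.eager = eager  # True: hash every object at allocation
        self.hashes_generated = 0
        self._last = 0

    def _new_hash(self):
        # Hypothetical generator: a 12-bit counter is enough for counting.
        self.hashes_generated += 1
        self._last = (self._last + 1) & 0xFFF
        return self._last

    def new_float(self, value):
        obj = BoxedFloat(value)
        if self.eager:
            obj.hash = self._new_hash()
        return obj


def evaluate(heap):
    # (1.0 + 2.0) * 3.0, following the message's count of allocations.
    one = heap.new_float(1.0)
    two = heap.new_float(2.0)
    three = heap.new_float(3.0)
    total = heap.new_float(one.value + two.value)     # the 3.0 result
    return heap.new_float(total.value * three.value)  # the 9.0 answer
```

With `ToyHeap(eager=True)` the evaluation generates 5 hashes; with `ToyHeap(eager=False)` it generates none, and none of the five objects is ever hashed unless something asks for its identity hash, for example by putting it into an IdentitySet.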
>
> >
> > Isn't this a one-time action within each image?
> > How is it tied to a VM change?
>
> If I set the hash value to zero, then when I later use it and find that
> it's zero, that means I should generate the hash.
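A minimal Python sketch of the zero-sentinel rule just described, using a hypothetical 12-bit linear congruential generator (not the VM's actual one). It shows the constraint the rule imposes: the generator must never return 0, since 0 now means "not yet generated", so only 4095 values remain usable.

```python
HASH_MASK = 0xFFF  # 12-bit hash field; 0 is reserved as "not yet hashed"


class ZeroSentinelHeap:
    def __init__(self):
        self.last_hash = 0

    def generate_hash(self):
        # Hypothetical LCG step; results of 0 are skipped because 0 is the
        # "unhashed" marker. From 0 the next step gives 2605, so the loop
        # runs at most twice.
        while True:
            self.last_hash = (self.last_hash * 13849 + 27181) & HASH_MASK
            if self.last_hash != 0:
                return self.last_hash

    def identity_hash(self, obj):
        # A stored hash of 0 means "generate one now".
        if obj["hash"] == 0:
            obj["hash"] = self.generate_hash()
        return obj["hash"]
```

An object modelled as `{"hash": 0}` gets a non-zero hash on its first `identity_hash` call and keeps it afterwards.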
>
> Issues:
> hashlessOften.image run with hashlessOftenVM.exe builds lots of
> zero-hash objects and is then saved. Running hashlessOften.image with
> oldVM.exe will then use identityHash 0 as the hash, but that could cause
> a performance problem.
>
> or
>
> hashlessOften.image run with oldVM.exe and saved, then run with
> hashlessOftenVM.exe: now Sets containing identityHash-0 objects must be
> rehashed, because identityHash 0 implies generating a valid non-zero
> identityHash.
> d) oldImage.image run with hashlessOftenVM.exe, but hash 0 might be in
> a Set.
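The compatibility cases above can be reproduced with a toy hashed set in Python. The hypothetical helpers `old_vm_hash` and `make_lazy_vm_hash` stand for the two VMs' notions of identityHash, and `BucketSet` is a hypothetical minimal identity set, not Squeak's `Set`.

```python
def old_vm_hash(obj):
    # Old VM: whatever is stored is the hash, including 0.
    return obj["hash"]


def make_lazy_vm_hash(generate):
    # Lazy VM: a stored 0 means "unhashed", so a fresh value replaces it.
    def lazy_vm_hash(obj):
        if obj["hash"] == 0:
            obj["hash"] = generate()
        return obj["hash"]
    return lazy_vm_hash


class BucketSet:
    """Minimal identity set that files objects by hash at insertion time."""

    def __init__(self, hash_fn, n_buckets=64):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, obj):
        return self.buckets[self.hash_fn(obj) % len(self.buckets)]

    def add(self, obj):
        self._bucket(obj).append(obj)

    def includes(self, obj):
        return any(o is obj for o in self._bucket(obj))

    def rehash(self):
        # Re-file every element under the current hash function.
        items = [o for bucket in self.buckets for o in bucket]
        self.buckets = [[] for _ in self.buckets]
        for o in items:
            self.add(o)
```

Filing 100 zero-hash objects with `old_vm_hash` puts all of them in bucket 0, which is the performance problem. Filing an object whose genuine hash is 0 with `old_vm_hash`, then switching `hash_fn` to a lazy hash, makes `includes` fail until `rehash` is called, which is why such Sets must be rehashed.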
> >
> > > So I'm looking for a brilliant idea, of course, to automate that.
> >
> > To automate what? Rehashing all the sets?
>
> See issues above.
>
> In some respects it might require changing the image version number,
> unless someone can think of a clever answer...
>
> Actually, I can tag an image (see writeImageFileIO: imageBytes) to say
> that it was saved by a lazy-hash VM, by setting one of the 7 extra
> fields in the image header to 1 instead of zero. That would then let me
> decide that I need to rehash all the sets. Technically, I could ask
> whether this image has been damaged by being saved with a non-lazy-hash
> VM, which might insert an identityHash of zero.
>
> Not sure what to do about the case of using an older VM to fiddle with
> an image that will contain a bunch of zero identityHashes on purpose.
> Mmm, maybe if I look at vmVersion I could post a dialog warning you that
> this combination is not a good thing...
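The tagging idea in the two paragraphs above amounts to a small decision table at startup. A Python sketch, assuming a hypothetical tag value of 1 in one spare header field and assuming an older VM writes 0 there; the function names are illustrative only, not VM or image APIs.

```python
LAZY_HASH_TAG = 1  # hypothetical value in one of the spare image-header fields


def header_tag_on_save(vm_is_lazy):
    # A lazy-hash VM marks the image; an older VM is assumed to write 0.
    return LAZY_HASH_TAG if vm_is_lazy else 0


def startup_actions(saved_tag, vm_is_lazy):
    actions = []
    if vm_is_lazy and saved_tag != LAZY_HASH_TAG:
        # Last saved by an older VM: some objects may carry a genuine
        # identityHash of 0, which this VM would read as "unhashed".
        actions.append("rehash all sets")
    if not vm_is_lazy and saved_tag == LAZY_HASH_TAG:
        # Older VM running an image full of deliberate zero hashes. An older
        # VM knows nothing about the tag, so this check would have to live
        # in the image, e.g. by inspecting vmVersion.
        actions.append("warn user")
    return actions
```

A lazy VM loading an image it saved itself does nothing; loading one last saved by an older VM triggers the rehash; an older VM loading a tagged image is the case that would need the warning dialog.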
>
> Mmm, these seem workable... Now for some testing to show this would be
> a good thing.
> --
> ===========================================================================
> John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
> ===========================================================================
More information about the Squeak-dev mailing list