<div dir="ltr">Hi Andres,<div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 18, 2017 at 11:06 AM, Andres Valloud <span dir="ltr"><<a href="mailto:avalloud@smalltalk.comcastbiz.net" target="_blank">avalloud@smalltalk.comcastbiz.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>
The hashing technique is just a multiplicative hash function. The multiplication technique is equivalent to splitting AX into AH and AL, and using the distributive rule to (AH * 2^8 + AL) * (BH * 2^8 + BL). The choice of 1664525 is due to Mark van der Gulik, he took the constant out of a Knuth table of good linear congruency pseudo-RNG factors. It's been 17 years since that, I do not currently think the pseudo-RNG relationship is that relevant. In the mean time, as hashing goes, it's good enough.</blockquote><div><br></div><div>I'm looking for more of a reference, say to a paper which describes the technique, or a reference implementation on the web. Remember that one important proper of 1664525 is that it belongs to a set of integers for which n bitCount * 2 between: n highBit and: n highBit + 2, i.e. that about half the bits are set. Further, the bits are reasonably well distributed throughout, unlike for example 16rF0F0F0F0F.</div><div><br></div><div>I'll look up the table in my Knuth vol 3.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-"><br>
<br>
On 4/18/17 10:01 , Eliot Miranda wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
Hi Andres,<br>
<br>
On Tue, Apr 18, 2017 at 1:52 AM, Andres Valloud<br>
<<a href="mailto:avalloud@smalltalk.comcastbiz.net" target="_blank">avalloud@smalltalk.comcastbiz<wbr>.net</a><br></span><div><div class="gmail-h5">
<mailto:<a href="mailto:avalloud@smalltalk.comcastbiz.net" target="_blank">avalloud@smalltalk.com<wbr>castbiz.net</a>>> wrote:<br>
<br>
<br>
Yes, the SqR initials are mine. That's too bad about the comments,<br>
at the very least there should be something like this:<br>
<br>
"This code performs a multiplication by 1664525 mod 2^28 without<br>
overflowing into large integers."<br>
<br>
<br>
Thanks. I'll update the primitives appropriately. Could you supply a<br>
reference both for the technique and the choice of 1664525 in<br>
particular? I'd like to include that info in the comments too.<br>
<br>
<br>
On 4/18/17 1:48 , Levente Uzonyi wrote:<br>
<br>
The methods in SmallInteger and LargePositiveInteger have the<br>
initials<br>
SqR (that's yours IIRC), but they have no comment.<br>
The variant in String is originally from 2001 and has never had that<br>
comment.<br>
<br>
Levente<br>
<br>
On Mon, 17 Apr 2017, Andres Valloud wrote:<br>
<br>
<br>
On 4/17/17 16:45 , Eliot Miranda wrote:<br>
<br>
FYI, hash multiply is<br>
hashMultiply<br>
| low |<br>
low := self bitAnd: 16383.<br>
^(16r260D * low + ((16r260D * (self bitShift: -14) +<br>
(16r0065 * low)<br>
bitAnd: 16383) * 16384))<br>
bitAnd: 16r0FFFFFFF<br>
<br>
and when written as a primitive is<br>
hashMultiply<br>
| low |<br>
low := self bitAnd: 16383.<br>
^(16r260D * low + (16r260D * (self bitShift: -14) +<br>
(16r0065 * low) *<br>
16384))<br>
bitAnd: 16r0FFFFFFF<br>
because we don't care about the multiply by 16384<br>
overflowing; when<br>
written as a primitive it won't overflow into a<br>
LargePositiveInteger.<br>
<br>
<br>
Hopefully this doesn't mean the primitive got implemented by<br>
actually<br>
doing these operations verbatim. As you have probably seen,<br>
that<br>
convoluted arithmetic is done this way in Smalltalk only to<br>
simulate a<br>
32x32 multiplication bit-anded down to 28 bits without<br>
overflowing<br>
into large integers (the original code from August 2000 had<br>
my initials).<br>
<br>
At a sufficiently low level such as C, all that complexity<br>
is just an<br>
unsigned multiplication by 1664525. The image code should<br>
still have<br>
a comment to that effect, did it get lost?<br>
<br>
Andres.<br>
<br>
<br>
<br>
<br>
<br>
--<br>
_,,,^..^,,,_<br>
best, Eliot<br>
</div></div></blockquote>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>
</div></div>