[Vm-dev] Making a Hash of things, er, strings

Tobias Pape Das.Linux at gmx.de
Sat Aug 21 10:53:59 UTC 2021


That's all in all a very interesting in-image optimization take :D
-t

> On 20. Aug 2021, at 20:26, ken.dickey at whidbey.com wrote:
> 
> A brief, ignorable, experience report..
> 
> One idea for speeding up lookups is to have a Selector class be a subclass
> of Symbol and when a Symbol becomes a Selector to have hash value fields
> in the Selector.  When a MethodDictionary is rehashed, there is a small
> set of hashes which are tried and the most perfect (least collisions)
> is used for that particular MethodDictionary.
> 
> Because of Spur, we can have several direct subclasses of MathodDictionary
> differing only in which hash value is accessed.  Just change the ClassID
> and the hash access method selects the proper hash value of symbols.
> As long as the hash accessor stays polymorphic, the PIC mechanics will
> work for us.
> 
> So the question becomes "Are there string/symbol hash functions where
> this approach makes sense?".  I.e. is there enough difference in hash
> functions that we can get close(r) to perfect hashes and have better
> results choosing one over another?
> 
> As I wait for Lulu to deliver Andres' book, I found some string
> hash functions
> [note: https://azrael.digipen.edu/~mmead/www/Courses/CS280/HashFunctions-1.html ]
> and took an a first look at this, counting collisions
> for
> 	(hash \\ dictSize) + 1
> or more efficiently [note https://craftinginterpreters.com/optimization.html]
> 	(hash bitAnd: (dictSize - 1)) + 1
> 
> Early results:
> 
> vvv=========vvv
> "Count has collisions for all selectors in each of 1527 methodDicts"
> "For each Class methodDict, which hash has the fewest collisions?"
> HashCandidates globalCollisionsCounts.  "Lower is Better"
> 
> #hash collisions = 5466
> #identityHash collisions = 6534
> #hash0 collisions = 5331
> #hash1 collisions = 5301
> #hash2 collisions = 5419
> #hash3 collisions = 5442
> #hash4 collisions = 5436
> #hash5 collisions = 14630
> #hash6 collisions = 5442
> #hash7 collisions = 5436
> #hash8 collisions = 18708
> #hash9 collisions = 25344
> #hash10 collisions = 23518
> 
> #(5466 6534 5331 5301 5419 5442 5436 14630 5442 5436 18708 25344 23518)
> 
> "Compare hashs for each MethodDict; lowest collisions wins"
> HashCandidates globalHashWinCounts. "Higher is Better"
> 
> #hash win count = 664
> #identityHash win count = 610
> #hash0 win count = 672
> #hash1 win count = 680
> #hash2 win count = 662
> #hash3 win count = 679
> #hash4 win count = 666
> #hash5 win count = 429
> #hash6 win count = 679
> #hash7 win count = 666
> #hash8 win count = 350
> #hash9 win count = 247
> #hash10 win count = 263
> 
> #(664 610 672 680 662 679 666 429 679 666 350 247 263)
> ^^^=========^^^
> 
> One quick observation is that string hash is better than identityHash and
> hash1 (K&R) is better than current string hash.  Also, the hashes
> used so far do not show much advantage against each other in the
> better cases.
> 
> ---
> 
> An unrelated idea is to "copy down" just megamorphic methods to subclasses to speed
> up megamorphic lookups.
> 
> Neither of these ideas requires JIT/recompile to implement, just IDE changes.
> 
> Just playing around..
> -KenD
> 
> <String-*stringhashtests.st>




More information about the Vm-dev mailing list