A brief, ignorable, experience report..
One idea for speeding up lookups is to make a Selector class a subclass of Symbol, so that when a Symbol becomes a Selector it gains hash value fields. When a MethodDictionary is rehashed, a small set of hashes is tried and the one closest to perfect (fewest collisions) is used for that particular MethodDictionary.
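To make the rehash idea concrete, here is a rough sketch in Python (just for illustration, not the actual Smalltalk; `collisions` and `pick_best_hash` are made-up names): try each candidate hash over the dictionary's keys at the new size and keep the one with the fewest bucket collisions.

```python
def collisions(keys, hash_fn, size):
    """Count keys that land in an already-occupied bucket.
    size is assumed to be a power of two, so masking works as modulo."""
    occupied = set()
    count = 0
    for k in keys:
        slot = hash_fn(k) & (size - 1)
        if slot in occupied:
            count += 1
        occupied.add(slot)
    return count

def pick_best_hash(keys, candidates, size):
    """Return the candidate hash function with the fewest collisions
    for this particular key set and table size."""
    return min(candidates, key=lambda fn: collisions(keys, fn, size))
```

A dictionary subclass per winning hash (as described below for Spur) would then record which of the selector's precomputed hash fields to read at lookup time.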
Because of Spur, we can have several direct subclasses of MethodDictionary differing only in which hash value is accessed. Just change the ClassID, and the hash access method selects the proper hash value of the symbols. As long as the hash accessor stays polymorphic, the PIC mechanics will work for us.
So the question becomes "Are there string/symbol hash functions for which this approach makes sense?". I.e. do hash functions differ enough that we can get close(r) to perfect hashes and have better results choosing one over another?
As I wait for Lulu to deliver Andres' book, I found some string hash functions [note: https://azrael.digipen.edu/~mmead/www/Courses/CS280/HashFunctions-1.html ] and took a first look at this, counting collisions for (hash \\ dictSize) + 1, or more efficiently [note: https://craftinginterpreters.com/optimization.html ] (hash bitAnd: dictSize - 1) + 1
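For the curious, a sketch in Python (illustration only) of the K&R-style string hash I refer to as hash1 below, plus a check of the bucket-index shortcut: when the table size is a power of two, (hash bitAnd: size - 1) gives the same bucket as hash \\ size.

```python
def kr_hash(s):
    """K&R-style string hash: fold each character in with h = h*31 + c."""
    h = 0
    for c in s:
        h = h * 31 + ord(c)
    return h

size = 64  # power of two, as MethodDictionary sizes are
for sel in ("at:put:", "do:", "printString"):
    h = kr_hash(sel)
    # masking and modulo pick the same bucket for power-of-two sizes
    assert h % size == h & (size - 1)
```

The multiplier 31 is the one from the K&R book's hash example; the page linked above lists several variants of this scheme.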
Early results:
vvv=========vvv
"Count hash collisions for all selectors in each of 1527 methodDicts"
"For each Class methodDict, which hash has the fewest collisions?"
HashCandidates globalCollisionsCounts. "Lower is Better"
#hash collisions = 5466
#identityHash collisions = 6534
#hash0 collisions = 5331
#hash1 collisions = 5301
#hash2 collisions = 5419
#hash3 collisions = 5442
#hash4 collisions = 5436
#hash5 collisions = 14630
#hash6 collisions = 5442
#hash7 collisions = 5436
#hash8 collisions = 18708
#hash9 collisions = 25344
#hash10 collisions = 23518
#(5466 6534 5331 5301 5419 5442 5436 14630 5442 5436 18708 25344 23518)
"Compare hashes for each MethodDict; lowest collisions wins"
HashCandidates globalHashWinCounts. "Higher is Better"
#hash win count = 664
#identityHash win count = 610
#hash0 win count = 672
#hash1 win count = 680
#hash2 win count = 662
#hash3 win count = 679
#hash4 win count = 666
#hash5 win count = 429
#hash6 win count = 679
#hash7 win count = 666
#hash8 win count = 350
#hash9 win count = 247
#hash10 win count = 263
#(664 610 672 680 662 679 666 429 679 666 350 247 263)
^^^=========^^^
One quick observation is that string hash is better than identityHash, and hash1 (K&R) is better than the current string hash. Also, the better-performing hashes do not show much advantage over one another.
---
An unrelated idea is to "copy down" just megamorphic methods to subclasses to speed up megamorphic lookups.
Neither of these ideas requires JIT or recompiler changes to implement, just IDE changes.
Just playing around.. -KenD
That's all in all a very interesting in-image optimization take :D -t
On 20. Aug 2021, at 20:26, ken.dickey@whidbey.com wrote:
<String-*stringhashtests.st>
vm-dev@lists.squeakfoundation.org