<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2017-04-14 8:03 GMT+02:00 Nicolas Cellier <span dir="ltr"><<a href="mailto:nicolas.cellier.aka.nice@gmail.com" target="_blank">nicolas.cellier.aka.nice@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>However, since Symbol are unique, we could have a different hashing policy.<br></div>Like for example pinning Symbol and then using identity hash.<br></div><div>It would probably complexify the handing of memory, but that's the deal.<br></div><div><br></div></div></div></blockquote><div><br></div><div>oops forget it, the identity hash does not rely on object position in memory, so no need pinning.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div></div>Currently we can't because images are relying on #foo = 'foo', so we have to use the String hash.<br></div>But it does not have to be so. I have tried to remove this equivalence long ago (in squeak 3.9 maybe), and it was not such a big deal if I remember...<br></div><div class="gmail_extra"><br><div class="gmail_quote">2017-04-14 6:05 GMT+02:00 Max Leske <span dir="ltr"><<a href="mailto:maxleske@gmail.com" target="_blank">maxleske@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br><div style="word-wrap:break-word"><div style="word-wrap:break-word">That would radically increase the number of collisions from the hash function. I don’t think that is a good idea. *Especially* when we are talking about a hash function that will be used for dictionaries. When you test the performance of a hash function you should also test it’s properties w.r.t. to hashed collections (Andrés Valloud explains that nicely in “Hashing in Smalltalk”).<div><br></div><div>Cheers,</div><div>Max<br><div><br><div><blockquote type="cite"><div>On 13 Apr 2017, at 23:19, Clément Bera <<a href="mailto:bera.clement@gmail.com" target="_blank">bera.clement@gmail.com</a>> wrote:</div><br class="m_6617662628032155347m_7021847612583531968Apple-interchange-newline"><div><div dir="ltr"><span style="font-size:12.8px">Hi all,</span><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">I am trying to investigate performance overhead in benchmarks to improve the VM performance. In a benchmark strings are used as dictionary keys, which seems to be an OK pattern to me that may be actually used in real code from time to time. Something stroke me: <u>the hash function of a string is actually iterating over the whole string. </u><br clear="all"><div><br></div><div>As a result, I get something like this:</div><div><br></div><div>#(</div><div>'String size: 0, Hash function performance: 17,817,776 per second' </div><div>'String size: 1, Hash function performance: 15,395,388 per second' </div><div>'String size: 3, Hash function performance: 14,853,109 per second' </div><div>'String size: 4, Hash function performance: 15,151,954 per second' </div><div>'String size: 5, Hash function performance: 14,139,881 per second' </div><div>'String size: 10, Hash function performance: 11,545,749 per second' </div><div>'String size: 65, Hash function performance: 3,431,067 per second' </div><div>'String size: 130, Hash function performance: 1,879,047 per second' </div><div>'String size: 520, Hash function performance: 511,934 per second'</div><div>)<br></div><span class=""><div><br></div><div>This means that if a hash of a 1MB String is computed it takes a hell of a time. </div><div><br></div></span><div>I wonder why we don't compute the string hash from a small number of inner characters taken randomly but deterministically from the String. From example, for all strings longer than 5 characters, one could pick only 5 characters in the String to compute its hash. The hash function is this way decent enough and the performance does not drop for large strings.</div><div><br></div><div>One implementation could be to replace:</div><div><font face="monospace, monospace">1 to: stringSize do: [:pos |</font></div><div>by:</div><div><font face="monospace, monospace">1 to: stringSize by: stringSize // 5 do: [:pos |</font></div><div>in:</div><div><font face="monospace, monospace">ByteString>>stringHash: aString initialHash: speciesHash</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="arial, helvetica, sans-serif">Then we need to rehash the whole image.</font></div><div><font face="monospace, monospace"><br></font></div><div>What do you think ? Is there a hash expert around that can confirm this is a good idea ? </div></div>

</div>

</div></blockquote></div><br></div></div></div></div><br></blockquote></div><br></div>

</blockquote></div><br></div></div>