Hi.
No kidding Color>>#hash is not good enough! I changed the cache's implementation from a Dictionary to a coupled SortedCollection pair and the execution time went down from 4m 15s to 1m 30s!!! There are two sorted collections, the first contains associations between colors and their corresponding closest quantized colors, and the second has just the colors. The second is looked up for the corresponding index in the first. As the first is sorted on the key value of associations, the indexes match. All the lookups are done using a QuickSearch object that knows how to perform binary lookups on sorted collections. That alone almost cut execution time by 3!!! The new tally leaves look like this:
8.3 SortedCollection class>>new: 7.5 SortedCollection(OrderedCollection)>>at: 6.4 QuickSearch>>qSearchFrom:to: 5.9 Color>>distanceTo: 5.6 BitBlt>>pixelAt: 5.3 QuickSearch>>is:leq: 4.8 Form>>setExtent:depth: 4.6 BitBlt>>pixelAt:put: 4.4 BitBlt class>>destForm:sourceForm:hal...rigin:extent:clipRect: 4.2 Rectangle class>>origin:extent: 4.1 Form class>>extent:depth: 4.1 SortedCollection(OrderedCollection)>>insert:before: 2.0 Color class>>r:g:b:range:
By the way, if you get aForm and send edit to it in a morphic world, it hangs until you press ctrl-break, gives out a strange error and you can continue by aborting it. And when you get up to the parent (still MVC) project, all your aborted edits appear on the screen... ???.
Andres.
Arrrrgggg.
Some of us (basically, friends and friends of friends, where the FoFs are hashing experts) proposed a better hash function for Java Strings, apparently with no luck. It would be REALLY NICE if hash functions actually did a decent job of splattering almost every set of inputs into what looks like noise (such functions exist) first, and then people worried about performance of the hash function. In the case of the three values R, G, B, a good hash might be something like
K1 * R + K2 * G + K3 * B
where K1, K2, and K3 are random numbers, and NOT powers of two minus 1 (as is the case for Java Strings). It's better if K1, K2, and K3 happen to be largish primes. If the hash value is supposed to be positive, then you can do slightly better by taking the raw result (treated as an unsigned number) and take its value modulo 0x7fffffff (a prime, I think) which can be computed with (in Java)
t0 += (t0>>31) & 0x7fffffff; /* -1 <= t0 <= 2^{31}-1 */
If the input (t0) can have the value 0x80000000 then you need to do this a second time, since doing it once only gets you to
0xffffffff (== -1 == 0x7ffffffe mod 0x7fffffff)
Another possibility is residues in GF(2**N), if you want to treat the color as 4 8-bit numbers and do the table lookup, but in our work with hash functions for Java strings, we got better performance from multiplication by constants, though we did have to fool around quite a bit to get the performance up.
And yes, I think that in general the DEFAULT hash function should be a good one. If an "expert" wishes to substitute a fast one with a rotten distribution, that is the sort of foot-shooting game that "experts" often play.
Sorry for the rant, but I've about had it with the sort of "golly, if we don't do this brain-dead thing, it will be slow" by alleged computer scientists who don't even take the time to see how well a proper hash function (or algorithm, or whatever) can be made to run if they really think about the problem.
David Chase NaturalBridge LLC
squeak-dev@lists.squeakfoundation.org