Scott A Crosby
crosby at qwes.math.cmu.edu
Sun Sep 30 18:53:38 UTC 2001
On Sun, 30 Sep 2001, Lex Spoon wrote:
> On disk, the exact number of dereferences makes a big difference. With
Its the number and the predictability of the dereference. If you can
preload it, it doesn't matter. In this case, the cost of loading 16 bytes
versus 16k is so little, that its faster to load the 16k, do 10 binary
tree comparisons, than do 10 disk seeks.
> a binary tree, you divide your search space by 2 with each dereference;
> with a btree, you divide it by a larger number. I'd guess it's similar
> for skip lists, though I've only heard about them, not studied them.
> Hmm, with "main" memory being so much slower than cache, btrees might
> even start making sense for in-memory structures. But not necessarily
Nope. Most cachelines are 16 bytes.. You pay a lot more with the extra
comparisons. Nor do you save anything with teh ability to prefetch; you
cannot predict which branches you'll need. You might get semi-close with
splay trees though, because they can use locality of access, so the
effective number of accesses is much better.
I'll probably eventually implement them.
> for Squeak, where an "array of foo's" won't necessarily have all of the
> foo's next to each other, anyway, and thus where accessing items within
> a node still causes fetches to distant parts of memory.
Distant parts don't hurt; cache lines are 16 bytes. The closest they could
get to hurting is if they hit the TLB too heavily. (TLB for x86 has 64
entries for 4kb pages. Or, 256k. access outside that costs about 3 memory
lookups to read in the new page table entry for that 4kb. PTE lookups can
> Anyway, I'd rather work on something higher-level. This
> hyper-optimization stuff is only useful on small portions of most of the
> programs I ever work on.
Exactly.. The O(log n) will beat your O(n) every time you start getting a
lot of data. If it does in profiling, then implement the O(log n)
algorithm, and contribute it.
Don't discount profiling.. Profiling has found out places in my network
stream library where invoking #origionalContents instead of #contents sped
it up by 3x,
And this little beauty: fidding with one line of code has lead to speedups
of 30% in the queueing code.
I found out that:
``readPosition > (writeLimit/2)'' is about 30 times slower than
``readPosition * 2 > writeLImit''
(if writeLimit is randomly distributed. If writeLimit is always odd, its
Which was the absolute last thing I expected to be consuming 20% of my
CPU time. Profiling is useful, use it. :)
No DVD movie will ever enter the public domain, nor will any CD. The last CD
and the last DVD will have moldered away decades before they leave copyright.
This is not encouraging the creation of knowledge in the public domain.
More information about the Squeak-dev