On Sun, 30 Sep 2001, Lex Spoon wrote:
> On disk, the exact number of dereferences makes a big difference. With
> a binary tree, you divide your search space by 2 with each dereference;
> with a btree, you divide it by a larger number. I'd guess it's similar
> for skip lists, though I've only heard about them, not studied them.

It's both the number and the predictability of the dereferences. If you can preload the data, the count doesn't matter much. In this case, the cost of loading 16 bytes versus 16k is so small that it's faster to load the 16k node and do 10 binary comparisons within it than to do 10 disk seeks.
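To see how the node reads compare, here is a rough Python sketch. The numbers are assumptions for illustration (a million keys, and roughly 1000 keys fitting in one 16 KB btree node):

```python
import math

def levels(n, fanout):
    """Node reads needed to find one of n keys in a balanced tree
    where each node narrows the search by the given fanout."""
    count = 0
    while n > 1:
        n = math.ceil(n / fanout)
        count += 1
    return count

n = 1_000_000
binary_seeks = levels(n, 2)     # binary tree on disk: one seek per level
btree_reads = levels(n, 1000)   # btree: one 16 KB node read per level

print(binary_seeks, btree_reads)  # 20 vs 2
```

Each btree node read replaces about ten of those binary-tree seeks with ten in-memory comparisons, which is exactly the trade described above.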
> Hmm, with "main" memory being so much slower than cache, btrees might
> even start making sense for in-memory structures. But not necessarily
> for Squeak, where an "array of foo's" won't necessarily have all of the
> foo's next to each other, anyway, and thus where accessing items within
> a node still causes fetches to distant parts of memory.

Nope. Most cache lines are 16 bytes, so you pay a lot more for the extra comparisons. Nor do you save anything from the ability to prefetch, since you cannot predict which branches you'll need. You might get semi-close with splay trees, though, because they can exploit locality of access, so the effective number of accesses is much better. I'll probably eventually implement them.

Distant parts don't hurt; cache lines are 16 bytes. The closest they could get to hurting is if they hit the TLB too heavily. (The TLB on x86 has 64 entries for 4kb pages, covering 256k. An access outside that range costs about 3 memory lookups to read in the new page table entry for that 4kb page; PTE lookups can be cached.)
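The TLB figure above is just multiplication; a quick sketch of the arithmetic (the 64-entry, 4 KB-page numbers are the ones quoted for that era's x86):

```python
# TLB reach: how much address space the TLB covers before
# an access forces a page-table walk.
tlb_entries = 64
page_size = 4 * 1024          # 4 KB pages
tlb_reach = tlb_entries * page_size

print(tlb_reach // 1024)      # 256 (KB)
```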
> Anyway, I'd rather work on something higher-level. This
> hyper-optimization stuff is only useful on small portions of most of
> the programs I ever work on.

Exactly. The O(log n) will beat your O(n) every time once you start getting a lot of data. If profiling shows it matters, then implement the O(log n) algorithm and contribute it.
Don't discount profiling. Profiling has found places in my network stream library where invoking #originalContents instead of #contents sped it up by 3x.

And this little beauty: fiddling with one line of code has led to a 30% speedup in the queueing code.

I found out that ``readPosition > (writeLimit / 2)'' is about 30 times slower than ``readPosition * 2 > writeLimit'' (if writeLimit is randomly distributed; if writeLimit is always odd, it's 60x).

That was the absolute last thing I expected to be consuming 20% of my CPU time. Profiling is useful; use it. :)
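The likely culprit is that Smalltalk's Integer division is exact: an odd writeLimit divided by 2 answers a Fraction, and the subsequent mixed-type comparison is far more expensive than an integer multiply, which would also explain why the always-odd case is twice as bad. A small Python sketch (using fractions.Fraction to mimic Smalltalk's exact `/`; the variable names are taken from the expression above) confirms the two predicates are at least equivalent, so the rewrite is safe:

```python
from fractions import Fraction
import random

random.seed(1)
for _ in range(1000):
    readPosition = random.randrange(10_000)
    writeLimit = random.randrange(10_000)
    # Exact division, as Smalltalk's Integer>>/ behaves: an odd
    # writeLimit yields a true Fraction, forcing a slower
    # mixed-type comparison.
    slow = readPosition > Fraction(writeLimit, 2)
    # The rewritten predicate stays in integer arithmetic throughout.
    fast = readPosition * 2 > writeLimit
    assert slow == fast
print("predicates agree")
```

Because both sides use exact arithmetic, no rounding is involved and the transformation preserves the comparison exactly.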
Scott
-- No DVD movie will ever enter the public domain, nor will any CD. The last CD and the last DVD will have moldered away decades before they leave copyright. This is not encouraging the creation of knowledge in the public domain.