There's memory bandwidth and there's memory transaction thruput

Joachim Durchholz joachim.durchholz at
Sun Feb 14 12:22:03 UTC 1999

johnm at wrote:
> But is there a caching penalty for addressing adjacent word addresses
> in decreasing address order?

There is no caching penalty, but on some processors there might be a
read-ahead penalty (some processors do read-ahead, just like some disk
drivers do a read-ahead on blocks).
However, to really find out what's going on, one should monitor the
memory bus and look at what's happening. (Use an NT machine and set
Squeak's priority to maximum if you don't want to monitor all sorts of
NT-specific stuff. Or have Squeak run in some regular pattern, look
for recurring events on the bus, and analyze *these*.)

> I suspect not, since Squeak objects are not aligned on cache-line
> boundaries, so the two words preceding the base header word are just
> as likely to fall into the same cache line as the two words following
> it. Does that sound right?
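The symmetry argument above can be checked by brute force. This is
only a sketch: the 4-byte word and 32-byte cache line are assumptions
chosen for illustration, and the function names are made up here, not
taken from the Squeak VM.

```c
/* Assumed sizes for illustration only; real CPUs and VMs may differ. */
enum { WORD_BYTES = 4, LINE_BYTES = 32 };

/* Nonzero if byte addresses a and b fall into the same cache line. */
static int same_line(long a, long b)
{
    return a / LINE_BYTES == b / LINE_BYTES;
}

/*
 * Place the header word at every word-aligned offset within a cache
 * line and count the placements for which the word `word_delta` words
 * away (negative = lower address) lands in the header's cache line.
 */
static int positions_sharing_line(int word_delta)
{
    int count = 0;
    for (long off = 0; off < LINE_BYTES; off += WORD_BYTES) {
        /* Start one full line up so lower addresses stay non-negative. */
        long header = LINE_BYTES + off;
        long other = header + (long)word_delta * WORD_BYTES;
        if (same_line(header, other))
            count++;
    }
    return count;
}
```

Under these assumptions, positions_sharing_line(-1) and
positions_sharing_line(1) give the same count, and so do -2 and 2, so
an unaligned header is as likely to share its line with the preceding
words as with the following ones.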

It does. However, aligning blocks (including header words) on 16-byte
boundaries should improve performance, as loading the header will then
always load the first few instance variables as well (and the full
header if it is more than one word).
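To make the alignment idea concrete, here is a minimal sketch of the
address arithmetic. The helper names are hypothetical, and the cache
line size is passed in as a parameter because it differs between
processors; nothing here is taken from Squeak's allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes from addr to the end of its cache line, i.e. how much of an
 * object starting at addr arrives together with its first word.
 * line must be a power of two.
 */
static uintptr_t bytes_left_in_line(uintptr_t addr, uintptr_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return line - (addr & (line - 1));
}
```

With a line size of at least 16 bytes, an object placed at
align_up(addr, 16) always has at least 16 bytes left in its line, so
the header and the next three 4-byte words come in with one fill. An
unaligned object might have as little as one word left.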

> Here's another question: are there any tools that could help one
> understand how well Squeak uses the memory system? For example, a
> tool that reports cache miss rates or one that would tell us if
> Squeak is encountering the TLB misses that Jan described. I suppose
> some of this stuff would need a logic analyzer...

Newer Intel processors have all sorts of debug registers that should
answer at least some of these questions. On the downside, you will never
know how many of these answers will be transferable to other
processors, including those that don't have these registers, so a logic
analyzer will give the most reliable answers. OTOH a logic analyzer may
not give an answer at all, at least not with reasonable effort.

You can access the debug registers even under Windows NT, but you need
a device driver to do that. (Not because the registers are devices but
because only drivers are allowed to fiddle with internals and privileged
opcodes.) There are even device driver frameworks that allow you to link
your own C code into them (with the usual caveats: anything that causes
interrupts or calls system services may crash the entire OS; so it's
best to allocate a fixed-size buffer to collect data and have a
non-device driver part that regularly flushes the buffer).
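The fixed-size buffer scheme might look roughly like this in plain C.
All names and the capacity are hypothetical, and the sketch leaves out
everything NT-specific: a real driver would also have to synchronize
the collecting side with the flushing side and copy the data across
the kernel/user boundary by whatever means the framework provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_CAPACITY 1024  /* hypothetical size, fixed at build time */

typedef struct {
    uint64_t samples[SAMPLE_CAPACITY];
    size_t count;    /* number of valid entries in samples[] */
    size_t dropped;  /* samples discarded because the buffer was full */
} SampleBuffer;

/* Collecting side: touches only preallocated memory, calls nothing. */
static void sample_record(SampleBuffer *b, uint64_t value)
{
    if (b->count < SAMPLE_CAPACITY)
        b->samples[b->count++] = value;
    else
        b->dropped++;  /* never allocate or block; just count the loss */
}

/*
 * Flushing side: copy up to max of the oldest samples into out, keep
 * the rest at the front of the buffer, and return the number copied.
 */
static size_t sample_flush(SampleBuffer *b, uint64_t *out, size_t max)
{
    size_t n = b->count < max ? b->count : max;
    memcpy(out, b->samples, n * sizeof b->samples[0]);
    memmove(b->samples, b->samples + n,
            (b->count - n) * sizeof b->samples[0]);
    b->count -= n;
    return n;
}
```

Counting dropped samples instead of growing the buffer keeps the
collecting side free of allocation, at the price of losing data if
the flushing part does not run often enough.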
