Switching to use foo struct on Windows VM

bryce at kampjes.demon.co.uk bryce at kampjes.demon.co.uk
Sun Jul 15 10:50:45 UTC 2007


Andreas Raab writes:
 > sig wrote:
 > > I tried to introduce VM pointers table for use by Exupery, but found
 > > that there's no common way for adding this code because all platforms,
 > > except win32 using foo struct for globals.
 > 
 > Can you say what the requirements for this patch are? E.g., why exactly 
 > does it matter if the VM is compiled with struct foo or not?

The goal is to provide a generic way of getting pointers to the
interpreter's variables and functions. Exupery needs these because it
generates code that does the same thing as the interpreter. Sig needs
these as he's interested in allowing low-level programming to be done
inside the image. At the moment Exupery has a lot of trivial accessor
functions to return the addresses.

The problem is you can't put "&foo->activeContext" into an initialiser
in C: foo is a pointer, so its value is not a constant expression and
the compiler cannot know at compile time where foo points.
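
A minimal sketch of the initialiser problem, assuming a cut-down
struct foo with two illustrative fields. The table names and
initVMPointerTable are hypothetical, not taken from the VM sources:

```c
#include <stddef.h>

/* Cut-down stand-in for the interpreter's global struct
 * (the two fields are illustrative only). */
struct foo {
    long activeContext;
    long stackPointer;
};

struct foo fum;            /* the storage itself */
struct foo *foo = &fum;    /* the pointer the VM code goes through */

/* Rejected by a C compiler: reading the pointer variable `foo`
 * is not a constant expression, so this cannot initialise a
 * static table.
 *
 *   void *badTable[] = { &foo->activeContext };
 */

/* Accepted: the address of a member of a static object is an
 * address constant. */
void *staticTable[] = { &fum.activeContext, &fum.stackPointer };

/* Alternative (hypothetical): fill the table at run time. */
void *vmPointerTable[2];

void initVMPointerTable(void)
{
    vmPointerTable[0] = &foo->activeContext;
    vmPointerTable[1] = &foo->stackPointer;
}
```

Going through the struct object directly only works where the table
can see that object; the run-time fill works wherever foo is visible.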

Using #returnPrefixFromVariable: to generate the variable accessing
code will also allow generated code to work in VMs that use foo or
don't use foo. #returnPrefixFromVariable: is called when translating
addressOf: for this reason.

I'm guessing that the problem could also be solved by generating
accessors the way that your #addressOf: operation does.
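
As a rough illustration of what such a generated accessor could look
like in C, here is a sketch that compiles with or without the struct.
The USE_FOO_STRUCT switch and the GIV macro are invented here to stand
in for the prefix that #returnPrefixFromVariable: supplies; they are
assumptions, not names from the VM sources:

```c
/* Hypothetical build switch: define USE_FOO_STRUCT to keep the
 * interpreter globals inside a struct reached through `foo`. */
#ifdef USE_FOO_STRUCT
struct foo { long activeContext; };
struct foo fum;
struct foo *foo = &fum;
#define GIV(name) foo->name     /* prefix is "foo->" */
#else
long activeContext;
#define GIV(name) name          /* prefix is empty */
#endif

/* The accessor source is identical in both builds; only the
 * prefix in front of the variable name differs. */
long *addressOfActiveContext(void)
{
    return &GIV(activeContext);
}
```

Because the address is computed inside a function at run time, the
initialiser restriction does not apply in either configuration.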

 > > benchmark shows no noticeable difference using foo struct or not.
 > > Maybe this is bad benchmark for this case..
 > 
 > This result is quite surprising. When John originally introduced this 
 > option, x86 was significantly slower when compiling with than without 
 > it. As a matter of fact, given that probably some 90+% of all Squeak 
 > platforms are now x86 I was thinking about removing it altogether (after 
 > all, it's just a pointless memory dereferencing which is only 
 > advantageous on platforms that don't have direct addressing modes).

Low-level performance is getting more complex as CPUs get faster. The
interpreter does not execute many instructions per clock (sorry, I
don't have the numbers handy and they will change depending on
architecture). Given how low the instructions per clock is, adding
extra work to the interpreter doesn't matter so long as the extra work
stays inside the delays (probably branch mispredicts) that are
currently limiting the interpreter's speed. That's the magic of
out-of-order execution.

I'd guess that on slower in-order x86 CPUs, using foo will have more of
an adverse impact on performance. And having foo is likely to be
most important on slower CPUs, including ARMs in phones/handhelds.

Bryce



More information about the Squeak-dev mailing list