[ENH][VM] Improved code generation (hopefully ;)
John M McIntosh
johnmci at mac.com
Wed Jul 9 01:17:35 UTC 2003
> From: "Andreas Raab" < andreas.raab at g... >
> Date: Tue Jul 8, 2003 10:55 pm
> Subject: RE: [ENH][VM] Improved code generation (hopefully ;)
>
>
>
> Hi,
>
> > I've been looking at commonReturn and started to wonder why
> > the usage of
> >
> > localCntx _ localReturnContext.
> > localVal _ localReturnValue.
>
> Mostly because I wasn't sure if all compilers will be able to recognize
> that these values are in fact used read-only for the branch. E.g., a
> "lesser compiler" might not look at the branch individually and just say
> "well that gets read and written all over the places so I won't even try
> to optimize it into registers". If you got a smart compiler then the
> first thing it'll do is to recognize that this really *is* an alias and
> no bad things will happen as a result. Or at least that's what I think ;-)
>
> In addition, I wanted to be able to play around with making these
> variables global and that may hurt PPC and others as long as there
> aren't any explicit hints given that these values should be kept in
> registers (see all those "keep in register" comments spread out through
> interpreter).
Well, they don't get made global, because they are only used in one
routine. That doesn't hurt the PPC code, since we've lots of registers
to work with. Making them truly global would hurt things on PowerPC.
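
To make the localCntx/localVal idea concrete, here is a minimal C sketch of the pattern being discussed. The names returnContext, returnValue and sum_with_local_alias are hypothetical stand-ins, not the real interpreter variables; the point is only that a read-only local copy is easy for even a simple compiler to keep in a register.

```c
#include <assert.h>

/* Hypothetical stand-ins for interpreter state that is read and
   written in many places (not the real VM variables). */
int returnContext;
int returnValue;

/* Copy the values into read-only locals once, so the loop below only
   touches locals. A smart compiler would treat the locals as aliases
   and fold them away; a simpler one at least sees two values that are
   never written again and can keep them in registers. */
int sum_with_local_alias(int iterations)
{
    assert(iterations >= 0);
    const int localCntx = returnContext;
    const int localVal = returnValue;
    int total = 0;
    for (int i = 0; i < iterations; i++) {
        total += localCntx + localVal;
    }
    return total;
}
```

With returnContext set to 3 and returnValue set to 4, calling sum_with_local_alias(5) adds 7 five times.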
>
> > You see with the current VMMaker any sole usage of a instance
> > variable in Interpreter will get folded
> > into a local variable versus a global if all references to that
> > instance variable inline into a single C procedure.
>
> WHAT??? Are you trying to tell me that if I add an instance variable to
> class Interpreter which is only used in a single method (most likely
> interpret()) it will automatically become a temp of that method? Where
> can I turn this off? I don't want it - it's bad, bad, bad! Global
> variables are cheap, stack relative addressing is expensive if your
> registers are already cramped up with localIP, SP, etc.
If you have an accessor for the global, then it won't become localized,
because then two or more procedures access the global. The reason for
this code was to optimize the garbage collector, which is spread across
multiple methods, but mostly each phase (markAndTrace/Sweep) becomes
completely inlined, so the working variables are quite localized.
For the PowerPC this makes a significant improvement, upwards of 50%.
See the C Code Generator localizeGlobalVariables methods. Maybe you
can turn it off for Intel, but ensure you don't greatly impact GC
benchmarks before doing this.
As for your interpret() question, mine has defined
int interpret(void) {
#ifdef FOO_REG
    register struct foo * foo FOO_REG = &fum;
#endif
    int localReturnValue;
    int localReturnContext;
    int localHomeContext;
    register char* localSP SP_REG;
    register char* localIP IP_REG;
    register int currentBytecode CB_REG;
which I think are all there on purpose? If you were to add a global
foobarCounter and only reference it in an inlined procedure in
interpret(), then it would become a local variable, unless you make an
accessor for it.
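
The difference between the two cases can be sketched in C roughly as follows. This is an illustration only, not actual VMMaker output: every function name here is hypothetical, and foobarCounter is the example name used above.

```c
#include <assert.h>

/* Case 1: the counter is referenced by exactly one C function after
   inlining, so a generator could emit it as a local of that function. */
int count_steps_localized(int steps)
{
    int foobarCounter = 0;   /* was an instance variable, now a local */
    assert(steps >= 0);
    for (int i = 0; i < steps; i++) {
        foobarCounter++;
    }
    return foobarCounter;
}

/* Case 2: an accessor is a second function that references the
   variable, so it has to stay global. */
int foobarCounterGlobal = 0;

int getFoobarCounter(void)
{
    return foobarCounterGlobal;
}

void count_steps_global(int steps)
{
    assert(steps >= 0);
    for (int i = 0; i < steps; i++) {
        foobarCounterGlobal++;
    }
}
```

Note that the localized version starts from zero on every call, while the global version keeps its value between calls, which is one reason the choice is not purely a performance question.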
>
> > However I'll point out that GCC and codewarrior with
> > optimization decides we are idiots and ignores
> > the localCntx/Val constructs because they are read only and
> > fold back to the localReturnContext/Value
>
> Which is exactly what a good compiler _should_ do ;-)
>
> > I did create a change set to do this, but then in doing this
> > I reviewed the usage of externalizeIPandSP and decided one of
> > the main usage is to avoid issues with positive32BitIntegerFor:
>
> I don't understand you. What issues are you talking about and why is it
> the "main" usage? Both, externalizeIPandSP as well as internalizeIPandSP
> are used to transfer state between interpret() and the rest of the
> system so that whenever we get out of interpret() we can still refer to
> the instruction pointer and the stack pointer. I don't see what this has
> to do with #positive32BitIntegerFor:.
I believe the reason for invoking externalizeIPandSP around 'most' calls,
for example in the bit math routines, is to guard against
positive32BitIntegerFor: allocating a large integer. The floating-point
routines I don't consider, because they usually allocate a float object.
But I'll think more on whether it's worth doing.
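
A rough C sketch of the externalize/internalize pattern around a call that might allocate. Everything here is a simplified stand-in: box_result plays the role of a routine like positive32BitIntegerFor:, allocationCount only records when a real VM would have had to allocate, and bit_and_step is a hypothetical bytecode handler.

```c
#include <assert.h>
#include <stddef.h>

/* Global copies of the interpreter state, visible to the rest of the VM. */
char *instructionPointer;
char *stackPointer;
int allocationCount;   /* hypothetical: counts simulated allocations */

/* Stand-in for a boxing routine. Values that do not fit a 31-bit
   SmallInteger would need a heap object, and an allocation may run
   the GC, which needs valid global pointers. */
unsigned int box_result(unsigned int value)
{
    assert(instructionPointer != NULL && stackPointer != NULL);
    if (value > 0x3FFFFFFFu) {
        allocationCount++;
    }
    return value;
}

/* Hypothetical handler for a bitAnd-style bytecode. */
unsigned int bit_and_step(char *code, char *stack,
                          unsigned int a, unsigned int b)
{
    char *localIP = code;    /* register copies used inside interpret() */
    char *localSP = stack;
    unsigned int result;

    localIP += 1;            /* consume the bytecode */

    instructionPointer = localIP;   /* externalize before the call */
    stackPointer = localSP;
    result = box_result(a & b);
    localIP = instructionPointer;   /* internalize after the call */
    localSP = stackPointer;

    (void)localIP;
    (void)localSP;
    return result;
}
```

If the result is known to fit in a SmallInteger, no allocation can happen and the externalize step is pure overhead, which is the motivation for inlining the simple cases.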
>
> > Which points to considering inlining the bit bytecode
> > primitives, and primitivePointX & primitivePointY
>
> Do you have any benchmarks that show the effect? I hate adding
> complexity without actually improving anything.
Well, primitivePointX and primitivePointY are simple.
| p a b v |
p _ Point x: 1 y: 2.
v _ Time millisecondsToRun: [10000000 timesRepeat: [a _ p x. b _ p y]].
^v
gives 8572 & 9151 ms before,
and 8140 & 7964 ms after.
primitivePointX
    | rcvr |
    self inline: false.
    rcvr _ self popStack.
    self assertClassOf: rcvr is: (self splObj: ClassPoint).
    successFlag
        ifTrue: [self push: (self fetchPointer: XIndex ofObject: rcvr)]
        ifFalse: [self unPop: 1]
becomes
primitivePointX
    | rcvr |
    self inline: true.
    rcvr _ self internalStackTop.
    self internalPop: 1.
    self assertClassOf: rcvr is: (self splObj: ClassPoint).
    successFlag
        ifTrue: [self internalPush: (self fetchPointer: XIndex ofObject: rcvr)]
        ifFalse: [self internalUnPop: 1]
>
> > Also this brought back another memory, I'm sure Anthony
> > (years?) back pointed out that usage of
> > instantiateSmallClasssizeInBytesfill results in filling the
> > allocated object with 0 or nil, but then we just fill the object with
> > data right away, this is silly. Think we could follow thru on
> > his idea?
>
> Well, I think we may want to consider a new primitive which is capable
> of allocating bit objects without initialization (how about
> #primitiveDirtyNew ;-) The code could be used in those places where we
> care on the VM-level such as float or large integer allocation. BTW, I'm
> no big fan of making this the default for bit objects - it gives you the
> ability to read arbitrary memory from Squeak and that's a big security
> risk (think about a situation in which we just entered a password and
> afterwards we simply allocate a huge chunk of memory and search for this
> password). If we add this ability it should be the exceptional case and
> not the default.
no problem there.
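
The password scenario can be shown with a toy bump allocator in C. This is a sketch under stated assumptions, not the Squeak allocator: the arena, alloc_dirty, alloc_zeroed and leaks_secret are all hypothetical names, and the "password" is just a sample string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64

/* Toy heap: a fixed buffer with a bump pointer. */
static unsigned char arena[ARENA_SIZE];
static size_t arenaTop;

/* "Frees" everything by resetting the pointer; old bytes stay behind. */
void arena_reset(void)
{
    arenaTop = 0;
}

/* Allocation without initialization: returns whatever was there before. */
unsigned char *alloc_dirty(size_t n)
{
    unsigned char *p;
    assert(n <= ARENA_SIZE - arenaTop);
    p = arena + arenaTop;
    arenaTop += n;
    return p;
}

/* Allocation with zero fill: the default, safe behaviour. */
unsigned char *alloc_zeroed(size_t n)
{
    unsigned char *p = alloc_dirty(n);
    memset(p, 0, n);
    return p;
}

/* Store a secret, release it, allocate again, and report whether the
   fresh object still contains the secret (1) or not (0). */
int leaks_secret(int zeroFill)
{
    const char secret[] = "hunter2";
    unsigned char *fresh;

    arena_reset();
    memcpy(alloc_dirty(sizeof secret), secret, sizeof secret);
    arena_reset();

    fresh = zeroFill ? alloc_zeroed(sizeof secret)
                     : alloc_dirty(sizeof secret);
    return memcmp(fresh, secret, sizeof secret) == 0;
}
```

Here leaks_secret(0) finds the old bytes in the new object and leaks_secret(1) does not, which is why an unfilled allocation makes sense only as an internal VM path whose callers overwrite the whole object.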
>
> And again, before doing anything alike I want to see benchmark results.
>
> > And why does signed32BitIntegerFor: use instantiateClass, versus
> > instantiateSmallClasssizeInBytesfill?
>
> At the time I wrote this, it had (literally) no users. Certainly not in
> any critical places. If you can show me some improvement in any
> benchmarks you devise we can certainly change it ;-)
Well it's just a consistency thing, because positive32BitIntegerFor: &
the signed/positive64BitIntegersFor: all use the
instantiateSmallClasssizeInBytesfill: and I was wondering if there was
some technical reason for not using
it.
instantiateSmallClasssizeInBytesfill is used by
    floatObjectOf,
    makePointwithxValue:yValue:,
    positive32BitIntegerFor:,
    positive64BitIntegerFor:,
    signed64BitIntegerFor:
I'll note that LargeIntegersPlugin is a heavy user of instantiateClass
and there I really wonder about filling with zeros, then for the most
part refilling with the actual data.
So given that we've already got the allocator set up to do no fill,
because that's how contexts are allocated, I made an
instantiateSmallClass: classPointer sizeInBytes: sizeInBytes
and for
| v |
v _ Time millisecondsToRun: [10000000 timesRepeat: [1 asFloat]].
^v
Before 13081,14205,12756
After 12679,12899,12470
The gain is slight. Smarter tools showed that the fill in
instantiateSmallClasssizeInBytesfill: took about 15% of the cycles, so it
is significant for the execution of the routine.
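
Why the fill is wasted work for something like a float can be seen in a small C sketch. The two functions below are hypothetical and operate on a caller-supplied buffer rather than a real object header and body; they only show that the fill is overwritten before anyone can observe it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill the body with zeros, then immediately overwrite every byte
   with the float data. The memset has no observable effect. */
void make_float_with_fill(unsigned char *body, double value)
{
    assert(body != NULL);
    memset(body, 0, sizeof(double));
    memcpy(body, &value, sizeof(double));
}

/* Skip the fill and store the data directly. This is only safe
   because the store covers the entire body. */
void make_float_no_fill(unsigned char *body, double value)
{
    assert(body != NULL);
    memcpy(body, &value, sizeof(double));
}
```

Both versions leave identical bytes in the body, so the no-fill variant is a pure saving wherever the caller writes the whole object right after allocating it.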
If I try
| p v |
p _ InputSensor new.
v _ Time millisecondsToRun: [10000000 timesRepeat: [p primMousePt]].
then times like
before 32688
after 30763
are likely.
So I'll wrap up a change set later tonight (4-6 hours out).
I'll note that Point x:y: doesn't invoke the makePoint
bytecode?/primitive? or am I missing something here?
x: xInteger y: yInteger
    "Answer an instance of me with coordinates xInteger and yInteger."
    ^self new setX: xInteger setY: yInteger
Ah, yes, in Number, @ falls back to Point x:y: on failure.
10,000,000 Point x: 1 y: 2 takes 22.815 seconds, but 1 @ 2 takes 12.033
seconds, really quite significant.
But then, are other usages of Point x:y: in the image valid?
Say in POVertex or BlobMorph?
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================