Interpreter>>pushReceiverVariableBytecode
Stephen Pair
squeak-dev at lists.squeakfoundation.org
Mon Sep 9 13:54:37 UTC 2002
Thanks for the explanation...it all makes sense now.
- Stephen
> -----Original Message-----
> From: squeak-dev-admin at lists.squeakfoundation.org
> [mailto:squeak-dev-admin at lists.squeakfoundation.org] On Behalf Of Ian Piumarta
> Sent: Friday, September 06, 2002 8:02 PM
> To: squeak-dev at lists.squeakfoundation.org
> Subject: RE: Interpreter>>pushReceiverVariableBytecode
>
> Stephen,
>
> On Fri, 6 Sep 2002, Stephen Pair wrote:
>
> > Ok, I see how the CCodeGenerator is making this method work...but I
> > still don't see why it's necessary to do the #fetchNextBytecode first
>
> A traditional interpreter loop looks like this:
>
>     while (1) {
>         int bytecode = fetchNextBytecode();
>         switch (bytecode)
>             { ... 256 cases ...; break; }
>     }
>
> and introduces a potential execution stall (in the processor)
> between the fetching of the bytecode and the dispatch to the
> corresponding case label (the processor might have to wait
> several cycles for the bytecode to arrive from memory before
> being able to proceed to use it).
>
> The thing to notice is that the above is semantically
> equivalent to this:
>
>     int bytecode = fetchNextBytecode();
>     while (1) {
>         switch (bytecode)
>         {
>         case N: doTheWork();
>                 bytecode = fetchNextBytecode();
>                 break;
>         ... x 256 ...
>         }
>     }
>
> from where it's trivial to see that
>
>     switch (bytecode)
>     {
>     case N: bytecode = fetchNextBytecode();
>             doTheWork();
>             break;
>     }
>
> is the same too (modulo your problem which isn't a problem in
> interp.c because the var becomes a constant during CCodeGen)
> -- except that now we might have avoided the stall (because
> even if it takes several cycles to fetch the bytecode from
> memory, this can happen in parallel with doingTheWork).
>
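For anyone who wants to try the reordering outside the VM, here is a minimal sketch of the prefetch-first switch described above. It is not taken from interp.c: the three opcodes (OP_PUSH1, OP_ADD, OP_HALT), the fixed 64-slot stack, and the function name run_prefetch are all hypothetical, chosen only to keep the example small. It assumes a well-formed program that ends in OP_HALT.

```c
#include <assert.h>

/* Hypothetical toy opcodes; these are NOT Squeak's real bytecodes. */
enum { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

/* Runs a toy program and returns the top of the stack at OP_HALT
 * (or 0 if the stack is empty). */
int run_prefetch(const unsigned char *code)
{
    int stack[64];
    int sp = 0;                        /* number of values on the stack */
    const unsigned char *ip = code;
    int bytecode = *ip++;              /* first fetch happens before the loop */

    for (;;) {
        switch (bytecode) {
        case OP_PUSH1:
            bytecode = *ip++;          /* fetch the NEXT bytecode first... */
            assert(sp < 64);
            stack[sp++] = 1;           /* ...then do this bytecode's work */
            break;
        case OP_ADD:
            bytecode = *ip++;          /* same ordering: fetch, then work */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
        default:                       /* unknown opcodes also stop the loop */
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
}
```

Calling run_prefetch on the program { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PUSH1, OP_ADD, OP_HALT } returns 3. Whether the early fetch actually hides any latency depends on the compiler and processor; the sketch only shows that the ordering is semantically equivalent.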
> > (which isn't correct if you're running in Smalltalk).
>
> This method is overridden in InterpreterSimulator precisely for
> that reason.
>
> > Why can't the #fetchNextBytecode be the last statement as it is in the
> > InterpreterSimulator version of this method?
>
> Because the speedup is measurable, _significantly_ so when
> using gcc in which case the final "break" in each bytecode is
> converted (manually, by an awk script run on the interp.c
> file) into an explicit dispatch directly to the next
> bytecode's case label:
>
>     void *bytecodeDispatchTable[256] = { &&label0, ..., &&label255 };
>     ...
>     case N: labelN:
>         bytecode = fetchNextBytecode();
>         doTheWork();
>         goto *bytecodeDispatchTable[bytecode]; /* break */
>
> which entirely eliminates the interpreter's dispatch loop.
>
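The direct-dispatch form can be sketched the same way. This relies on the "labels as values" extension (&&label and goto *expr), which gcc and clang accept but which is not standard C. As before, the opcodes (BC_PUSH1, BC_ADD, BC_HALT), the 64-slot stack, the DISPATCH macro, and the name run_threaded are hypothetical and are not what the awk script produces from interp.c.

```c
#include <assert.h>

/* Hypothetical toy opcodes; these are NOT Squeak's real bytecodes. */
enum { BC_PUSH1 = 0, BC_ADD = 1, BC_HALT = 2 };

/* Jump straight to the handler for the already-fetched bytecode. */
#define DISPATCH() \
    do { assert(bytecode <= BC_HALT); goto *dispatch[bytecode]; } while (0)

/* Runs a toy program with no switch and no loop: each handler ends by
 * jumping directly to the next handler. Returns the top of the stack at
 * BC_HALT (or 0 if the stack is empty). */
int run_threaded(const unsigned char *code)
{
    /* gcc/clang extension: &&label is the address of a label. */
    static void *dispatch[] = { &&bc_push1, &&bc_add, &&bc_halt };
    int stack[64];
    int sp = 0;                        /* number of values on the stack */
    const unsigned char *ip = code;
    int bytecode = *ip++;              /* first fetch */
    DISPATCH();

bc_push1:
    bytecode = *ip++;                  /* prefetch the next bytecode */
    assert(sp < 64);
    stack[sp++] = 1;                   /* do the work */
    DISPATCH();                        /* replaces "break" */

bc_add:
    bytecode = *ip++;
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();

bc_halt:
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Calling run_threaded on { BC_PUSH1, BC_PUSH1, BC_ADD, BC_PUSH1, BC_ADD, BC_HALT } returns 3, the same result a switch-based loop gives for that program. The only structural difference is that control never returns to a central dispatch point.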
> I hope this helps to explain why things are the way they are.
>
> Regards,
>
> Ian
>