Interpreter>>pushReceiverVariableBytecode

Mon Sep 9 13:54:37 UTC 2002

Thanks for the explanation...it all makes sense now.

- Stephen

> -----Original Message-----
> From: squeak-dev-admin at lists.squeakfoundation.org
> [mailto:squeak-dev-admin at lists.squeakfoundation.org] On
> Behalf Of Ian Piumarta
> Sent: Friday, September 06, 2002 8:02 PM
> To: squeak-dev at lists.squeakfoundation.org
> Subject: RE: Interpreter>>pushReceiverVariableBytecode
>
>
> Stephen,
>
> On Fri, 6 Sep 2002, Stephen Pair wrote:
>
> > Ok, I see how the CCodeGenerator is making this method work...but I
> > still don't see why it's necessary to do the #fetchNextBytecode first
>
> A traditional interpreter loop looks like this:
>
>   while (1) {
>     int bytecode= fetchNextBytecode();
>     switch (bytecode)
>     { ... 256 cases ...; break; }
>   }
>
> and introduces a potential execution stall (in the processor)
> between the fetching of the bytecode and the dispatch to the
> corresponding case label (the processor might have to wait
> several cycles for the bytecode to arrive from memory before
> being able to proceed to use it).
>
> The thing to notice is that the above is semantically
> equivalent to this:
>
>   int bytecode= fetchNextBytecode();
>   while (1) {
>     switch (bytecode)
>     {
>       case N: doTheWork();
>               bytecode= fetchNextBytecode();
>               break;
>       ... x 256 ...
>     }
>   }
>
> from where it's trivial to see that
>
>     switch (bytecode)
>     {
>        case N: bytecode= fetchNextBytecode();
>                doTheWork();
>                break;
>     }
>
> is the same too (modulo your problem which isn't a problem in
> interp.c because the var becomes a constant during CCodeGen)
> -- except that now we might have avoided the stall (because
> even if it takes several cycles to fetch the bytecode from
> memory, this can happen in parallel with doingTheWork).
>
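
The equivalence described above can be checked on a toy interpreter.
What follows is only a minimal sketch, not code from interp.c: the
three opcodes, the function names and the 16-slot stack are all
hypothetical. It sidesteps the problem mentioned in the quoted text
because none of the work reads `bytecode` after the prefetch has
overwritten it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode set, for illustration only. */
enum { OP_HALT = 0, OP_PUSH1 = 1, OP_ADD = 2 };

/* Traditional loop: fetch, then dispatch on the value just fetched. */
int run_fetch_first(const unsigned char *code)
{
    int stack[16];
    size_t sp = 0;
    for (;;) {
        int bytecode = *code++;
        switch (bytecode) {
        case OP_PUSH1:
            assert(sp < 16);
            stack[sp++] = 1;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
        default:                      /* unknown opcodes also halt */
            return sp ? stack[sp - 1] : 0;
        }
    }
}

/* Prefetching loop: each case fetches the NEXT bytecode before doing
   its own work, so the load can overlap with that work. */
int run_prefetch(const unsigned char *code)
{
    int stack[16];
    size_t sp = 0;
    int bytecode = *code++;           /* prime the first bytecode */
    for (;;) {
        switch (bytecode) {
        case OP_PUSH1:
            bytecode = *code++;       /* prefetch */
            assert(sp < 16);
            stack[sp++] = 1;          /* the work */
            break;
        case OP_ADD:
            bytecode = *code++;       /* prefetch */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:                 /* no prefetch: nothing follows */
        default:
            return sp ? stack[sp - 1] : 0;
        }
    }
}
```

Both functions assume the program ends with OP_HALT, so the prefetch
never reads past the end of the array. Whether the prefetching form is
actually faster depends on the compiler and the processor; the sketch
only shows that the two loops compute the same result.
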
> > (which isn't correct if you're running in Smalltalk).
>
> This method is overridden in InterpreterSimulator precisely for
> that reason.
>
> > Why can't the #fetchNextBytecode be the last statement as it is in the
> > InterpreterSimulator version of this method?
>
> Because the speedup is measurable, _significantly_ so when
> using gcc, in which case the final "break" in each bytecode is
> converted (manually, by an awk script run on the interp.c
> file) into an explicit dispatch directly to the next
> bytecode's case label
>
>    void *bytecodeDispatchTable[256] = { &&label0, ..., &&label255 };
>    ...
>    case N: labelN:
>            bytecode= fetchNextBytecode();
>            doTheWork();
>            goto *bytecodeDispatchTable[bytecode]; /* break */
>
> which entirely eliminates the interpreter's dispatch loop.
>
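
For reference, the direct-dispatch form can be written by hand with
GCC's "labels as values" extension (`&&label` yields the address of a
label and `goto *ptr` jumps to it). The following is a rough sketch
under that assumption, not the output of the awk script: the opcodes,
the function name and the three-entry table are hypothetical, and it
compiles only with GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode set, for illustration only. */
enum { OP_HALT = 0, OP_PUSH1 = 1, OP_ADD = 2 };

int run_threaded(const unsigned char *code)
{
    /* One entry per opcode, indexed by the opcode value. */
    static void *dispatch[] = { &&op_halt, &&op_push1, &&op_add };
    int stack[16];
    size_t sp = 0;

    int bytecode = *code++;           /* prime the first bytecode */
    assert(bytecode <= OP_ADD);       /* table has only three entries */
    goto *dispatch[bytecode];

op_push1:
    bytecode = *code++;               /* prefetch the next bytecode */
    assert(bytecode <= OP_ADD && sp < 16);
    stack[sp++] = 1;                  /* the work */
    goto *dispatch[bytecode];         /* replaces "break" */

op_add:
    bytecode = *code++;               /* prefetch the next bytecode */
    assert(bytecode <= OP_ADD && sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    goto *dispatch[bytecode];         /* replaces "break" */

op_halt:
    return sp ? stack[sp - 1] : 0;
}
```

There is no loop and no switch left: every bytecode ends by jumping
straight to the label of the next one. The asserts guard the table
lookup only in debug builds; a full 256-entry table, as in the quoted
text, would not need them.
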
> I hope this helps to explain why things are the way they are.
>=20
> Regards,
>=20
> Ian
>=20
>=20
>=20