Progrmaming in Bytecode?

Thu Aug 1 17:53:00 UTC 2002

>> 	It's not uncommon to find x86 code padded with a lot of NOPs
>> that weren't in the original source.
>
>Cowardly assembler! ;-)
>
>> [NOPs...] Sometimes they're there to pad the execution pipeline, and
>> sometimes it's just because the code generator allocates a whole large
>> enough for the largest form of an instruction to simplify the branch
>> target calculations.
>
>We must be talking about M$oft assemblers here. ;^p)

  Yes.  This has been pretty characteristic of MS assemblers as far
  back as I can remember (i.e. about MASM 2.0 - I remember being
  quite excited when MASM 4.0 came out - '186 and '286 code!).

>The real problem is that the solution in general might require N
>iterations for a piece of code with N branches -- if promoting one of them
>from short to long causes the next (or, much more problematically, a
>preceding) one to become long too.

  You're such an optimist.  O(n) is close to best case for this
  problem.

>This is a weak form of what I suppose one might call an "oscillating
>constraints" problem.  (Hands up anyone who has ever seen [La]TeX get

  Hence, stochastic optimization.  Even MS uses stochastic optimizers
  now, even in vb.  (didn't want to say that too loud here. ;) )

>The usual cause for "padding" in (capable) assemblers (other than for
>prefetch alignment, as you point out) is due to dynamic linking where the
>size of call sites isn't known until link time.  (On modern systems the

  Not only prefetch alignment.  I have heard, although I don't know
  the details that sometimes it's desirable to force a dead cycle in
  the execution pipeline.  I can only guess that this would be to
  influence the scheduling of the execution units for complex
  instructions, but I consider myself to be only a "casual observer"
  with respect to this kind of thing.

>In Smalltalk you've also got a bigger problem: for very long methods the
>range of even unconditional branches might not be enough, at which point

>I know this has nothing whatsoever to do with Smalltalk...)

  Well, to kind of tie this back into something relevant to
  Squeak - I know a few people have talked about "new" VM
  instruction sets.  It seems that it might help to think
  about a VM where all instructions are the same size.

  This could simplify code generation (i.e. compiling methods),
  and possibly improve performance and *reduce* code size (at
  least in the case of long methods).  The anticipated performance
  improvement would come from forcing aligned fetches more often
  and potentially simpler instruction decoding.

  It's really nice to work with an architecture where all instructions
  take up one word of memory, take one cyle to execute, and sometimes
  can cause five or six different useful things to occur during that
  one cycle.  (ok, so I write a lot of DSP code...)  The single cycle
  thing isn't necessarily relevant for Smalltalk,  but those "extended"
  bytecodes are really distasteful to me.

						-Dean