>> It's not uncommon to find x86 code padded with a lot of NOPs
>> that weren't in the original source.
>Cowardly assembler! ;-)
>> [NOPs...] Sometimes they're there to pad the execution pipeline, and
>> sometimes it's just because the code generator allocates a hole large
>> enough for the largest form of an instruction to simplify the branch
>> target calculations.
>We must be talking about M$oft assemblers here. ;^p)
Yes. This has been pretty characteristic of MS assemblers as far
back as I can remember (i.e. about MASM 2.0 - I remember being
quite excited when MASM 4.0 came out - '186 and '286 code!).
>The real problem is that the solution in general might require N
>iterations for a piece of code with N branches -- if promoting one of them
>from short to long causes the next (or, much more problematically, a
>preceding) one to become long too.
You're such an optimist. O(n) iterations is close to the best case
for this kind of problem.
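To make the iteration concrete, here's a minimal sketch of the usual
"start everything short, promote on demand" relaxation loop. The
instruction sizes and the `relax` helper are invented for illustration,
not taken from any particular assembler:

```python
# Sketch of iterative branch relaxation: assume every branch is short,
# then repeatedly promote branches whose displacement no longer fits.
# Sizes mimic x86 jmp rel8 (2 bytes) vs jmp rel32 (5 bytes).

SHORT_LEN, LONG_LEN = 2, 5
SHORT_MIN, SHORT_MAX = -128, 127    # signed 8-bit displacement range

def relax(items):
    """items: list of ('insn', size_in_bytes) or ('br', target_index).
    Returns the set of branch indices forced into the long form."""
    long_form = set()
    changed = True
    while changed:                  # worst case: one promotion per pass
        changed = False
        # compute each item's address under the current size choices
        addr, pos = [], 0
        for i, it in enumerate(items):
            addr.append(pos)
            if it[0] == 'br':
                pos += LONG_LEN if i in long_form else SHORT_LEN
            else:
                pos += it[1]
        end = addr + [pos]          # addresses, incl. one past the end
        # promote any short branch whose displacement doesn't fit
        for i, it in enumerate(items):
            if it[0] == 'br' and i not in long_form:
                disp = end[it[1]] - (addr[i] + SHORT_LEN)
                if not (SHORT_MIN <= disp <= SHORT_MAX):
                    long_form.add(i)
                    changed = True
    return long_form
```

Each promotion can push a later (or earlier) branch out of short range,
which is exactly why the loop may need up to N passes for N branches.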
>This is a weak form of what I suppose one might call an "oscillating
>constraints" problem. (Hands up anyone who has ever seen [La]TeX get
Hence, stochastic optimization. Even MS uses stochastic optimizers
now, even in vb. (didn't want to say that too loud here. ;) )
>The usual cause for "padding" in (capable) assemblers (other than for
>prefetch alignment, as you point out) is due to dynamic linking where the
>size of call sites isn't known until link time. (On modern systems the
Not only prefetch alignment. I have heard, although I don't know
the details, that sometimes it's desirable to force a dead cycle in
the execution pipeline. I can only guess that this would be to
influence the scheduling of the execution units for complex
instructions, but I consider myself to be only a "casual observer"
with respect to this kind of thing.
>In Smalltalk you've also got a bigger problem: for very long methods the
>range of even unconditional branches might not be enough, at which point
>I know this has nothing whatsoever to do with Smalltalk...)
Well, to kind of tie this back into something relevant to
Squeak - I know a few people have talked about "new" VM
instruction sets. It seems that it might help to think
about a VM where all instructions are the same size.
This could simplify code generation (i.e. compiling methods),
and possibly improve performance and *reduce* code size (at
least in the case of long methods). The anticipated performance
improvement would come from forcing aligned fetches more often
and from simpler instruction decoding.
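To show how little work fixed-width decoding takes, here's a toy
encoder/decoder for a 32-bit instruction word. The field layout
(8-bit opcode, two 12-bit operands) is made up for illustration; it
is not a proposal for an actual Squeak instruction format:

```python
# Toy fixed-width 32-bit VM instruction word: decoding is nothing but
# shifts and masks, and the pc always advances by exactly one word.
# The 8/12/12 field split is invented for this example.

def encode(opcode, a, b):
    return (opcode << 24) | (a << 12) | b

def decode(word):
    opcode = (word >> 24) & 0xFF
    a = (word >> 12) & 0xFFF
    b = word & 0xFFF
    return opcode, a, b
```

With every instruction the same size, branch targets can be plain
instruction indices and there's no length table to consult.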
It's really nice to work with an architecture where all instructions
take up one word of memory, take one cycle to execute, and sometimes
can cause five or six different useful things to occur during that
one cycle. (ok, so I write a lot of DSP code...) The single cycle
thing isn't necessarily relevant for Smalltalk, but those "extended"
bytecodes are really distasteful to me.
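For contrast, here's the flavor of a variable-length scheme with an
extension prefix, roughly the kind of "extended" bytecode being
complained about. The opcodes and field widths are invented, not the
real Squeak bytecode set:

```python
# Variable-length encoding with an extension prefix: small operands
# fit in one byte, bigger ones pay for a prefix. Opcode values and
# layout are hypothetical, for illustration only.

PUSH_TEMP, EXTEND = 0x10, 0xFF

def emit_push_temp(n):
    """Encode 'push temporary n': one byte if n < 16, else prefixed."""
    if n < 16:
        return bytes([PUSH_TEMP | n])
    return bytes([EXTEND, n >> 8, n & 0xFF, PUSH_TEMP])

def decode_one(code, pc):
    """Decode one instruction at pc; returns (operand, next_pc).
    The decoder can't know the instruction's length without first
    inspecting it -- that's the cost of variable-length encodings."""
    if code[pc] == EXTEND:
        n = (code[pc + 1] << 8) | code[pc + 2]
        return n, pc + 4
    return code[pc] & 0x0F, pc + 1
```

The `next_pc` bookkeeping is exactly what a fixed-width instruction
set makes unnecessary.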