There's memory bandwidth and there's memory transaction thruput

Wed Feb 17 09:22:42 UTC 1999

sqrmax at cvtci.com.ar wrote:

>  The 68K has much more registers than the intel processors. Intel ones (386
>  and up) have 4 general use registers named A, B, C and D (with suffix X
>  meaning 16 bits, with extra E prefix meaning 32 bits, or with H and L suffixes to
>  mean lower 8 bits and higher 8 bits of the first 16 bits), then segment and
>  offset registers named (add an E prefix as needed for their 32 bit sibling in
>  protected or real mode) CS, DS, SS, ES, DI, SI, SP, BP, FS and GS. These last
>  registers have limited arithmetic capabilities, especially the ones ending
>  in S (for segment). The 68K has 8 32bit data registers (d0 to d7) and 8 32bit
>  index registers (a0 to a7) fully arithmetic capable (to my knowing). I think
>  the 68K with its variants lacks the raw speed of the Intels, although it's
>  much more maneuverable. I'd like to see intel processors with some of the 68K
>  characteristics (like asynchronous IO for instance, why do you think intel
>  machines need FIFOs everywhere?).
>
>  Andres.

The address registers on the 68000 supported add and subtract, but not multiply,
divide, or any bitwise logical ops (OR, AND, etc.).  The data registers could hold
an array index (with an address register holding the base address of the array),
but could not otherwise be used for referencing memory locations.  The 68000 was
not all that far from qualifying as a RISC processor, compared to a canonical CISC
CPU such as the VAX.  Its failings with respect to RISC were:

     1. The 68k used microcode instead of hardwired instructions, so most
     instructions required 2 or more clock cycles to execute.  The Intel
     processors always had a lower average CPI (cycles per instruction) due to
     much less reliance on microcoded instructions (especially in the 486 and
     later processors).  This was the great advantage of the x86 processors over
     the 68k family.

     2. The 68k had too few registers.  To satisfy the RISC definition, a
     processor should have (at least) 32 fully general purpose registers.  The
     68000 only had 16 "general purpose" registers--and they weren't
     symmetrically general purpose (although they were much closer to this than
     the x86 line, even today).  The 68k would have looked far worse relative to
     its x86 competition but for its much larger set of general purpose
     registers.

     3. Most 68k opcodes had addressing modes where at least one operand was a
     memory address.  In a RISC processor, only the "load" and "store"
     instructions (what the 68000 would call "MOVE," and the x86 would call
     "MOV") should reference memory locations.  This restriction has two
     purposes:

         a. Instructions that don't reference memory can be implemented with
         less circuitry, require less complex instruction decode logic, and
         don't have to worry about cache misses and/or page faults.

         b. Separating memory load/store operations into separate instructions
         makes it easier to optimally schedule instructions to make good use of
         processor parallelism and pipelining.  If a "load" instruction can be
         issued a few instructions before the data it loads is actually used,
         then it is more likely that the data will have arrived in the register
         before the first instruction that references it has begun to execute,
         thus avoiding a processor stall while it waits for data from memory.

     4. The 68k uses 2-operand opcodes.  RISC processors generally use 3-operand
     opcodes.  To compute the sum of two numbers, the 68000 would use code like
     the following:

                opcode   operands     comments
                MOVE     (A7)+,D0     ; pop the top of the stack into D0
                ADD      (A7)+,D0     ; pop the next value, add it to D0,
                                      ; store the result in D0
                MOVE     D0,-(A7)     ; push D0 onto the stack

     In contrast, the code for the same thing on a RISC would look like this:

                opcode   operands          comments
                LD       (R31),+R30,R1     ; R1 = *R31; R31 += R30;
                LD       (R31),+R30,R2     ; R2 = *R31; R31 += R30;
                ADD      R3,R1,R2          ; R3 = R1 + R2;
                ST       (R31),-R30,R3     ; R31 -= R30; *R31 = R3;

     The superiority of the RISC approach may not be obvious (it requires one
     more instruction than the 68k, after all!), until one considers what really
     happens in real code, as opposed to this contrived example.  Note, for
     example, that the 68k code overwrites the value of one of the addends with
     the sum, which the RISC code does not do.  And if the two addends happen to
     already be in registers, then the addition can be done in one
     instruction--without overwriting one of the addends.  Also, if one actually
     counts the number of clock cycles required to run either the 68k or RISC
     code in the above examples, one quickly realizes that the time required by
     the memory accesses means that 4 clock cycles is the best that can be done,
     regardless of whether this is coded as 1, 2, 3 or 4 instructions (the 68k
     ADD memory-to-register instruction will require at least two clock cycles,
     no matter what).

     5. The 68000 has too many addressing modes.  The 68020 made this even
     worse.  This is the other area in which the x86 processors have typically
     been superior to the 68k: not overdoing the addressing modes quite so
     egregiously.
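
The scheduling argument in point 3b can be illustrated with a toy model.  The
following is only a sketch under stated assumptions: a single-issue, in-order
machine with a made-up load latency of 3 cycles.  The names run, LOAD_LATENCY,
LATE and EARLY, and the LI ("load immediate") filler instruction, are
hypothetical and do not describe any real processor.  Both programs do the same
work; the only difference is whether independent instructions are placed
between the loads and the ADD that consumes them.

```python
# Toy in-order, single-issue machine.  All numbers here are illustrative
# assumptions, not measurements of any real CPU.
LOAD_LATENCY = 3  # cycles from issuing a load until its register is usable


def run(program, memory):
    """Execute program; return (registers, total_cycles, stall_cycles)."""
    regs, ready_at = {}, {}  # ready_at: register -> first cycle it can be read
    cycle = stalls = 0
    for op, dst, *srcs in program:
        # LD takes a memory address and LI an immediate, so neither one
        # waits on a source register.
        reg_srcs = [] if op in ("LD", "LI") else srcs
        start = max([cycle] + [ready_at[r] for r in reg_srcs])
        stalls += start - cycle  # cycles spent waiting for operands
        cycle = start
        if op == "LD":
            regs[dst] = memory[srcs[0]]
            ready_at[dst] = cycle + LOAD_LATENCY
        elif op == "LI":
            regs[dst] = srcs[0]
            ready_at[dst] = cycle + 1
        elif op == "ADD":  # three-operand: dst = src1 + src2, sources untouched
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
            ready_at[dst] = cycle + 1
        elif op == "ST":  # here dst is the memory address being written
            memory[dst] = regs[srcs[0]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        cycle += 1
    return regs, cycle, stalls


MEMORY = {0: 10, 4: 32}

# The ADD immediately follows the loads it depends on.
LATE = [("LD", "r1", 0), ("LD", "r2", 4), ("ADD", "r3", "r1", "r2"),
        ("LI", "r4", 1), ("LI", "r5", 2), ("ST", 8, "r3")]

# Same instructions, with the independent LIs moved into the load shadow.
EARLY = [("LD", "r1", 0), ("LD", "r2", 4), ("LI", "r4", 1),
         ("LI", "r5", 2), ("ADD", "r3", "r1", "r2"), ("ST", 8, "r3")]

if __name__ == "__main__":
    for name, prog in (("late", LATE), ("early", EARLY)):
        regs, cycles, stalls = run(prog, dict(MEMORY))
        print(name, "cycles:", cycles, "stalls:", stalls, "r3:", regs["r3"])
```

In this model the reordered program finishes sooner purely because the loads
were issued ahead of their first use.  A 68k-style ADD with a memory operand
gives the scheduler no separate load to move.  The three-operand ADD also
leaves r1 and r2 intact, as described in point 4.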

Motorola released the 68060 in 1994, which finally did away with most of the
microcode, and permitted most instructions to run in only one clock cycle (unless
they referenced memory).  The 68060 maxed out at 60 MHz or so (I stopped following
it after it was released), which was on the low side at the time.  It did away
with the addressing modes, and some of the instructions, that had been introduced
with the 68020 10 years before.  Alas, this was way too late to make any
difference for the 68k family, which has slowly faded into the twilight of
embedded processordom.

So RISC did kill off all the CISC processor families used in general purpose
computer systems, except for two old warhorses whose installed base and dominance
in their market segments precluded their retirement: the x86 family, and the
venerable System/370.

For Smalltalk, the key performance issue is memory access bandwidth, and the high
"non-locality of reference" that characterizes successive memory accesses in
Smalltalk programs.  Caches work best when memory is accessed sequentially, which
is unfortunately precisely what Smalltalk systems tend not to do.  Perhaps if the
cache logic were smart enough to understand the structure of Smalltalk objects, it
could pre-fetch the objects referenced by the instance variables of objects
loaded into the cache, to some finite (and configurable) depth...
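
To make the pre-fetch idea concrete, here is a rough software sketch of which
objects such cache logic would pull in.  Everything in it is an assumption made
for illustration: the Obj class stands in for a heap object, ivars for its
instance variables, and prefetch_order for the hypothetical hardware policy.
It walks the references breadth-first and stops at the configured depth.

```python
from collections import deque


class Obj:
    """Stand-in for a heap object: a name plus its instance variables."""

    def __init__(self, name):
        self.name = name
        self.ivars = []  # references to other Obj instances


def prefetch_order(root, depth):
    """Names of the objects to fetch when root is loaded, breadth-first,
    following instance-variable references at most `depth` hops."""
    seen = {id(root)}
    order = [root.name]
    queue = deque([(root, 0)])
    while queue:
        obj, hops = queue.popleft()
        if hops == depth:
            continue  # depth limit reached: do not follow this object's refs
        for ref in obj.ivars:
            if id(ref) not in seen:  # shared or cyclic references fetched once
                seen.add(id(ref))
                order.append(ref.name)
                queue.append((ref, hops + 1))
    return order


# Small example graph with a shared object (d) and a cycle (d -> a).
a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.ivars = [b, c]
b.ivars = [d]
c.ivars = [d]
d.ivars = [a]

if __name__ == "__main__":
    for depth in range(4):
        print(depth, prefetch_order(a, depth))
```

The depth limit is what keeps the fetch finite: without it, the cycle from d
back to a (or simply a large object graph) would pull in far more than the
cache can hold.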

--Alan


[Attachment: vcard.vcf (Card for Alan Lovejoy)]