sqGnu.h

Ned Konz ned at bike-nomad.com
Thu Apr 4 06:56:46 UTC 2002


On Wednesday 03 April 2002 10:05 pm, Ned Konz wrote:

> I don't know that much about the possibilities, but on my machine, if I let
> gcc do aggressive optimization using these flags:
>
> CFLAGS = -I/usr/X11R6/include -O3 -DLSB_FIRST=1 -Wa,-a -fno-gcse
> -fomit-frame-pointer -fschedule-insns -g
>
> then I see that the dispatch is a single (albeit 7 byte long) instruction
> (line 11648):
>
> 5160:gnu-interp.c  **** 		CASE(1)
> 5161:gnu-interp.c  **** 			/* pushReceiverVariableBytecode */
> 5162:gnu-interp.c  **** 			/* begin fetchNextBytecode */
> 5163:gnu-interp.c  **** 			currentBytecode = byteAt(++localIP);
> 5164:gnu-interp.c  **** 			/* begin pushReceiverVariable: */
> 5165:gnu-interp.c  **** 			/* begin internalPush: */
> 5166:gnu-interp.c  **** 			longAtput(localSP += 4, longAt(((((char *) receiver)) + 4) + ((1 & 15) << 2)));
>  11634              	.stabn 68,0,5166,.LM1552-interpret
>  11635              	.LM1552:
>  11636 2b60 A1000000 		movl receiver,%eax
>  11636      00
>  11637              	.stabn 68,0,5163,.LM1553-interpret
>  11638              	.LM1553:
>  11639 2b65 46       		incl %esi
>  11640 2b66 0FB62E   		movzbl (%esi),%ebp
>  11641              	.stabn 68,0,5166,.LM1554-interpret
>  11642              	.LM1554:
>  11643 2b69 83C704   		addl $4,%edi
>  11644 2b6c 8B4008   		movl 8(%eax),%eax
>  11645 2b6f 8907     		movl %eax,(%edi)
> 5167:gnu-interp.c  **** 			BREAK;
>  11646              	.stabn 68,0,5167,.LM1555-interpret
>  11647              	.LM1555:
>  11648 2b71 FF24AD40 		jmp *jumpTable.586(,%ebp,4)
>  11648      240000
>
> or (stripping out the actual work of the bytecode) just three instructions
> total:
>
>  11639 2b65 46       		incl %esi
>  11640 2b66 0FB62E   		movzbl (%esi),%ebp
>  11648 2b71 FF24AD40 		jmp *jumpTable.586(,%ebp,4)
>  11648      240000

That was with gcc 3.0.3. Unfortunately, some of the plugins don't work right, 
so I'm a bit wary of using this compiler.

With 2.95.3, the compiler keeps the current bytecode in a stack variable, 
which slows the interpreter down from about 49,000,000 to about 42,000,000 
bytecodes/sec (2 additional instructions: a store to and a fetch from that 
memory location). However, things seem to work correctly.

So we Intel users have a choice of (at least) two options: fast and flaky, or 
slow and less flaky.

I'm going to fiddle around with 2.95.3 a bit to see if I can eliminate those 
two extra instructions.
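
For anyone who hasn't looked at how the gnuified interpreter dispatches, here 
is a minimal sketch of the technique, assuming gcc's labels-as-values 
extension (the && operator and "goto *"). It is not the code from 
gnu-interp.c: the three opcodes, the DISPATCH macro and the function name 
interpretSketch are made up for illustration, and only jumpTable, localIP, 
localSP and currentBytecode echo names from the listing above. The point is 
that each handler ends with its own fetch and indirect jump, which is what a 
good compiler can reduce to the incl/movzbl/jmp sequence.

```c
/* Hypothetical opcodes for illustration; these are NOT Squeak bytecodes. */
enum { OP_HALT = 0, OP_PUSH1 = 1, OP_ADD = 2 };

/* Fetch the next bytecode and jump straight to its handler.
   (Hypothetical macro, standing in for the fetch + BREAK pair.) */
#define DISPATCH() \
    do { currentBytecode = *++localIP; goto *jumpTable[currentBytecode]; } while (0)

/* Runs code until OP_HALT and returns the value on top of the stack.
   Assumes code uses only the three opcodes above, ends with OP_HALT,
   and never needs more than 16 stack slots. */
long interpretSketch(const unsigned char *code)
{
    /* One entry per opcode, indexed by the bytecode value.
       Taking a label's address with && is a gcc extension. */
    static void *jumpTable[] = { &&opHalt, &&opPush1, &&opAdd };

    long stack[16];
    long *localSP = stack;                    /* points at the top of stack */
    const unsigned char *localIP = code;      /* points at the current bytecode */
    unsigned char currentBytecode = *localIP;

    stack[0] = 0;                             /* result of an empty program */
    goto *jumpTable[currentBytecode];         /* initial dispatch */

opPush1:
    *++localSP = 1;                           /* push the constant 1 */
    DISPATCH();

opAdd:
    localSP[-1] += localSP[0];                /* replace top two slots with their sum */
    --localSP;
    DISPATCH();

opHalt:
    return *localSP;
}
```

Whether currentBytecode ends up in a register or in a stack slot is up to 
the compiler's register allocator, which is exactly the difference between 
the two gcc versions described above.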

-- 
Ned Konz
currently: Stanwood, WA
email:     ned at bike-nomad.com
homepage:  http://bike-nomad.com



