Well,
So some good words came in, and now I'm looking at the assembly code...
Bottom line is: gcc 2.95.2 on my Linux makes the bytecode/sec count larger, but makes send/sec count smaller.
gcc 2.95.2 on Windows generates a code sequence for two bytecodes like this:
-------------------
    69d8:  46                      inc    %esi
    69d9:  0f b6 1e                movzbl (%esi),%ebx
    69dc:  83 c7 04                add    $0x4,%edi
    69df:  a1 00 00 00 00          mov    0x0,%eax
    69e4:  8b 40 08                mov    0x8(%eax),%eax
    69e7:  89 07                   mov    %eax,(%edi)
    69e9:  ff 24 9d 80 27 00 00    jmp    *0x2780(,%ebx,4)

    69f0:  46                      inc    %esi
    69f1:  0f b6 1e                movzbl (%esi),%ebx
    69f4:  83 c7 04                add    $0x4,%edi
    69f7:  a1 00 00 00 00          mov    0x0,%eax
    69fc:  8b 40 0c                mov    0xc(%eax),%eax
    69ff:  89 07                   mov    %eax,(%edi)
    6a01:  ff 24 9d 80 27 00 00    jmp    *0x2780(,%ebx,4)
-------------------
Apparently, %esi is used (exclusively) for the IP, %ebx holds the next bytecode, and "jmp *" takes you to the handler address stored in the table that starts at 0x2780, indexed by that byte.
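For readers who have not looked at this kind of dispatch in C, here is a minimal sketch of the pattern that such a sequence would typically come from. It is not the actual interpreter source: the three opcodes, the `run` function and the `dispatch` table are made up for illustration, and it assumes gcc's "labels as values" extension (`&&label` and `goto *ptr`). Each handler fetches the next bytecode first, does its work, and then jumps through the table, which is the shape of the inc / movzbl / ... / jmp * sequence above.

```c
#include <assert.h>

/* Hypothetical opcodes, for illustration only (not the real bytecode set). */
enum { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

/* Runs the bytecodes at ip until OP_HALT and returns the top of stack.
   Relies on gcc's labels-as-values extension. */
int run(const unsigned char *ip)
{
    /* Jump table indexed by bytecode, like the table at 0x2780. */
    static void *const dispatch[] = { &&do_push1, &&do_add, &&do_halt };
    int stack[64];
    int *sp = stack;            /* points at the current top of stack */
    unsigned char next;

    *sp = 0;
    goto *dispatch[*ip];        /* dispatch the first bytecode */

do_push1:
    next = *++ip;               /* prefetch next bytecode (inc + movzbl) */
    assert(sp < stack + 63);    /* the sketch has a fixed-size stack */
    *++sp = 1;                  /* bump sp, then store the value */
    goto *dispatch[next];       /* jmp *table(,next,4) */

do_add:
    next = *++ip;
    assert(sp > stack);         /* need two operands on the stack */
    sp[-1] += sp[0];
    --sp;
    goto *dispatch[next];

do_halt:
    return *sp;
}
```

Calling `run` on the byte sequence PUSH1, PUSH1, ADD, HALT should leave 2 on top of the stack. Whether the compiler keeps `ip` and `next` in dedicated registers, as 2.95.2 appears to do above, is up to its register allocator.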
gcc 4.1.2 on Fedora Core 7 generates a code sequence for two bytecodes like this:
-------------------
    efcf:  8d 46 01                lea    0x1(%esi),%eax
    efd2:  0f b6 08                movzbl (%eax),%ecx
    efd5:  89 c6                   mov    %eax,%esi
    efd7:  a1 40 00 00 00          mov    0x40,%eax
    efdc:  8d 57 04                lea    0x4(%edi),%edx
    efdf:  89 d7                   mov    %edx,%edi
    efe1:  89 cb                   mov    %ecx,%ebx
    efe3:  8b 40 2c                mov    0x2c(%eax),%eax
    efe6:  89 02                   mov    %eax,(%edx)
    efe8:  8b 04 8d 20 04 00 00    mov    0x420(,%ecx,4),%eax
    efef:  ff e0                   jmp    *%eax

    eff1:  8d 46 01                lea    0x1(%esi),%eax
    eff4:  0f b6 08                movzbl (%eax),%ecx
    eff7:  89 c6                   mov    %eax,%esi
    eff9:  a1 40 00 00 00          mov    0x40,%eax
    effe:  8d 57 04                lea    0x4(%edi),%edx
    f001:  89 d7                   mov    %edx,%edi
    f003:  89 cb                   mov    %ecx,%ebx
    f005:  8b 40 30                mov    0x30(%eax),%eax
    f008:  89 02                   mov    %eax,(%edx)
    f00a:  8b 04 8d 20 04 00 00    mov    0x420(,%ecx,4),%eax
    f011:  ff e0                   jmp    *%eax
-------------------
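%esi is still more or less the IP, but the increment and the fetch of the next byte go through %eax (the result is then copied back into %esi, and the byte lands in %ecx and is copied to %ebx). The jmp also uses %eax, so %eax gets reused: right before the jump, the destination address is loaded from the table into %eax.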
I'd be surprised if this were optimized for a specific x86 variant. I copied the command-line options from the Windows Makefile to Fedora:
-mpentium -mwindows -Werror-implicit-function-declaration -fomit-frame-pointer -funroll-loops -fschedule-insns2
and got an equally unsatisfying (slightly different) sequence.
Ok, so one thing to try was to install gcc 2.95.2 on Fedora Core 7 and compile the interpreter with it. The resulting assembly code is close to the one on Windows. The bytecode/sec count went up but send/sec went down. I have a feeling that I saw this before but of course cannot remember the exact conditions...
If somebody has a dual-boot machine and can compare the 8 (or more) cases (namely, the combinations of Windows/Linux, 2.95.2/4.1.2, and more options/fewer options), that would be great.
-- Yoshiki