[Vm-dev] Re: [squeak-dev] VM performance discrepancy on Linux and Windows
Yoshiki Ohshima
yoshiki at vpri.org
Fri Apr 11 07:13:54 UTC 2008
Well,
So some god words came and now I'm looking at the assembly code...
The bottom line is: gcc 2.95.2 on my Linux makes the bytecode/sec count
larger, but makes the send/sec count smaller.
gcc 2.95.2 on Windows generates a code sequence for two bytecodes like this:
-------------------
69d8: 46 inc %esi
69d9: 0f b6 1e movzbl (%esi),%ebx
69dc: 83 c7 04 add $0x4,%edi
69df: a1 00 00 00 00 mov 0x0,%eax
69e4: 8b 40 08 mov 0x8(%eax),%eax
69e7: 89 07 mov %eax,(%edi)
69e9: ff 24 9d 80 27 00 00 jmp *0x2780(,%ebx,4)
69f0: 46 inc %esi
69f1: 0f b6 1e movzbl (%esi),%ebx
69f4: 83 c7 04 add $0x4,%edi
69f7: a1 00 00 00 00 mov 0x0,%eax
69fc: 8b 40 0c mov 0xc(%eax),%eax
69ff: 89 07 mov %eax,(%edi)
6a01: ff 24 9d 80 27 00 00 jmp *0x2780(,%ebx,4)
-------------------
Apparently, %esi is used (exclusively) for the IP, %ebx keeps the next
bytecode, and "jmp *" takes you to the next location, which is looked
up in the table that starts at 0x2780.
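For readers who don't have the interpreter source handy, here is a
minimal sketch of the kind of C that produces such a sequence. It uses
gcc's "labels as values" extension (&&label and goto *). The opcode
names, the function name run_bytecodes and the tiny stack are made up
for illustration; this is not the actual Squeak interpreter code.

```c
/* Hypothetical opcodes for illustration; not Squeak's bytecode set. */
enum { OP_PUSH1 = 0, OP_PUSH2 = 1, OP_ADD = 2, OP_HALT = 3 };

/* Runs a tiny bytecode program and returns the value on top of the stack.
   The program must end with OP_HALT and push at most 15 values. */
int run_bytecodes(const unsigned char *code)
{
    /* Dispatch table: one label address per opcode (gcc extension). */
    static void *const table[] = { &&push1, &&push2, &&add, &&halt };
    int stack[16] = { 0 };
    int *sp = stack;                 /* plays the role of %edi */
    const unsigned char *ip = code;  /* plays the role of %esi */
    unsigned int bytecode = *ip;     /* plays the role of %ebx */

    goto *table[bytecode];

push1:
    bytecode = *++ip;        /* inc %esi; movzbl (%esi),%ebx */
    *++sp = 1;               /* add $4,%edi; mov ...,(%edi)  */
    goto *table[bytecode];   /* jmp *table(,%ebx,4)          */

push2:
    bytecode = *++ip;
    *++sp = 2;
    goto *table[bytecode];

add:
    bytecode = *++ip;
    sp[-1] += sp[0];         /* replace the top two values with their sum */
    sp--;
    goto *table[bytecode];

halt:
    return *sp;
}
```

Calling run_bytecodes on the program { OP_PUSH1, OP_PUSH2, OP_ADD,
OP_HALT } returns 3. Each bytecode body ends with its own indirect jump
through the table, which is the pattern visible in the listing above.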
gcc 4.1.2 on Fedora Core 7 generates a code sequence for two bytecodes like this:
-------------------
efcf: 8d 46 01 lea 0x1(%esi),%eax
efd2: 0f b6 08 movzbl (%eax),%ecx
efd5: 89 c6 mov %eax,%esi
efd7: a1 40 00 00 00 mov 0x40,%eax
efdc: 8d 57 04 lea 0x4(%edi),%edx
efdf: 89 d7 mov %edx,%edi
efe1: 89 cb mov %ecx,%ebx
efe3: 8b 40 2c mov 0x2c(%eax),%eax
efe6: 89 02 mov %eax,(%edx)
efe8: 8b 04 8d 20 04 00 00 mov 0x420(,%ecx,4),%eax
efef: ff e0 jmp *%eax
eff1: 8d 46 01 lea 0x1(%esi),%eax
eff4: 0f b6 08 movzbl (%eax),%ecx
eff7: 89 c6 mov %eax,%esi
eff9: a1 40 00 00 00 mov 0x40,%eax
effe: 8d 57 04 lea 0x4(%edi),%edx
f001: 89 d7 mov %edx,%edi
f003: 89 cb mov %ecx,%ebx
f005: 8b 40 30 mov 0x30(%eax),%eax
f008: 89 02 mov %eax,(%edx)
f00a: 8b 04 8d 20 04 00 00 mov 0x420(,%ecx,4),%eax
f011: ff e0 jmp *%eax
-------------------
Here %esi is almost used for the IP, but %eax is used for fetching the
next byte and the result is then copied back into %esi. The jmp also
seems to use %eax, so right before the jump the old value of %eax is
discarded and the destination address is brought into %eax.
I'd be surprised if this were optimized for a specific x86
variation. I copied the command-line options from the Windows Makefile
to Fedora:
-mpentium -mwindows -Werror-implicit-function-declaration -fomit-frame-pointer -funroll-loops -fschedule-insns2
and got an equally unsatisfying (slightly different) sequence.
Ok, so one thing to try is to install gcc 2.95.2 on Fedora Core 7 and
compile the interpreter with it. The resulting assembly code is close
to the one on Windows. The bytecode/sec count went up but send/sec
went down. I have a feeling that I saw this before but of course cannot
remember the exact conditions...
If somebody has a dual-boot machine and can compare 8 (or more) cases
(namely, the combinations of Windows/Linux, 2.95.2/4.1.2, more
options/fewer options), that would be great.
-- Yoshiki