[Vm-dev] Re: [squeak-dev] VM performance discrepancy on Linux and Windows

Yoshiki Ohshima yoshiki at vpri.org
Fri Apr 11 07:13:54 UTC 2008


  Well,

  So some good words came and now I'm looking at the assembly code...

Bottom line: gcc 2.95.2 on my Linux machine makes the bytecode/sec
count larger, but makes the send/sec count smaller.

gcc 2.95.2 on Windows generates a code sequence for two bytecodes like this:

-------------------
    69d8:	46                   	inc    %esi
    69d9:	0f b6 1e             	movzbl (%esi),%ebx
    69dc:	83 c7 04             	add    $0x4,%edi
    69df:	a1 00 00 00 00       	mov    0x0,%eax
    69e4:	8b 40 08             	mov    0x8(%eax),%eax
    69e7:	89 07                	mov    %eax,(%edi)
    69e9:	ff 24 9d 80 27 00 00 	jmp    *0x2780(,%ebx,4)
    69f0:	46                   	inc    %esi
    69f1:	0f b6 1e             	movzbl (%esi),%ebx
    69f4:	83 c7 04             	add    $0x4,%edi
    69f7:	a1 00 00 00 00       	mov    0x0,%eax
    69fc:	8b 40 0c             	mov    0xc(%eax),%eax
    69ff:	89 07                	mov    %eax,(%edi)
    6a01:	ff 24 9d 80 27 00 00 	jmp    *0x2780(,%ebx,4)
-------------------

Apparently, %esi is used (exclusively) for the IP, %ebx keeps the next
bytecode, and "jmp *" takes you to the next location, which is looked
up in the table that starts at 0x2780.
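
For reference, the C-level pattern that compiles down to this kind of
"fetch next byte, do the work, jmp through a table" sequence looks
roughly like the sketch below. It is not the actual interpreter source:
the opcodes, the run() function and the fixed-size stack are made up
for illustration, and it assumes a compiler that supports the GNU
"labels as values" extension (gcc or clang).

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration; not the real Squeak bytecode set. */
enum { OP_PUSH1 = 0, OP_PUSH2 = 1, OP_ADD = 2, OP_HALT = 3 };

/* Runs a tiny bytecode program and returns the value on top of the stack.
   No bounds checking: the program must be well formed and end with OP_HALT. */
long run(const unsigned char *code)
{
    /* One label address per opcode; plays the role of the table at 0x2780. */
    static void *const dispatch[] = { &&push1, &&push2, &&add, &&halt };

    long stack[64];
    long *sp = stack;               /* stack pointer, the role %edi plays above */
    const unsigned char *ip = code; /* instruction pointer, the role of %esi */

    stack[0] = 0;                   /* result of a program that pushes nothing */
    goto *dispatch[*ip];            /* dispatch the first bytecode */

push1:
    *++sp = 1;                      /* advance the stack pointer, store the value */
    goto *dispatch[*++ip];          /* fetch the next bytecode and jump via the table */

push2:
    *++sp = 2;
    goto *dispatch[*++ip];

add:
    sp[-1] += sp[0];                /* replace the top two values with their sum */
    --sp;
    goto *dispatch[*++ip];

halt:
    return *sp;
}
```

Calling run() on the four bytes OP_PUSH1, OP_PUSH2, OP_ADD, OP_HALT
returns 3. Each handler ends with its own indirect jump instead of
going back to a central switch, which is why every bytecode in the
listing above ends with its own "jmp *".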

gcc 4.1.2 on Fedora Core 7 generates a code sequence for two bytecodes like this:

-------------------
    efcf:	8d 46 01             	lea    0x1(%esi),%eax
    efd2:	0f b6 08             	movzbl (%eax),%ecx
    efd5:	89 c6                	mov    %eax,%esi
    efd7:	a1 40 00 00 00       	mov    0x40,%eax
    efdc:	8d 57 04             	lea    0x4(%edi),%edx
    efdf:	89 d7                	mov    %edx,%edi
    efe1:	89 cb                	mov    %ecx,%ebx
    efe3:	8b 40 2c             	mov    0x2c(%eax),%eax
    efe6:	89 02                	mov    %eax,(%edx)
    efe8:	8b 04 8d 20 04 00 00 	mov    0x420(,%ecx,4),%eax
    efef:	ff e0                	jmp    *%eax
    eff1:	8d 46 01             	lea    0x1(%esi),%eax
    eff4:	0f b6 08             	movzbl (%eax),%ecx
    eff7:	89 c6                	mov    %eax,%esi
    eff9:	a1 40 00 00 00       	mov    0x40,%eax
    effe:	8d 57 04             	lea    0x4(%edi),%edx
    f001:	89 d7                	mov    %edx,%edi
    f003:	89 cb                	mov    %ecx,%ebx
    f005:	8b 40 30             	mov    0x30(%eax),%eax
    f008:	89 02                	mov    %eax,(%edx)
    f00a:	8b 04 8d 20 04 00 00 	mov    0x420(,%ecx,4),%eax
    f011:	ff e0                	jmp    *%eax
-------------------

%esi is still mostly used for the IP, but %eax is used for fetching the
next byte.  The jmp also seems to go through %eax, so right before the
jump %eax is spilled and the destination address is brought into it.

  I'd be surprised if this were optimized for a specific x86
variant.  I copied the command-line options from the Windows Makefile
to Fedora:

-mpentium -mwindows -Werror-implicit-function-declaration -fomit-frame-pointer -funroll-loops -fschedule-insns2

and got an equally unsatisfying (slightly different) sequence.

Ok, so one thing to try is to install gcc 2.95.2 on Fedora Core 7 and
compile the interpreter with it.  The resulting assembly code is close
to the one on Windows.  The bytecode/sec count went up but send/sec
went down.  I have a feeling that I saw this before but of course cannot
remember the exact conditions...

  If somebody has a dual-boot machine and can compare the 8 (or more)
cases (namely, the combinations of Windows/Linux, 2.95.2/4.1.2, more
options/less options), that would be great.
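
To make the comparison easy to tick off, the eight cases can be
enumerated mechanically.  The helper below is only a sketch: the
function name describe_config() and the label format are made up for
this example, and the three axes are the ones listed above.

```c
#include <stdio.h>
#include <stddef.h>

static const char *const os_names[]     = { "Windows", "Linux" };
static const char *const gcc_versions[] = { "2.95.2", "4.1.2" };
static const char *const option_sets[]  = { "more options", "less options" };

/* Writes a label for configuration 0..7 into buf.
   Returns 0 on success, -1 if index is out of range. */
int describe_config(int index, char *buf, size_t size)
{
    if (index < 0 || index >= 8)
        return -1;
    /* Bit 2 selects the OS, bit 1 the compiler, bit 0 the option set. */
    snprintf(buf, size, "%s / gcc %s / %s",
             os_names[(index >> 2) & 1],
             gcc_versions[(index >> 1) & 1],
             option_sets[index & 1]);
    return 0;
}
```

Looping index from 0 to 7 and printing each label gives the full list,
from "Windows / gcc 2.95.2 / more options" to
"Linux / gcc 4.1.2 / less options".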

-- Yoshiki

