Re: [Vm-dev] Re: [squeak-dev] VM performance discrepancy on Linux and Windows

11 Apr 2008


      Well,
So some good words came, and now I'm looking at the assembly code...
Bottom line: on my Linux box, gcc 2.95.2 makes the bytecode/sec count
larger than gcc 4.1.2 does, but makes the send/sec count smaller.
gcc 2.95.2 on Windows generates a code sequence for two bytecodes like this:
-------------------
    69d8:	46                   	inc    %esi
    69d9:	0f b6 1e             	movzbl (%esi),%ebx
    69dc:	83 c7 04             	add    $0x4,%edi
    69df:	a1 00 00 00 00       	mov    0x0,%eax
    69e4:	8b 40 08             	mov    0x8(%eax),%eax
    69e7:	89 07                	mov    %eax,(%edi)
    69e9:	ff 24 9d 80 27 00 00 	jmp    *0x2780(,%ebx,4)
    69f0:	46                   	inc    %esi
    69f1:	0f b6 1e             	movzbl (%esi),%ebx
    69f4:	83 c7 04             	add    $0x4,%edi
    69f7:	a1 00 00 00 00       	mov    0x0,%eax
    69fc:	8b 40 0c             	mov    0xc(%eax),%eax
    69ff:	89 07                	mov    %eax,(%edi)
    6a01:	ff 24 9d 80 27 00 00 	jmp    *0x2780(,%ebx,4)
-------------------
Apparently, %esi is used (exclusively) for the IP, %ebx holds the next
bytecode, and "jmp *" takes you to the handler whose address is stored
in the table that starts at 0x2780.
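For readers who have not seen this dispatch style at the C level, here is a minimal sketch of a jump-table interpreter loop using the GCC "labels as values" extension. It is only an illustration of the technique, not the actual Squeak interpreter source: the opcodes, the run() function, and the tiny stack are all made up for the example, and it assumes the bytecode stream contains only valid opcodes and ends with OP_HALT.

```c
#include <assert.h>

/* Hypothetical opcodes for illustration; not Squeak's bytecode set. */
enum { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

/* Runs a tiny bytecode program and returns the value on top of the stack.
   Requires GCC (or a compatible compiler) for the &&label extension. */
int run(const unsigned char *code)
{
    /* Table of handler addresses indexed by bytecode, playing the role
       of the table at 0x2780 in the listing above. */
    static void *const dispatch[] = { &&op_push1, &&op_add, &&op_halt };

    const unsigned char *ip = code; /* instruction pointer (the %esi role) */
    int stack[16];
    int *sp = stack;                /* points at the top slot (the %edi role) */
    unsigned char next;             /* prefetched bytecode (the %ebx role) */

    stack[0] = 0;                   /* slot 0 is a sentinel; pushes pre-increment */
    goto *dispatch[*ip];            /* dispatch the first bytecode */

op_push1:
    next = *++ip;                   /* inc %esi; movzbl (%esi),%ebx */
    assert(sp < stack + 15);        /* guard against overflow in this toy */
    *++sp = 1;                      /* add $4,%edi; mov %eax,(%edi) */
    goto *dispatch[next];           /* jmp *table(,%ebx,4) */

op_add:
    next = *++ip;
    assert(sp >= stack + 2);        /* need two operands above the sentinel */
    sp[-1] += sp[0];
    --sp;
    goto *dispatch[next];

op_halt:
    return *sp;
}
```

Each handler ends with its own indirect jump, which is what lets the compiler emit the short "fetch, do the work, jmp *table" sequence per bytecode instead of returning to a central switch.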
gcc 4.1.2 on Fedora Core 7 generates a code sequence for two bytecodes like this:
-------------------
    efcf:	8d 46 01             	lea    0x1(%esi),%eax
    efd2:	0f b6 08             	movzbl (%eax),%ecx
    efd5:	89 c6                	mov    %eax,%esi
    efd7:	a1 40 00 00 00       	mov    0x40,%eax
    efdc:	8d 57 04             	lea    0x4(%edi),%edx
    efdf:	89 d7                	mov    %edx,%edi
    efe1:	89 cb                	mov    %ecx,%ebx
    efe3:	8b 40 2c             	mov    0x2c(%eax),%eax
    efe6:	89 02                	mov    %eax,(%edx)
    efe8:	8b 04 8d 20 04 00 00 	mov    0x420(,%ecx,4),%eax
    efef:	ff e0                	jmp    *%eax
    eff1:	8d 46 01             	lea    0x1(%esi),%eax
    eff4:	0f b6 08             	movzbl (%eax),%ecx
    eff7:	89 c6                	mov    %eax,%esi
    eff9:	a1 40 00 00 00       	mov    0x40,%eax
    effe:	8d 57 04             	lea    0x4(%edi),%edx
    f001:	89 d7                	mov    %edx,%edi
    f003:	89 cb                	mov    %ecx,%ebx
    f005:	8b 40 30             	mov    0x30(%eax),%eax
    f008:	89 02                	mov    %eax,(%edx)
    f00a:	8b 04 8d 20 04 00 00 	mov    0x420(,%ecx,4),%eax
    f011:	ff e0                	jmp    *%eax
-------------------
%esi is still mostly used for the IP, but %eax is used for fetching the
next byte.  The jmp also seems to go through %eax, so right before the
jump the previous contents of %eax are stored away and the destination
address is loaded into %eax.
I'd be surprised if this were optimized for a specific x86
variation.  I copied the command line options from the Windows Makefile
to Fedora:
-mpentium -mwindows -Werror-implicit-function-declaration -fomit-frame-pointer -funroll-loops -fschedule-insns2
and got an equally unsatisfying (slightly different) sequence.
OK, so one thing to try was to install gcc 2.95.2 on Fedora Core 7 and
compile the interpreter with it.  The resulting assembly code is close
to the one on Windows.  The bytecode/sec count went up but send/sec
went down.  I have a feeling that I saw this before, but of course I
cannot remember the exact conditions...
If somebody has a dual-boot machine and can compare the 8 (or more)
cases (namely, the combinations of Windows/Linux, 2.95.2/4.1.2, more
options/fewer options), that would be great.
-- Yoshiki
