Switching to use foo struct on Windows VM
siguctua at gmail.com
Sun Jul 15 21:16:32 UTC 2007
On 15/07/07, John M McIntosh <johnmci at smalltalkconsulting.com> wrote:
> On Jul 15, 2007, at 11:55 AM, sig wrote:
> > Whenever a method uses the foo struct, the generator places the
> > following line in the function:
> > register struct foo * foo = &fum;
> I believe we only generate that if the foo structure was used in the
> routine more than once.
> On PowerPC this was a clue that the structure pointer should be in a
> register, which gained us some performance
> in earlier versions of GCC. Later GCC compilers seem to
> ignore the register hint. I once tried to use
> the GCC global register hint, which worked quite well, but was
> fraught with issues if all the plugins were not
> recompiled and if foo was not set up before anyone invoked an interp.c
> routine as part of VM setup.
> > and then uses foo->bar everywhere.
> > So, the difference in compiled code when using foo struct or not is
> > minimal:
> > mov reg, [bar] <- using globals
> > mov reg, [foo + bar_offset] <- with foo
> > Of course, this depends on how well GCC optimizes the code, but in the
> > optimal case the difference between loading a value through a direct
> > pointer and through base+offset is just a few cycles. And I don't think
> > this will cause a major speed degradation.
> A cycle here, a cycle there: they add up to real cycles.
> This is the first bytecode in Intel assembler, properly optimized.
> addl $1, %esi
> movzbl (%esi), %ebx
> addl $4, %edi
> movl _foo, %eax
> movl 84(%eax), %eax
> movl 4(%eax), %eax
> movl %eax, (%edi)
> movl 512(%esp,%ebx,4), %eax
> jmp *%eax
> Less-than-optimal compiles can result in 12 instructions, and 9 versus
> 12 instructions does equal a difference in real physical time.
While you people are fighting with different GCC compilers to force them
to produce optimal code, my intent is to PROVIDE this optimal code,
written by hand and compiled by Exupery. In my case, if things go
well, the example above will prove nothing, because I will be able to
reimplement any VM function (even interpret()) and have much better
control over how to avoid producing extra jumps/calls.