<div dir="ltr">Hi Eliot,<br><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Isn't it great when one has to work around compiler bugs?? ;-)</div></blockquote><div>This is very annoying. And for me this is not the first time. It seems that with the heavy inlining in GCC we are putting a bit too much stress on its register allocator.<br><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Isn't it great when one has to work around compiler bugs?? ;-)</div><div><br></div><div>However,
let me suggest that this is perhaps a case where a macro would be
better. If you added the definition of the macro to StackInterpreter
class>>#preambleCCode you'd be able to avoid the overhead with
compilers that can correctly inline memcpy. If required, a sqPlatform.h
could define a value, say DontInlineMemcpyForLowcode and then in the
preamble you could have</div><div><br></div><div>#if DontInlineMemcpyForLowcode</div><div># define memcpy(a,b,c) noinline_memcpy(a,b,c)</div><div>#endif</div><div><br></div><div>?
And then noinline_memcpy could be defined in some platform support
file, sqWin32Main.c perhaps? The simulator's noinline_memcpy would be
defined as <doNotGenerate>.</div><div><br></div><div>Anyway, some way of making this platform-dependent is nice as you'll get better performance on the other platforms, and on x64.</div><div><br></div><div>And yes, feel free to ignore me as this perhaps does count as a premature optimization.</div><div class="gmail-yj6qo gmail-ajU"><div></div></div></blockquote><div> <br>I was thinking of doing something like this, but I did not know how to do it because of Slang. I will fix it some time later.<br><br></div><div>Best regards,<br></div><div>Ronie<br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-01-11 16:38 GMT-03:00 Eliot Miranda <span dir="ltr"><<a href="mailto:eliot.miranda@gmail.com" target="_blank">eliot.miranda@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Ronie,<div><br></div><div>    I see this :-)</div><div><br></div><div><span class=""><div>lowcode_mem: destAddress cp: sourceAddress y: bytes</div></span><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span>"This method is a workaround a GCC bug.</div></span><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span>In Windows memcpy is putting too much register pressure on GCC when used by Lowcode instructions"</div></span><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span><inline: #never></div><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span><option: #LowcodeVM></div><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span><var: #destAddress type: #'void*'></div></span><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span><var: #sourceAddress type: 
#'void*'></div></span><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span><var: #bytes type: #'sqInt'></div><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span></div></span><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span>"Using memmove instead of memcpy to avoid crashing GCC in Windows."</div></span><span class=""><div><span class="m_5837494936499094584gmail-Apple-tab-span" style="white-space:pre-wrap">	</span>self mem: destAddress mo: sourceAddress ve: bytes</div></span></div><div><br></div><div>Isn't it great when one has to work around compiler bugs?? ;-)</div><div><br></div><div>However, let me suggest that this is perhaps a case where a macro would be better. If you added the definition of the macro to StackInterpreter class>>#preambleCCode you'd be able to avoid the overhead with compilers that can correctly inline memcpy. If required, a sqPlatform.h could define a value, say DontInlineMemcpyForLowcode and then in the preamble you could have</div><div><br></div><div>#if DontInlineMemcpyForLowcode</div><div># define memcpy(a,b,c) noinline_memcpy(a,b,c)</div><div>#endif</div><div><br></div><div>? And then noinline_memcpy could be defined in some platform support file, sqWin32Main.c perhaps? 
The simulator's noinline_memcpy would be defined as <doNotGenerate>.</div><div><br></div><div>Anyway, some way of making this platform-dependent is nice as you'll get better performance on the other platforms, and on x64.</div><div><br></div><div>And yes, feel free to ignore me as this perhaps does count as a premature optimization.</div></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">On Tue, Jan 10, 2017 at 11:42 PM, <span dir="ltr"><<a href="mailto:commits@source.squeak.org" target="_blank">commits@source.squeak.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Ronie Salgado Faila uploaded a new version of VMMaker to project VM Maker:<br>
<a href="http://source.squeak.org/VMMaker/VMMaker.oscog-rsf.2083.mcz" rel="noreferrer" target="_blank">http://source.squeak.org/VMMaker/VMMaker.oscog-rsf.2083.mcz</a><br>
<br>
==================== Summary ====================<br>
<br>
Name: VMMaker.oscog-rsf.2083<br>
Author: rsf<br>
Time: 11 January 2017, 4:42:00.330997 am<br>
UUID: 2debebfc-5008-4ab3-b16d-37ab942d9bc0<br>
Ancestors: VMMaker.oscog-eem.2082<br>
<br>
Workaround a GCC crash in Windows when building a Lowcode VM. Too much register allocation pressure for calling a builtin memcpy.<br>
<br>
=============== Diff against VMMaker.oscog-eem.2082 ===============<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>internalPushShadowCallStackStructure:size: (in category 'internal interpreter access') -----<br>
  internalPushShadowCallStackStructure: structurePointer size: size<br>
<option: #LowcodeVM><br>
shadowCallStackPointer := shadowCallStackPointer - size.<br>
+ self lowcode_mem: shadowCallStackPointer cp: structurePointer y: size!<br>
- self mem: shadowCallStackPointer cp: structurePointer y: size!<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>lowcodePrimitiveInt32ToPointer (in category 'inline primitive generated code') -----<br>
lowcodePrimitiveInt32ToPointer<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| value result |<br>
<var: #value type: #'sqInt' ><br>
<var: #result type: #'char*' ><br>
value := self internalPopStackInt32.<br>
<br>
+ result := self cCoerce: (self cCoerce: value to: 'uintptr_t') to: 'char*'.<br>
- result := self cCoerce: value to: 'uintptr_t'.<br>
<br>
self internalPushPointer: result.<br>
<br>
!<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>lowcodePrimitiveMemcpy32 (in category 'inline primitive generated code') -----<br>
lowcodePrimitiveMemcpy32<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| source dest size |<br>
<var: #source type: #'char*' ><br>
<var: #dest type: #'char*' ><br>
<var: #size type: #'sqInt' ><br>
size := self internalPopStackInt32.<br>
source := self internalPopStackPointer.<br>
dest := self internalPopStackPointer.<br>
<br>
+ self lowcode_mem: dest cp: source y: size.<br>
- self mem: dest cp: source y: size.<br>
<br>
<br>
!<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>lowcodePrimitiveMemcpy64 (in category 'inline primitive generated code') -----<br>
lowcodePrimitiveMemcpy64<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| source dest size |<br>
<var: #source type: #'char*' ><br>
<var: #dest type: #'char*' ><br>
<var: #size type: #'sqLong' ><br>
size := self internalPopStackInt64.<br>
source := self internalPopStackPointer.<br>
dest := self internalPopStackPointer.<br>
<br>
+ self lowcode_mem: dest cp: source y: size.<br>
- self mem: dest cp: source y: size.<br>
<br>
<br>
!<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>lowcodePrimitiveMemcpyFixed (in category 'inline primitive generated code') -----<br>
lowcodePrimitiveMemcpyFixed<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| source size dest |<br>
<var: #source type: #'char*' ><br>
<var: #dest type: #'char*' ><br>
size := extA.<br>
source := self internalPopStackPointer.<br>
dest := self internalPopStackPointer.<br>
<br>
+ self lowcode_mem: dest cp: source y: size.<br>
- self mem: dest cp: source y: size.<br>
<br>
extA := 0.<br>
<br>
!<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>lowcodePrimitivePerformCallStructure (in category 'inline primitive generated code') -----<br>
  lowcodePrimitivePerformCallStructure<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| resultPointer result function structureSize |<br>
<var: #resultPointer type: #'char*' ><br>
<var: #result type: #'char*' ><br>
function := extA.<br>
structureSize := extB.<br>
result := self internalPopStackPointer.<br>
<br>
  	self internalPushShadowCallStackPointer: result.<br>
resultPointer := self lowcodeCalloutPointerResult: (self cCoerce: function to: #'char*').<br>
<br>
self internalPushPointer: resultPointer.<br>
extA := 0.<br>
extB := 0.<br>
numExtB := 0.<br>
+<br>
!<br>
<br>
Item was changed:<br>
  ----- Method: StackInterpreter>>lowcodePrimitivePointerAddConstantOffset (in category 'inline primitive generated code') -----<br>
  lowcodePrimitivePointerAddConstantOffset<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| base offset result |<br>
<var: #base type: #'char*' ><br>
<var: #result type: #'char*' ><br>
offset := extB.<br>
base := self internalPopStackPointer.<br>
<br>
result := base + offset.<br>
<br>
self internalPushPointer: result.<br>
extB := 0.<br>
numExtB := 0.<br>
<br>
!<br>
<br>
Item was added:<br>
+ ----- Method: StackInterpreter>>lowcode_mem:cp:y: (in category 'inline primitive support') -----<br>
+ lowcode_mem: destAddress cp: sourceAddress y: bytes<br>
+ "This method is a workaround a GCC bug.<br>
+ In Windows memcpy is putting too much register pressure on GCC when used by Lowcode instructions"<br>
+ <inline: #never><br>
+ <option: #LowcodeVM><br>
+ <var: #destAddress type: #'void*'><br>
+ <var: #sourceAddress type: #'void*'><br>
+ <var: #bytes type: #'sqInt'><br>
+<br>
+ "Using memmove instead of memcpy to avoid crashing GCC in Windows."<br>
+ self mem: destAddress mo: sourceAddress ve: bytes!<br>
<br>
Item was changed:<br>
  ----- Method: StackToRegisterMappingCogit>>genLowcodePerformCallStructure (in category 'inline primitive generators generated code') -----<br>
genLowcodePerformCallStructure<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
<br>
"Push the result space"<br>
self ssNativeTop nativeStackPopToReg: TempReg.<br>
self ssNativePop: 1.<br>
self PushR: TempReg.<br>
"Call the function"<br>
self callSwitchToCStack.<br>
self MoveCw: extA R: TempReg.<br>
self CallRT: ceFFICalloutTrampoline.<br>
"Fetch the result"<br>
self MoveR: backEnd cResultRegister R: ReceiverResultReg.<br>
self ssPushNativeRegister: ReceiverResultReg.<br>
extA := 0.<br>
extB := 0.<br>
numExtB := 0.<br>
<br>
^ 0<br>
<br>
!<br>
<br>
Item was changed:<br>
  ----- Method: StackToRegisterMappingCogit>>genLowcodePointerAddConstantOffset (in category 'inline primitive generators generated code') -----<br>
  genLowcodePointerAddConstantOffset<br>
<option: #LowcodeVM> "Lowcode instruction generator"<br>
| base offset |<br>
offset := extB.<br>
<br>
(base := backEnd availableRegisterOrNoneFor: self liveRegisters) = NoReg ifTrue:<br>
[self ssAllocateRequiredReg:<br>
(base := optStatus isReceiverResultRegLive<br>
ifTrue: [Arg0Reg]<br>
ifFalse: [ReceiverResultReg])].<br>
base = ReceiverResultReg ifTrue:<br>
[ optStatus isReceiverResultRegLive: false ].<br>
self ssNativeTop nativePopToReg: base.<br>
self ssNativePop: 1.<br>
<br>
self AddCq: offset R: base.<br>
self ssPushNativeRegister: base.<br>
<br>
extB := 0.<br>
numExtB := 0.<br>
^ 0<br>
<br>
!<br>
<br>
</blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><div class="m_5837494936499094584gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>
</font></span></div>
</blockquote></div><br></div>
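For reference, the macro approach Eliot describes above might look roughly like the following in C. This is only a sketch under assumptions, not the actual VM code: DontInlineMemcpyForLowcode and noinline_memcpy are the names proposed in the thread, while LOWCODE_NOINLINE and lowcodeCopy are hypothetical names made up for the illustration. The flag is defined locally here so the snippet compiles on its own; in the VM it would come from a platform header. The #undef is an addition, in case the C library already defines memcpy as a macro.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a platform header (sqPlatform.h in the thread) would set this
   to 1 only on platforms where GCC has trouble with the inlined builtin.
   It is defined here so that the sketch is self-contained. */
#ifndef DontInlineMemcpyForLowcode
# define DontInlineMemcpyForLowcode 1
#endif

/* Hypothetical helper macro: ask GCC-compatible compilers not to inline. */
#if defined(__GNUC__)
# define LOWCODE_NOINLINE __attribute__((noinline))
#else
# define LOWCODE_NOINLINE
#endif

/* Would live in a platform support file. It uses memmove, as the committed
   lowcode_mem:cp:y: does, and is defined before memcpy is redirected. */
LOWCODE_NOINLINE void *
noinline_memcpy(void *dest, const void *src, size_t n)
{
    return memmove(dest, src, n);
}

/* The preamble part: redirect memcpy only where the flag is set. */
#if DontInlineMemcpyForLowcode
# undef memcpy
# define memcpy(a,b,c) noinline_memcpy(a,b,c)
#endif

/* Hypothetical caller standing in for a Lowcode instruction body. */
static void *
lowcodeCopy(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n); /* expands to noinline_memcpy when the flag is 1 */
}
```

With the flag set to 0, lowcodeCopy calls the ordinary memcpy and the compiler is free to inline it, which is the point of making the workaround platform-dependent.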