[Vm-dev] About Cog on linux

Igor Stasenko siguctua at gmail.com
Fri Feb 11 09:47:52 UTC 2011


On 11 February 2011 02:26, Igor Stasenko <siguctua at gmail.com> wrote:
> On 11 February 2011 01:12, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>>
>>
>> On Thu, Feb 10, 2011 at 11:47 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>>
>>> On 10 February 2011 19:38, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Feb 10, 2011 at 9:55 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>> >>
>>> >> On 10 February 2011 18:34, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>> >> >
>>> >> > Hi Igor,
>>> >> >
>>> >> > On Thu, Feb 10, 2011 at 1:03 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>> >> >>
>>> >> >> On 9 February 2011 20:04, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> > That's essentially what I see but the variability isn't between cmake and configure but between different runs of configure.  For example, you'll see that I released 2259 (SimpleStackBasedCogit) and 2361 (StackToRegisterMappingCogit) at the weekend.  That's because 2360 which had -O2 for gcc3x-cointerp.c crashed on startup on my test case (Squeak4.2-10856-beta.image) in one of the early performs as classes are sent startUp: on startup.  So I lowered optimization, checked-in 2361, built, checked it didn't crash and released.  However, now I try and rebuild exactly the same sources but using -O2 for gcc3x-cointerp.c I can't get it to crash.  This is exactly analogous to a few weeks back when I was convinced that the optimization level of the heartbeat caused it to crash if at -O2.  When Andreas asked me to reproduce on the internal Teleplace build I couldn't get it to repeat.  So something is very odd indeed, sensitive perhaps to the timestamp in the executable or some such.  However, now at least I know what I'm looking for and the next time I build something that crashes on the test case I will attempt to debug.
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >> I tried today with debug info enabled (all source files are compiled with:
>>> >> >>
>>> >> >> compilerFlags
>>> >> >>
>>> >> >>        ^ '-g3 -O1 -msse2 -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1
>>> >> >>        -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1'
>>> >> >>
>>> >> >> )
>>> >> >>
>>> >> >
>>> >> > please change that to include -save-temps.  We can then see what the generated assembly and object files are and that will really help analyse.  Also, can you somehow freeze this source so that we can repeat the compilation exactly?  i.e. avoid generating a different version.c with a different date in it.  We must try and repeat the compilation exactly with no temporal or path-derived artifacts.
>>> >> >
>>> >> >
>>> >>
>>> >> OK, I made such a config:
>>> >>
>>> >> ....results/Cog -version
>>> >> 3.9-7 #1 <HERE IS SUPPOSED TO BE THE DATE> <HERE IS SUPPOSED TO BE gcc VERSION>
>>> >> Croquet Closure Stack VM [StackInterpreter
>>> >> VMMaker-oscog-IgorStasenko.Stasenko.49]
>>> >> <FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>
>>> >> plugin path: /home/sig/vmbuild/build/results/ [default:
>>> >> /home/sig/vmbuild/build/results/]
>>> >>
>>> >>
>>> >> It also produces a lot of .i and .s files around the build dir.
>>> >
>>> > Right.  That's what -save-temps does.  And that's useful data, especially in seeing what code gcc produces for different -O levels.
>>> >
>>> >>
>>> >> You can try building it by loading CMakeVMMaker-IgorStasenko.27
>>> >> package and doing:
>>> >>
>>> >> FixedVerSIDebugUnixConfig generateWithSources
>>> >>
>>> >> or tell me what to do next :)
>>> >
>>> > You need to try and produce one build that crashes and one build that doesn't (based e.g. on -O level).  When you have that compare the two up to the point of failure.
>>>
>>> I can do that, but on two different architectures.
>>
>> I mean of course two different compilations on the same architecture of the same source, one that crashes and one that doesn't.
>>
>>>
>>> On Linux it is crashing, no matter what I do (as you can see, even the
>>> stack-based builds are crashing).
>>
>> Sometimes my linux builds work and sometimes they don't, and I see no rhyme or reason why.  That's what we're trying to work out.  So we need to look at linux, and compare builds that work against those that don't.  So the first requirement is to obtain reproducible builds that work and that don't.
>
> Well, so far I have a 100% reproducible crash, in all combinations:
> JIT/Stack, release/debug :)
> I built the VM using this config on two different Linux systems: one is
> Ubuntu in my VirtualBox, and the other is ?CentOS?
> on a machine which is used as a Hudson slave.
>
> I will check the build flags for the interp.c file tomorrow. I think it's
> built using the same optimization flag(s), -O1, because
> it is set separately.
>

Ah, no. The flags were set for cogit.c:

set_source_files_properties( ${srcVMDir}/cogit.c PROPERTIES
                COMPILE_FLAGS "-O1 -fno-omit-frame-pointer -momit-leaf-frame-pointer -mno-rtd -mno-accumulate-outgoing-args")

But apparently, if I build the stack-based VM, they are not used for gcc3x-interp.c.

Here is the command line used to compile it:

/usr/bin/gcc  -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1
-DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1
-I/home/sig/cog-blessed/platforms/unix/plugins/B3DAcceleratorPlugin
-I/home/sig/cog-blessed/platforms/Cross/vm
-I/home/sig/cog-blessed/src/vm
-I/home/sig/cog-blessed/platforms/unix/vm
-I/home/sig/cog-blessed/build   -g3 -O1 -msse2 -save-temps -o
CMakeFiles/Cog.dir/home/sig/cog-blessed/src/vm/gcc3x-interp.c.o   -c
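
To check which flags actually reach a given file, the compile commands can be pulled out of a verbose build log. A minimal sketch, assuming a Makefile-based CMake build (where "make VERBOSE=1" echoes each command on one line); the helper name flags_for and the log name build.log are hypothetical and not part of the Cog build:

```shell
# flags_for LOG FILE: print the -O/-f/-m options found on the lines of LOG
# that mention FILE. LOG is assumed to be saved "make VERBOSE=1" output.
flags_for() {
    grep -F -- "$2" "$1" | tr -s ' \t' '\n' | grep -E '^-(O|f|m)' | LC_ALL=C sort -u
}

# Example use from the build directory (build.log is a hypothetical name):
#   make VERBOSE=1 > build.log 2>&1
#   flags_for build.log gcc3x-interp.c
#   flags_for build.log cogit.c
```

Comparing the two outputs shows at a glance whether the per-file COMPILE_FLAGS were applied to one source and not the other. Note that a link line mentioning the object file would also match, so the result is only a quick check.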


> I will also try to build the VM using purely your sources to avoid
> any possible impact of my changes. But I checked that before, without
> much difference.
>

Confirmed: I built the VM using purely your sources, not tainted by my
changes. And this didn't change anything:

cd build
cmake . && make
cd results

sig@sig-VirtualBox:~/cog-blessed/build/results$ ./Cog -version
3.9-7 #1 <HERE IS SUPPOSED TO BE THE DATE> <HERE IS SUPPOSED TO BE gcc VERSION>
Croquet Closure Stack VM [StackInterpreter VMMaker-oscog.47]
<FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>
plugin path: /home/sig/cog-blessed/build/results/ [default:
/home/sig/cog-blessed/build/results/]

sig@sig-VirtualBox:~/cog-blessed/build/results$ ./Cog ../../image/VMMaker-Squeak4.1.image
.....
Segmentation fault

C stack backtrace:
./Cog(error+0x50)[0x80836b5]
./Cog[0x8083753]
[0x327400]
./Cog[0x8078326]
./Cog(interpret+0x1a)[0x8072ee9]
./Cog(main+0x408)[0x8083624]
/lib/libc.so.6(__libc_start_main+0xe7)[0x82bce7]
./Cog[0x805cca1]


Smalltalk stack dump:
0xbf9fd524 [] in ByteString(Object)>doesNotUnderstand: 2016719100:
a(n) ByteString
0xbf9fd54c [] in SmalltalkImage>snapshot:andQuit:embedded: 2006102020:
a(n) SmalltalkImage
2016951156 s SmalltalkImage>snapshot:andQuit:
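
The raw addresses in the C backtrace above can be mapped back to function names and source lines, since the VM was built with -g3. A minimal sketch, assuming the backtrace has been saved to a text file and that binutils' addr2line is installed; the helper name cog_addrs and the file name backtrace.txt are hypothetical:

```shell
# cog_addrs FILE: print the bracketed hex address of every ./Cog frame
# in a saved backtrace, one per line. Frames from other objects
# (libc, anonymous "[0x...]" lines) are skipped.
cog_addrs() {
    sed -n 's|^\./Cog.*\[\(0x[0-9a-fA-F]*\)]$|\1|p' "$1"
}

# Example use (requires an unstripped ./Cog in the current directory):
#   cog_addrs backtrace.txt | xargs addr2line -f -e ./Cog
```

With -f, addr2line prints the enclosing function name before each file:line pair, which helps for frames such as ./Cog[0x8078326] that carry no symbol in the backtrace.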


As a last measure I also tried using the non-gnuified interp.c instead of
gcc3x-interp.c. Same result.


Smalltalk stack dump:
0xbfe8de34 [] in ByteString(Object)>doesNotUnderstand: 2017259772:
a(n) ByteString
0xbfe8de5c [] in SmalltalkImage>snapshot:andQuit:embedded: 2006642692:
a(n) SmalltalkImage
2017491828 s SmalltalkImage>snapshot:andQuit:
2017491920 s TheWorldMenu>saveAndQuit



So I have a strong feeling that this is not related to compiler
peculiarities, but to some bug in the code.
I will try to debug it a little and see if I can find something.


-- 
Best regards,
Igor Stasenko AKA sig.

