[squeak-dev] linux build stability

Eliot Miranda eliot.miranda at gmail.com
Wed Feb 2 06:17:35 UTC 2011


Hi All,

     you may already know that there have been strange stability problems
with the Cog VM on Linux.  Problems with the heartbeat appear to depend on
the particular compilation: one compilation of the same source produces an
executable that crashes, another produces one that doesn't.  Recent testing
at Teleplace showed that an effect attributed to a presumed compiler bug
(specifically, that compiling the heartbeat at a high optimization level
caused a crash) was not repeatable.  So today, in building new production
VMs for Teleplace, I decided to do three parallel Linux builds and see
whether all produced the same results.  While some macros used in the
source are date dependent (uses of __DATE__), as far as I'm aware none
apart from version.c/version.o depend on the time, and Linux objects contain
no timestamps or current-directory paths.  So, provided different
compilations of the same source are done on the same day, the results should
be bit-identical.  In my experiment this turns out not to be the case, which
is more than a little alarming.
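One way to test the assumption that only version.c depends on the build
time is to search the sources for the predefined macros whose expansion can
change between compilations.  A minimal POSIX shell sketch follows;
find_time_macros is a hypothetical helper, it matches comments as well as
code, and it cannot see values injected on the compiler command line (for
example with -D).

```shell
# find_time_macros DIR...
# List every .c/.h file under the given directories that mentions a
# predefined macro whose expansion depends on when it is compiled:
#   __DATE__      - compilation date              (standard C)
#   __TIME__      - compilation time of day       (standard C)
#   __TIMESTAMP__ - source file modification time (GCC extension)
# Matches in comments are reported too; this is a textual search only.
find_time_macros() {
    find "$@" -type f \( -name '*.c' -o -name '*.h' \) \
        -exec grep -lE '__(DATE|TIME|TIMESTAMP)__' {} +
}
```

Run from the top of the source tree as "find_time_macros ."; any file other
than version.c in the output would be a candidate source of build-to-build
differences worth inspecting.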

What I'm seeing is that duplicating unixbuild/bld to unixbuild/bldb and
unixbuild/bldc, doing identical configures and makes in each of the three
directories, and then comparing the resulting objects gives different
results.  I see this both on a bare-metal laptop with local sources running
CERN SLC5 and on a Parallels VM running CentOS 5.3 (both derived from RHEL).
I'm using gcc 4.1.2.  Here's a script that shows example differences:

bld$ for f in *.o vm/*.o; do echo $f;cmp $f ../bldb/$f; cmp $f ../bldc/$f; done
disabledPlugins.o
disabledPlugins.o ../bldb/disabledPlugins.o differ: byte 200, line 4
disabledPlugins.o ../bldc/disabledPlugins.o differ: byte 200, line 4
version.o
version.o ../bldb/version.o differ: byte 166, line 3
version.o ../bldc/version.o differ: byte 166, line 3
vm/aio.o
vm/cogit.o
vm/debug.o
vm/gcc3x-cointerp.o
vm/osExports.o
vm/sqExternalSemaphores.o
vm/sqHeapMap.o
vm/sqLinuxHeartbeat.o
vm/sqLinuxWatchdog.o
vm/sqLinuxWatchdog.o ../bldb/vm/sqLinuxWatchdog.o differ: byte 33, line 1
vm/sqLinuxWatchdog.o ../bldc/vm/sqLinuxWatchdog.o differ: byte 33, line 1
vm/sqNamedPrims.o
vm/sqNamedPrims.o ../bldb/vm/sqNamedPrims.o differ: byte 6346, line 30
vm/sqNamedPrims.o ../bldc/vm/sqNamedPrims.o differ: byte 6346, line 30
vm/sqTicker.o
vm/sqUnixCharConv.o
vm/sqUnixExternalPrims.o
vm/sqUnixMain.o
vm/sqUnixMain.o ../bldb/vm/sqUnixMain.o differ: byte 31415, line 170
vm/sqUnixMain.o ../bldc/vm/sqUnixMain.o differ: byte 31414, line 170
vm/sqUnixMemory.o
vm/sqUnixThreads.o
vm/sqUnixVMProfile.o
vm/sqVirtualMachine.o
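The loop above prints every object name and relies on cmp staying silent
to signal a match.  A slightly more explicit sketch in POSIX sh reports only
objects that differ or are missing and returns non-zero if any do, so it can
be used from other scripts.  compare_builds is a hypothetical name; the
*.o and vm/*.o patterns are taken from the loop above.

```shell
# compare_builds REF OTHER...
# Compare each object in REF (top level and vm/, as in the loop above)
# with the object at the same relative path in every OTHER build directory.
# Prints one line per object that differs or is missing, nothing for
# bit-identical ones, and returns 1 if anything differed, else 0.
compare_builds() {
    ref=$1; shift
    status=0
    for f in "$ref"/*.o "$ref"/vm/*.o; do
        [ -e "$f" ] || continue        # pattern matched nothing
        rel=${f#"$ref"/}               # e.g. vm/sqUnixMain.o
        for other in "$@"; do
            if [ ! -e "$other/$rel" ]; then
                echo "missing: $other/$rel"
                status=1
            elif ! cmp -s "$f" "$other/$rel"; then
                echo "differs: $rel ($ref vs $other)"
                status=1
            fi
        done
    done
    return $status
}
```

From unixbuild, "compare_builds bld bldb bldc" would list only the objects
that are not bit-identical across the three build directories.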

Using objdump --disassemble I can see, for example, that sqLinuxWatchdog.o
and sqUnixMain.o differ only in the symbol table, not in the executable
code.  So perhaps this is not meaningful and merely noise.  But with a file
as simple as disabledPlugins.c, the fact that different runs produce
different objects at all is rather worrying:

bld$ cat disabledPlugins.c
/* this should be in a header file, but it isn't.  ho hum. */
typedef struct {
  char *pluginName;
  char *primitiveName;
  void *primitiveAddress;
} sqExport;
sqExport vm_display_Quartz_exports[] = { 0, 0, 0 };
sqExport vm_display_custom_exports[] = { 0, 0, 0 };
sqExport vm_display_fbdev_exports[] = { 0, 0, 0 };
sqExport vm_sound_MacOSX_exports[] = { 0, 0, 0 };
sqExport vm_sound_NAS_exports[] = { 0, 0, 0 };
sqExport vm_sound_OSS_exports[] = { 0, 0, 0 };
sqExport vm_sound_Sun_exports[] = { 0, 0, 0 };
sqExport vm_sound_custom_exports[] = { 0, 0, 0 };
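The objdump comparison mentioned above can be scripted so that only
disassembled code is compared.  This is a sketch under assumptions:
same_code is a hypothetical helper, GNU binutils objdump is assumed to be on
the PATH, and the header line objdump prints naming the input file is
deleted because the two objects live at different paths.  Differences
confined to the symbol table, debug information or other sections that -d
does not disassemble are deliberately ignored, and relocations are not
shown without -r.

```shell
# same_code A.o B.o
# Return 0 if objdump's disassembly of the two objects is identical once
# the per-file header line ("A.o:  file format ...") is removed, 1 if the
# disassembly differs, 2 if objdump (or mktemp) failed.
same_code() {
    ta=$(mktemp) && tb=$(mktemp) || return 2
    if objdump -d "$1" > "$ta" && objdump -d "$2" > "$tb"; then
        # Drop the line that names the input file, then compare the rest.
        sed '/file format/d' "$ta" > "$ta.code" &&
        sed '/file format/d' "$tb" > "$tb.code" &&
        cmp -s "$ta.code" "$tb.code"
        rc=$?
    else
        rc=2
    fi
    rm -f "$ta" "$tb" "$ta.code" "$tb.code"
    return $rc
}
```

For example, "same_code disabledPlugins.o ../bldb/disabledPlugins.o" would
succeed if the two objects differ only outside their disassembled code.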


I wonder:
- do you see the same effect?
- does this happen with gcc versions other than 4.1.2?
- does it happen on non-RHEL-derived distros?
- is this a meaningful signal or just harmless noise?
- what am I doing wrong?

Clearly I need to look more carefully, but I thought I'd ask y'all in order
to understand, and hopefully solve, the build instabilities as swiftly as
possible.

If you want to try to reproduce this, simply duplicate the build directory
(unixbuild/bld in the Cog VM source) twice and do three separate configures
and makes, one in each of the build directories, each from the same source
code.  Then run some variation of the script above to compare the object
files so produced.
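The reproduction recipe above can be scripted roughly as follows.  This is
only a sketch: duplicate_build and run_builds are hypothetical helpers, and
BUILD_CMD is a placeholder that you must set to whatever configure and make
invocation you normally use in unixbuild/bld (it is not given here).  The
copies should be made before bld is built, so that each directory starts
out clean.

```shell
# duplicate_build SRC DEST...
# Copy the (still unbuilt) build directory SRC to each DEST.
# Refuses to overwrite an existing DEST rather than deleting it.
duplicate_build() {
    src=$1; shift
    for dest in "$@"; do
        if [ -e "$dest" ]; then
            echo "duplicate_build: $dest already exists" >&2
            return 1
        fi
        cp -R "$src" "$dest" || return 1
    done
}

# run_builds DIR...
# Run "$BUILD_CMD" (placeholder for your usual configure + make command
# line) inside each directory in turn, stopping at the first failure.
run_builds() {
    [ -n "$BUILD_CMD" ] || { echo "run_builds: set BUILD_CMD" >&2; return 2; }
    for d in "$@"; do
        (cd "$d" && sh -c "$BUILD_CMD") || return 1
    done
}
```

From unixbuild one would run "duplicate_build bld bldb bldc", then set
BUILD_CMD and run "run_builds bld bldb bldc", and finally compare the
resulting objects.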

best
Eliot