Hi Guille,

On Wed, May 18, 2016 at 12:46 AM, Guille Polito <guillermopolito@gmail.com> wrote:

Hello,

-------- Original Message --------

Hi Guille, Hi Pablo (and welcome),

On Tue, May 17, 2016 at 8:37 AM, Guille Polito <guillermopolito@gmail.com> wrote:

Hi Eliot, list

I'm working here with Pablo (Tesone) on moving forward the Ephemeron implementation.

Where's "here"? Are you in Lille?

We first installed Eliot's changeset, added a #mourn method and an EphemeronDictionary collection, and then started testing something like this:

f := ObjectFinalizer receiver: 'Hello' selector: #logCr.
d := EphemeronDictionary new.

d at: f put: f.

f := nil.
Smalltalk garbageCollect.

So this looks like something simulatable. Are you able to use the simulator? If not, why not?

For some reason, bytesToShift is negative when I open the image.

That is to be expected. In the real VM the heap is located somewhere well above the bottom of the address space, typically above the program code. In the simulator the heap is located either at 0 (an interpreter or stack VM) or immediately above the code zone (in a Cogit VM). So when an image that has been saved on the real VM is loaded into the simulator all oops have to be adjusted down and hence bytesToShift is negative.

bytesToShift := objectMemory memoryBaseForImageRead - oldBaseAddr. "adjust pointers for zero base address"
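To make the sign of bytesToShift concrete, here is a minimal C sketch of relocating saved pointers when the heap's base address changes. It is not VM code: bytes_to_shift and relocate_slots are hypothetical names, and the sketch assumes every slot holds a plain heap pointer, ignoring the tagged immediates a real image also contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not VM code: signed distance every saved pointer
 * must move. Negative when the new heap base is below the old one. */
static intptr_t bytes_to_shift(uintptr_t new_base, uintptr_t old_base)
{
    if (new_base >= old_base)
        return (intptr_t)(new_base - old_base);
    return -(intptr_t)(old_base - new_base);
}

/* Relocate every pointer-valued slot in place. Assumption of this sketch:
 * all slots are heap pointers, so none can lie below the old base. */
static void relocate_slots(uintptr_t *slots, size_t count,
                           uintptr_t old_base, uintptr_t new_base)
{
    intptr_t shift = bytes_to_shift(new_base, old_base);
    for (size_t i = 0; i < count; i++) {
        assert(slots[i] >= old_base);
        /* unsigned addition wraps, so adding a negative shift subtracts */
        slots[i] += (uintptr_t)shift;
    }
}
```

With an old base of 0x10000 and a new base of 0, the shift is negative, yet every relocated pointer stays non-negative because no saved pointer was below the old base. A negative address after relocation would therefore point at a value that was not a heap pointer to begin with, or at a wrong old base.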

So I cannot continue loading, because addresses become negative and I get "Improper Store into indexable object" kinds of errors.

Can you post a back trace? Where is this happening? What version of VMMaker.oscog are you using? Are you running in Pharo or Squeak? If you're in Lille you could perhaps visit Clément's office and get him to take a look. Clément, would that be ok?

When debugging the VM there are two main levels of support. The first is the simulator, where there is maximum support for debugging:

- asserts on all the time

- arbitrary breakpoints

- attempting every GC in a copy of the heap before doing the real GC so that bugs in the GC can be investigated without needing to construct a reproducible case after a crash

- the Smalltalk environment to inspect and browse

The next level is the assert and debug VMs. If you look in the build directories on the Cog svn branch you'll see that all of them build three VMs: a production VM with maximum optimisation and asserts excluded, an assert VM with -O1 and asserts enabled, and a debug VM with -O0 and asserts enabled. So if the simulator is too slow for the case being examined, or if the bug doesn't show up in the simulator (the worst of all worlds), build both assert and debug VMs and run with the assert VM first.

Well so far we were using a VM compiled for debug with a graphical C debugger. It was not so bad. However, I cannot say I'm missing a better debugger.

"Compiled for debug" is vague. Do you mean it is compiled with -g -O0, or in addition is compiled with -g -O0 -DDEBUGVM=0 -DNDEBUG=1?

Note that there is a heap leak checker which can be enabled both in the simulator and the assert and debug VMs. See the checkForLeaks method and the -leakcheck argument.

ok!

Without the simulator or the assert and debug VMs you are flying blind. It is /really/ productive to use the simulator for debugging, provided the bug is reproducible within a short amount of time, as your case above is.

Ok, gotcha! By this afternoon I'll have some news probably.

Thanks a lot!

However, as soon as we garbage collect twice, we have a VM crash. We started debugging the VM to see if we could have some more clues.

The first thing we noticed is that the first time the GC runs, the mournQueue is nil. This is of course expected, because the new finalization mechanism was not active and so there was no need to create the mournQueue. We saw that the mournQueue is actually created lazily when queuing a mourned object (I'm referring to #queueMourner: and #ensureRoomOnObjStackAt:). So the second time the GC runs, the mournQueue is there. So far ok, but it still crashes.
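The lazy creation described above can be pictured with a small C sketch. This is only an illustration under stated assumptions: MournQueueSketch, ensure_room and queue_mourner are hypothetical names, and a growable array stands in for the real obj stack pages, whose layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an obj stack; not the Spur layout. */
typedef struct {
    size_t count;
    size_t capacity;
    void **items;
} MournQueueSketch;

static MournQueueSketch *mourn_queue = NULL;  /* absent until first use */

/* Create the queue on demand and make room for one more entry.
 * Returns 0 if allocation fails, 1 otherwise. */
static int ensure_room(void)
{
    if (mourn_queue == NULL) {
        mourn_queue = calloc(1, sizeof *mourn_queue);
        if (mourn_queue == NULL)
            return 0;
    }
    if (mourn_queue->count == mourn_queue->capacity) {
        size_t new_cap = mourn_queue->capacity ? mourn_queue->capacity * 2 : 4;
        void **grown = realloc(mourn_queue->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return 0;
        mourn_queue->items = grown;
        mourn_queue->capacity = new_cap;
    }
    return 1;
}

/* Queue one object for mourning, creating the queue if needed. */
static int queue_mourner(void *obj)
{
    assert(obj != NULL);
    if (!ensure_room())
        return 0;
    mourn_queue->items[mourn_queue->count++] = obj;
    return 1;
}
```

The point of the sketch is that any code walking the queue has to cope with two states: the queue being absent (first GC) and the queue having been created part-way through an earlier collection (second GC).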

The crash happens in the call to

markAndTraceObjStackandContents(GIV(mournQueue), 1);

after the

if (!markAndTraceContents) {
return;
}

But when we try to understand why, things become less clear to us :). We used the printObjStack() function like this:

call printObjStack(markStack)
call printObjStack(weaklingStack)

and saw some output in the console that makes sense. However, printing the mournQueue in the same manner produces some strange output:

call printObjStack(mournQueue)

head 0xb06e980 cx 18 (18) fmt 10 (10) sz 4092 (4092) myx: 4098 (4098) unmkd
topx: 14 next: 0x0 free: 0x0

We noticed that free and next are 0x0 while the others are not...

Finally we saw there is isValidObjStack(), that gave us the following results:

call isValidObjStack(markStack) => 1

call isValidObjStack(weaklingStack) => 0
p objStackInvalidBecause = "marking but page is unmarked"

call isValidObjStack(mournQueue) => 0
p objStackInvalidBecause = "marking but page is unmarked"

So we assume that the stack creation is wrong? We are a bit lost here.
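To show what a message like "marking but page is unmarked" might be checking, here is a minimal C sketch of a validity check that records its reason for failing. It is a guess at the general shape only, not the VM's isValidObjStack(): PageSketch, is_valid_stack, invalid_because and MAX_PAGES_SKETCH are hypothetical, and the only rule modelled is that every page must carry a mark while a mark phase is in progress.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES_SKETCH 1024  /* arbitrary bound used to catch cyclic lists */

/* Hypothetical page of a linked stack; not the Spur obj stack layout. */
typedef struct PageSketch {
    int marked;
    struct PageSketch *next;
} PageSketch;

/* Reason for the most recent failed check, or NULL if the check passed. */
static const char *invalid_because = NULL;

/* Walk the pages; while marking, every page must itself be marked. */
static int is_valid_stack(const PageSketch *head, int marking)
{
    size_t visited = 0;
    invalid_because = NULL;
    for (const PageSketch *p = head; p != NULL; p = p->next) {
        visited++;
        assert(visited <= MAX_PAGES_SKETCH);
        if (marking && !p->marked) {
            invalid_because = "marking but page is unmarked";
            return 0;
        }
    }
    return 1;
}
```

In this sketch a page that is allocated, or left unmarked, while marking is under way fails the check until its mark is set. Whether that is what happens to the weaklingStack and mournQueue pages in the VM is exactly the open question.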

Guille and Pablo

--

_,,,^..^,,,_

best, Eliot
