How to explore a Project saving problem, in great detail

Ned Konz ned at squeakland.org
Mon Jul 12 20:11:29 UTC 2004


On Monday 12 July 2004 6:42 am, Milan Zimmermann wrote:
> one note/question - when playing with your solution last night I
> thought the "reason" why the save hang was the "improperly renamed class
> ZTime" - do you think it was that or just the fact I had the Changeset
> browser opened in the project, which made the object graph too big and
> Squeak decided to stop the save?)

It's not that the object graph was too big. It's that there was a pointer 
outside the graph that couldn't get resolved properly (in this case, a 
ReadOnlyVariableBinding of #ZTime to the class Time). This was caused 
(indirectly) by your having a ChangeSorter open in the project.

> I tried to look for the "ZTime" class in Browser before I closed the
> "Change Sorter" but did not find it, not sure why. In any case, do you
> think I must have renamed it or something else could have happened to the
> image? - Just curious, I am not aware of renaming any class, perhaps by
> accident .. I have been playing with creating classes from the workspace,
> perhaps I messed up at some point.

The problem existed in a stock image. Read on...

I will explain in detail what I did to explore the problem in hopes that it 
may help some of you. This is in the "teaching people to fish" spirit.

[For those of you who want to play along at home, you may repeat this problem 
using a stock 3.7g image by making a new Morphic project, opening a 
ChangeSorter in it, and trying to save the project on disk]

- First, I noticed that the project stopped saving at some point and the world 
would respond to mouse clicks again (that is, I could raise a menu).

- I then returned to the parent project (the one I had asked to save) and saw 
an inform dialog that said "extra associations".

- I opened a Workspace, typed in the phrase "extra associations", and hit 
alt-shift-E (search for methods containing the string in a string literal). 
Actually, I could have done this easier by shift-clicking on the string in 
the informer, and hitting alt-shift-E, but I knew I'd need a Workspace 
anyway.

- I found ImageSegment>>prepareToBeSaved. Looking down at the bottom of this 
method, I saw that it was trying to (but didn't) open an inspector. So...

- In my Workspace, I typed in 
	ImageSegment allInstances
and hit alt-shift-I (explore it). I saw that there was one of them.

- I selected this one in the explorer, and typed in
	self prepareToBeSaved
in the bottom of the explorer, and hit alt-D (do it).

- Again I saw the inform dialog, but this time when I hit the OK button I saw 
the inspector pop up. It contained an IdentitySet with one element (which 
printed itself as #ZTime).

- I typed in to the bottom of the dialog
	self anyOne inspect

- This brought up an Inspector on a ReadOnlyVariableBinding whose key was 
#ZTime and whose value was Time

- selecting the key (#ZTime) in the inspector's upper left hand pane, I 
brought up the context menu using the mouse. I chose "objects pointing to 
this value"

[A much better tool for tracing pointers is Avi Bryant's Pointer Explorer, 
available using SqueakMap. I rarely have much success using the "chase 
pointers" choice (the Pointer Finder).].

- I got an inspector on an array with five slots. The last slot was an 
ObjectExplorerWrapper (a side effect of having the other explorer open). But 
the other four were interesting. They were:

	- a big array with symbols and other things in it: #(#EventPollFrequency 
#startUpWithCaption:at:allowKeyboard: #delayedBy: #anchors #panForTrack: nil 
#'read/write segment' #reciprocalFloorLog: an Object an Object #radial 
#packageNamed:ifAbsent: an Object #blockAssociationCheck: an Object an 
Object ...
	- a ClassChangeRecord
	- #ZTime->a ClassChangeRecord
	- another big array: #(Array Project nil a Project(NewChanges) false 
PasteUpMorph TranscriptStream ChangeSet Form HandMorph a StringMorph(2714)'a 
StringMorph(113)''a PasteUpMorph(1263) [world] that was not counted'' that 
was not counted' a StringMorph(3918)'a StringMorph(1420)''a 
SimpleHierarchicalListMorph(1810) that was not counted'' ...

- I guessed that the two big arrays were used by the ImageSegment, so focused 
on the ClassChangeRecords.

- Now, I knew that ClassChangeRecords lived in ChangeSets, which are pointed 
to by ChangeSorters. Having seen a ChangeSorter in your project, I suspected 
that was the problem. So I selected the association (#ZTime->a 
ClassChangeRecord) and chose from the Inspector context menu "objects 
pointing to it". I saw:

#(#Time->a ClassChangeRecord #TimeStamp->a ClassChangeRecord #ZTime->a 
ClassChangeRecord nil nil nil nil #Date->a ClassChangeRecord nil #ZDate->a 
ClassChangeRecord nil #ZTimeStamp->a ClassChangeRecord)

- "Ah, I see" I said to myself. Now I recalled that we renamed some classes in 
the Great Time And Date Revolution around CS 5660. So I did "objects pointing 
to it" another two times until I got to
	a ChangeSet named 5668Chrnl-i-CnvtTDateTTime1

- So now my suspicion that the ChangeSorter was involved was proved. I knew 
how ChangeSorters and ChangeSets were structured, but following the pointers 
further would have led to a ChangeSetCategory and/or the ChangeSorter.

- But the question about why was this one class an unresolved out pointer 
still remained.

The answer was here in ImageSegment>>prepareToBeSaved:

		(Smalltalk associationAt: assoc key ifAbsent: [3]) == assoc ifTrue: [
			outPointers at: (outIndexes at: assoc) put: 
				(DiskProxy global: #Smalltalk selector: #associationAt: 
					args: (Array with: assoc key)).

In other words, since Smalltalk didn't contain #ZTime (this was a temporary 
rename), we didn't replace the association with a DiskProxy.

In the Workspace, you can see other potential problems like this by doing a 
print-it or inspect-it on:

ReadOnlyVariableBinding allInstances
	select: [ :ea | ea value isBehavior and: [ (Smalltalk at: ea key ifAbsent: 
[]) ~~ ea value ]]
	thenCollect: [ :ea | ea key -> ea value ]

which returns:

#(#CRGestureProcessor->AnObsoleteCRGestureProcessor 
#PseudoClassOrganizer->AnObsoletePseudoClassOrganizer 
#CRDictionary->AnObsoleteCRDictionary 
#ClassOrganizer->AnObsoleteClassOrganizer #ZDate->Date 
#CRAddFeatureMorph->AnObsoleteCRAddFeatureMorph 
#CRLookupItem->AnObsoleteCRLookupItem #ZTime->Time 
#IRCProtocolMessage->AnObsoleteIRCProtocolMessage 
#IRCMorph->AnObsoleteIRCMorph #SMTPSocket->AnObsoleteSMTPSocket 
#CRDisplayProperties->AnObsoleteCRDisplayProperties)

But this begs the question of "why did it appear in the out pointers?". If you 
look at ChangeSet>>objectForDataStream: you will see that ChangeSets 
themselves don't get stored. So something else must have been responsible. 
What? Hmm...

Let's look at who might be pointing to it. We can skip ChangeRecords, since 
they won't get stored. So let's try this:


| zt ptrs baddies ptrs2 |
Smalltalk garbageCollect.
zt := (ReadOnlyVariableBinding allInstances select: [ :ea | ea key = #ZTime ]) 
first.
ptrs := Smalltalk pointersTo: zt except: { zt. zt key. zt value. }.
baddies := OrderedCollection new.
ptrs do: [ :p |
	ptrs2 := nil.
	ptrs2 := Smalltalk pointersTo: p except: { ptrs. p. thisContext. thisContext 
sender. thisContext home. thisContext blockHome. zt. zt key. zt value. 
baddies. baddies collector }.
	ptrs2 := ptrs2 reject: [ :ea |
		('*ChangeRecord*' match: ea class name) ].
	ptrs2 isEmpty ifFalse: [baddies addAll: ptrs2 ].
].
baddies asArray inspect.


Which shows me a single MethodContext, whose receiver is Delay, and whose 
source is missing. But looking around the context, I see that there is a temp 
of a Semaphore, and on its list is a process with priority 80.

Opening a Process browser, I see that that's the timer interrupt watcher. 
Apparently this didn't get recompiled and restarted, or it's the old method.

By printing the result of

	Delay class compiledMethodAt: #timerInterruptWatcher

I see that the method is different (different identityHash).

OK... let's try to fix the problem. Looks like I'll have to restart the timer 
interrupt watcher, but I don't want to kill off the existing delays.

Reading the code of Delay class shows how this might work.

Let's try:

	Delay shutDown; startTimerInterruptWatcher; startUp.

Repeating the search for baddies above shows that that one reference has been 
removed.

Trying the project storage (with a change sorter) again now works.

Was that fun, or what?

Why did I say that the class was "improperly renamed"? Notice that all of the 
other associations besides #ZTime and #ZDate refer to classes whose names 
start with "AnObsolete" and which will in fact report that they are obsolete 
when asked.

However, #ZTime and #ZDate actually point to live classes with other names!

How did this happen? Well, in the postscript of change set 
5668Chrnl-i-CnvtTDateTTime1 there is the following code:

| jdn |
jdn _ SystemVersion current date asDate julianDayNumber.

#(#Date #Time #TimeStamp) 	
	do: 	[ :s | | zname tname |
		Transcript show: (Smalltalk at: s) allInstances; cr.
		zname  _ ('Z', s) asSymbol.
		tname  _ ('T', s) asSymbol.
		(Smalltalk at: s) rename: zname.
		(Smalltalk at: tname) rename: s.
		(Smalltalk systemNavigation allCallsOn: (Smalltalk associationAt: zname))
			do: [ :ref | ref actualClass ifNotNilDo: [ :ac | ac recompile: ref 
methodSymbol ] ].
		Smalltalk forgetClass: (Smalltalk at: zname) logged: false.
		Transcript show: 'replaced ', s; cr ].

Transcript show: 'removing subclasses....'.
(Magnitude subclasses select: [ :sc | sc name beginsWith: 'Z' ]) 
	do:  [ :sc | Magnitude  removeSubclass: sc ].
Transcript tab; show: 'removed.'; cr.


That is (for instance, for #ZTime):

* There used to be a subclass of Magnitude called Time.
* we defined a new class called TTime
* we renamed Time to ZTime
* we renamed TTime to Time
* we then recompiled all the references to the old versions of Time (which was 
now named #ZTime)
* we then forgot #ZTime, and tried not to remember that we had done so (in the 
change set) 
* then we tried to remove #ZTime (again) using #removeSubclass: (this 
shouldn't have been necessary after #forgetClass:logged:, and is almost 
always the wrong thing to do from outside the guts of the system).

However, this didn't work entirely right. First, what we *should* have done 
instead of #forgetClass:logged: and #removeSubclass: was instead

	(Smalltalk at: zname) removeFromSystem: false.


Anyway, this is probably more than you wanted to know.

I'll post a change set that fixes this problem, so saving projects that 
contain change sorters will work.

Have fun,
-- 
Ned Konz
http://bike-nomad.com/squeak/



More information about the Squeak-dev mailing list