[Newbies] Re: Deleting elements from a Collection while looping through it

nicolas cellier nicolas.cellier.aka.nice at gmail.com
Sat Dec 10 20:50:41 UTC 2011


Jan Teske <jan.teske <at> student.hpi.uni-potsdam.de> writes:

> 
> Hey,
> 
> I have a question concerning the behavior of Collections in Squeak. 
> Recently I face a problem when trying to delete some elements from a set 
> during a 'do:' message.
> Have a look at this sample method:
> 
>      doFallForAllIn: aSet
>          aSet do:
>              [:block | block fall ifFalse: [aSet remove: block]]
> 
> It loops over a set of blocks and performs the fall message on each of 
> them. If a block is not falling I don't want it to be in aSet after the 
> function returned so I remove it from the set. Squeak doesn't seem to 
> like that since some rather strange error now occures: If a block is 
> removed from the set, the next block in the set will be left out from 
> looping... sometimes. Nearly all the time it works as exspected, only 
> sometimes a block is ignored. I couldn't find any pattern when this happens.
> 
> So my question is: Has anyone an explanation for this? Does the error 
> occure because the way I'm doing it is fundamentally wrong? Or am I just 
> overlooking something and it is possible to remove blocks from a 
> collection while looping over it?
> 

The explanation is a bit technical.
A Set stores its elements in an Array (the instance variable named 'array').
This array has empty slots marked with nil.
If an element of the array is nil, that means the slot is empty.
You can verify that Set>>do: is iterating on all non nil array slots.
And Set>>#remove:ifAbsent: does put a nil in the array slot
(in place of the removed element).
But it then calls #fixCollisionsFrom: which has a license to relocate elements.
If that ever happens, then the #do: loop will proceed at next index...
It will then ignore the elements that were moved back... Is that clear?

How does a Set store and retrieve elements in the array?
You can learn by browsing Set superclass comment
(HashedCollection comment if you prefer).
HashedCollection try to achieve a O(1) cost for storing/retrieving an element.
It does so by computing a hash code of the element,
then taking modulo the number of slots in the array,
This gives a slot index for the element (see Set>>#scanFor:)
But a collision can happen, that is the slot is already occupied.
In such case, the element is stored in the next available slot.

Note that the efficiency of Set decrease with the number of collisions.
Indeed, all successive slots must be scanned
- until the element is found (the element is then present)
- or until an empty slot is found (the element is then absent).
The worst case efficiency would be O(n),
- if all elements would collide to the same slot,
- or if the set was full (a Set must preserve some empty room)

Now, imagine that an element 'a' caused a collision of element 'b' in theSet.
Imagine we remove a but don't fix the collision.
When we will inquire if (theSet includes: b), here is what will happen:
- the algorithm will compute the slot for storing b,
- it will find that it is the same slot as a,
- it will find nil in the array at this index (because we removed a),
- which will be interpreted as theSet does not include b
(because b should occupy the slot it has been given,
or one of the successive slots if this slot is occupied by a collider).

Hope you now understand a bit more the invariants of a Set,
and why elements are moved...
I'm not so sure to be that clear...
But you beginners often have a secret for asking tough questions ;)

So you cannot safely add nor remove on such collection while iterating on it.
It's true of most collections anyway.

A simple solution is also to use a copy, like 
      doFallForAllIn: aSet
          aSet copy do:
              [:block | block fall ifFalse: [aSet remove: block]]

And I can predict that you will get some trouble when
(theSet collect: [:e | e hash \\ theSet array size]) size < theSet size,
unless there is a single collision wrapping over the end of theSet...
I let you find the correct prediction sentence for such case as an exercize ;)

I also recommend using the debugger
- via the menu 'debug it',
- or by inserting a halt somewhere in your code,
this is a good way to understand what's going on in the machine.
Beware, a halt in a kernel class might halt more often than desired.

Nicolas

> Thanks in advance!
> 






More information about the Beginners mailing list