[squeak-dev] Re: Ideas about sets and dictionaries

Thu Nov 12 08:13:56 UTC 2009

Igor Stasenko wrote:
> 2009/11/12 Andreas Raab <andreas.raab at gmx.de>:
>> The goal of "be able to hold _any_ potentially existing objects in an image"
>> is neither achievable nor worthwhile. Try this for starters:
>>
>>  (s := Set new) add: s; add: s.
> 
> Why its not worthwhile? 

Because it's an intractable problem. If I understand you correctly then 
you are saying that one should be able to add arbitrary objects to sets 
without any constraints whatsoever. That's not worthwhile because at the 
  very least you will require that objects you're adding to Sets must 
contain finite implementations of #hash as well as implementations of #= 
  that allow comparing instances of arbitrary classes (and probably 
more). That is already an intractable constraint (most images I am aware 
about contain objects that cannot be compared using #=).

At this point you've given up on your initial position of no constraints 
whatsoever and now we're just haggling about which constraints are 
better or worse. Which is of course fair but it's no longer "any 
constraints vs. no constraints" but rather "these constraints vs. those 
constraints".

To repeat, my point here is: All sets have constraints. We're just 
discussing whether the constraint of not allowing nil as an element is 
useful (I happen to think it's not) and whether not allowing an internal 
marker of a set being a member of that set is useful (I happen to think 
that's okay). You may disagree with this but please don't claim that the 
goal is to add objects free of any constraints to sets because none of 
the proposals address this.

> I think instead it worthwhile to have a collections safe from
> such recursive problems, so developers won't need to look for
> workarounds in order to prevent recursion.

It most certainly is. But it's a different problem from allowing nil in 
sets and neither one removes all constraints from objects being added to 
sets. Yes we *should* address this but let's leave this for a different 
discussion on another day. Today we're trying to deal with nil in sets.

> but why sit and wait for these problems lurking in the dark to strike
> us suddenly?
> Its more philosophical problem i think, an approaches we not sharing.
> I could declare any piece of code 'its good enough for use by me'. But
> does that means that its good enough for use by others?
> Does that means that we should stop thinking how to remove the
> obstacles and limitation and make better, future-proof
> implementation, once its 'good enough' especially when we are talking
> about collections - one of the most widely used objects in system?

Please remember that I started this part of the discussion by saying 
"let's lean back and think about how we would address the problem if we 
weren't constrained". I want the discussion about what the best 
long-term solution is, in fact I started it.

However, instead of making any constructive comments towards that end 
you have been viciously attacking the only proposal that was made in 
this context with (from my perspective) completely bogus arguments. That 
doesn't strike me as trying to find the best solution - it strikes me as 
the behavior of someone who is deeply in love with his (admittedly very 
clever) hack ;-)

To be explicit: Do you think that your proposal of using a negative 
tally is truly the *best*, the long-term, the future-proof solution to 
the problem? That it is the solution that you would choose if you would 
implement Set from first principles? Then please say so. But if you can 
think of a better solution, please propose that instead. That's what 
this phase is all about - not about trying to prove someone else wrong 
but to propose -free of constraints- what you think is the best way 
forward. Comparisons are fair in this context, shooting down other 
proposals is not.

>> It would be rather more interesting to me to discuss the practical tradeoffs
>> of the choosing one way (marker instead of nil) over the other (flag
>> indicating nil containment) in terms of implementation and performance
>> costs.
> 
> Sure thing. When tires meet the road.
> Lets run benchmarks (know any good bench suite for sets?) and see the
> difference.

I think what we should do is:

a) Run benchmarks (tiny and macroBenchmarks). I expect no difference but 
I only believe that after having run the benchmarks.

b) Count the number of Set subinstances in an image and compute the 
additional space overhead that adding a slot would bring. In my current 
trunk image that's

	Set allSubInstances size

12906 * 4 => 51624 bytes, i.e., neglectable given that

	Set allSubInstances detectSum:[:s| s capacity].

541602 * 4 => 2166408, so the increase in space is roughly 2.4% per set.

(Interesting. That's less than I expected. If these numbers hold up we 
should definitely consider adding an ivar to Set regardless of what we 
end up using it for)

c) Consider how each proposal affects implementations. Is the 
modification clear and understandable? How many methods did you have to 
change? Are further changes required? What about subclasses (i.e., do 
they need to be aware that they might now contain nil)? Does the change 
affect extension methods loaded from Monticello?

BTW, someone with interest in Randal's proposal will have to step 
forward to provide an implementation for it or else Igor's proposal will 
likely win by default. I'm not in favor of delaying the discussion by 
having proposals that aren't being implemented since we need the 
comparison and I'm not going to do that myself since I'm trying to 
preserve a bit of neutrality here and I tend to be less objective when 
comparing my own code to that of others ;-)

Cheers,
   - Andreas