[squeak-dev] Re: Ideas about sets and dictionaries

Igor Stasenko siguctua at gmail.com
Thu Nov 12 09:26:33 UTC 2009


2009/11/12 Andreas Raab <andreas.raab at gmx.de>:
> Igor Stasenko wrote:
>>
>> 2009/11/12 Andreas Raab <andreas.raab at gmx.de>:
>>>
>>> The goal of "be able to hold _any_ potentially existing objects in an
>>> image"
>>> is neither achievable nor worthwhile. Try this for starters:
>>>
>>>  (s := Set new) add: s; add: s.
>>
>> Why its not worthwhile?
>
> Because it's an intractable problem. If I understand you correctly then you
> are saying that one should be able to add arbitrary objects to sets without
> any constraints whatsoever. That's not worthwhile because at the  very least
> you will require that objects you're adding to Sets must contain finite
> implementations of #hash as well as implementations of #=  that allow
> comparing instances of arbitrary classes (and probably more). That is
> already an intractable constraint (most images I am aware about contain
> objects that cannot be compared using #=).
>
> At this point you've given up on your initial position of no constraints
> whatsoever and now we're just haggling about which constraints are better or
> worse. Which is of course fair but it's no longer "any constraints vs. no
> constraints" but rather "these constraints vs. those constraints".
>
> To repeat, my point here is: All sets have constraints. We're just
> discussing whether the constraint of not allowing nil as an element is
> useful (I happen to think it's not) and whether not allowing an internal
> marker of a set being a member of that set is useful (I happen to think
> that's okay). You may disagree with this but please don't claim that the
> goal is to add objects free of any constraints to sets because none of the
> proposals address this.
>
>> I think instead it worthwhile to have a collections safe from
>> such recursive problems, so developers won't need to look for
>> workarounds in order to prevent recursion.
>
> It most certainly is. But it's a different problem from allowing nil in sets
> and neither one removes all constraints from objects being added to sets.
> Yes we *should* address this but let's leave this for a different discussion
> on another day. Today we're trying to deal with nil in sets.
>

I am agree with all of the above. Except that i didn't claimed that my
proposal having no constraints, but it leaving less constraints
comparing to existing one and comparing Randal's, IMO.

>> but why sit and wait for these problems lurking in the dark to strike
>> us suddenly?
>> Its more philosophical problem i think, an approaches we not sharing.
>> I could declare any piece of code 'its good enough for use by me'. But
>> does that means that its good enough for use by others?
>> Does that means that we should stop thinking how to remove the
>> obstacles and limitation and make better, future-proof
>> implementation, once its 'good enough' especially when we are talking
>> about collections - one of the most widely used objects in system?
>
> Please remember that I started this part of the discussion by saying "let's
> lean back and think about how we would address the problem if we weren't
> constrained". I want the discussion about what the best long-term solution
> is, in fact I started it.
>
> However, instead of making any constructive comments towards that end you
> have been viciously attacking the only proposal that was made in this
> context with (from my perspective) completely bogus arguments. That doesn't
> strike me as trying to find the best solution - it strikes me as the
> behavior of someone who is deeply in love with his (admittedly very clever)
> hack ;-)
>
> To be explicit: Do you think that your proposal of using a negative tally is
> truly the *best*, the long-term, the future-proof solution to the problem?
> That it is the solution that you would choose if you would implement Set
> from first principles? Then please say so. But if you can think of a better
> solution, please propose that instead. That's what this phase is all about -
> not about trying to prove someone else wrong but to propose -free of
> constraints- what you think is the best way forward. Comparisons are fair in
> this context, shooting down other proposals is not.
>

Now your point is clear to me. I didn't understood initially, what you
meant by 'first principles'.
And of course, i am for viewing at problem from different angles and
choose best one.
Concerning why i am heating up the discussion, because i want these
changes to be adopted
and it is more matter to me to see any move (even not one, which i
proposed), instead of none.

>>> It would be rather more interesting to me to discuss the practical
>>> tradeoffs
>>> of the choosing one way (marker instead of nil) over the other (flag
>>> indicating nil containment) in terms of implementation and performance
>>> costs.
>>
>> Sure thing. When tires meet the road.
>> Lets run benchmarks (know any good bench suite for sets?) and see the
>> difference.
>
> I think what we should do is:
>
> a) Run benchmarks (tiny and macroBenchmarks). I expect no difference but I
> only believe that after having run the benchmarks.
>
before:
1 tinyBenchmarks
 '436115843 bytecodes/sec; 12718765 sends/sec'
 '473635522 bytecodes/sec; 12718765 sends/sec'
 '473635522 bytecodes/sec; 12727537 sends/sec'

after:
1 tinyBenchmarks
 '472324723 bytecodes/sec; 12657701 sends/sec'
 '472324723 bytecodes/sec; 12683800 sends/sec'
 '472760849 bytecodes/sec; 12969029 sends/sec'

seems like, even if there is an overhead it is within the bounds of
noise level/timer accuracy.

Concerning macro benchmarks - where i could download them?


> b) Count the number of Set subinstances in an image and compute the
> additional space overhead that adding a slot would bring. In my current
> trunk image that's
>
>        Set allSubInstances size
>
> 12906 * 4 => 51624 bytes, i.e., neglectable given that
>
>        Set allSubInstances detectSum:[:s| s capacity].
>
> 541602 * 4 => 2166408, so the increase in space is roughly 2.4% per set.
>
> (Interesting. That's less than I expected. If these numbers hold up we
> should definitely consider adding an ivar to Set regardless of what we end
> up using it for)
>
but what i have proposed doesn't requires adding new slot to sets.
And about the cost of having non-nil filler object -> #rehash and #new
will be slower,
unless VM will support new primitive which would allow to pre-fill the
newly created object(s)
slots with other than nils.

> c) Consider how each proposal affects implementations. Is the modification
> clear and understandable? How many methods did you have to change? Are
> further changes required? What about subclasses (i.e., do they need to be
> aware that they might now contain nil)? Does the change affect extension
> methods loaded from Monticello?

that's could take quite a bit of time to document & describe all of the above.
I think that making sure that all tests green is 'good enough' (c),
and if there are breakage,
discovered later, it will show us that our tests need more attention :)

There are quite few methods which suspectible to new functionality
(sets with nils) - actually i think just one  - #like:
which works ok for all non-nil arguments, but may lie to you if you
pass nil as argument,
since it using nil as return value to indicate that nothing similar found.
It should be replaced by #like:ifNone: and all users of #like shoud
use it instead.
Given that there are only 3 senders of #like: in my image there is not
much work to do.

>
> BTW, someone with interest in Randal's proposal will have to step forward to
> provide an implementation for it or else Igor's proposal will likely win by
> default. I'm not in favor of delaying the discussion by having proposals
> that aren't being implemented since we need the comparison and I'm not going
> to do that myself since I'm trying to preserve a bit of neutrality here and
> I tend to be less objective when comparing my own code to that of others ;-)
>
No, i think winner will be approach described by Tobias :)

> Cheers,
>  - Andreas
>
>



-- 
Best regards,
Igor Stasenko AKA sig.



More information about the Squeak-dev mailing list