Sets with distinct entries (Was: Re: [squeak-dev] Re: Ideas about sets and dictionaries)

Igor Stasenko siguctua at gmail.com
Thu Nov 12 11:11:38 UTC 2009


2009/11/12 Tobias Pape <Das.Linux at gmx.de>:
> Hello —,
> Am 2009-11-12 um 06:35 schrieb Andreas Raab:
>
>> Igor Stasenko wrote:
>>> The main purpose to be able to handle nils in sets as an elements is to make Set
>>> be able to hold _any_ potentially existing objects in image.
>>
>> Absolutely not. The main purpose of having nil be included in sets is to remove a particular restriction (namely that Sets cannot contain nil), not to "be able to hold _any_ potentially existing objects in an image". What is being proposed is trading one restriction (Sets cannot contain nil) against another one (Sets cannot contain their internal private marker) which is deemed advantageous as it allows us to convert many collections that contain nil into sets without blowing up. *That* is the main purpose.
>>
>> The goal of "be able to hold _any_ potentially existing objects in an image" is neither achievable nor worthwhile. Try this for starters:
>>
>>  (s := Set new) add: s; add: s.
>
> As far as I know, another approach is to have
> Distinct Entry Object inside the set. cf. the GemStone
> approach.  A set, there, contains an ordered collection of
> nils an Entry elements. Thus, the actual element is stored _inside_
> the entry element, not in the underlying (storage) collection
> of the array itself.  As far as I know, this is similar to its Dictionary
> approach.
>   Probably, this will introduce a slightly broader spectrum of
> optimization possibilities.
>
> So long,
>        -Tobias
>

I haven't seen how it implemented in GemStone (so i think that
implementing it will be license clean (tm) ;)

As i see it, the trick is to use a message dispatch to object which
going to be an element, to tell it how it would like
to represent itself in Set. And this is where story begins.

add: newObject
	"Include newObject as one of the receiver's elements, but only if
	not already present. Answer newObject."

	| entry index |
        entry := newObject asSetElement.

	index := self scanFor: entry.
	(array at: index) ifNil: [self atNewIndex: index put: entry].
	^ newObject

a SetElement class then is a one-inst-var object, holding a reference
to proxied object, and
needs to properly implement #hash #= , required by Set.
Also, at each place where we need to feed a real object , a Set should
send an #unwrap (or give more distinct selector name)
to return enclosed object.

a subclasses, CollectionSetElement and NilSetElement implemented
slightly differently to take in account that we are still using nil as
filler object, and collection included in set should prevent going too
deep in #hash and guard against potential recursion.
As an alternative , we migth use already existing LookupKey class, and
subclass from it, instead of having separate SetElement.

All objects then obliged to implement #asSetElement and #unwrap - by
providing the default methods in Object class, which by default doing
nothing, return self.
While Collection and UndefinedObject classes overriding asSetElement
in order to return wrappers instead of real objects.

The main overhead is cost of extra messages sent at each #add: and at
each point where we need to deliver a real set element (#unwrap).

The space overhead can be measured roughly as:

count := 0.
Set allSubInstancesDo: [:each |
	each do: [:value |  value isCollection ifTrue: [ count := count +1 ]]].
count

in my image its = 97197

but another test reveals that this number is much smaller, because we
don't have to be worry about ByteArrays, Strings and Symbols (we could
override #asSetElement back to return self, instead of constructing
wrapper):

count := 0.
Set allSubInstancesDo: [:each |
	each do: [:value |
		(value isCollection and: [value class isBytes not and: [value
isString not] ]) ifTrue: [ count := count +1 ]]].
count
- 5021

So, what you think? Is it worth with trying to implemet it, and see
the real numbers, or its already clear that its wrong way to go?



-- 
Best regards,
Igor Stasenko AKA sig.



More information about the Squeak-dev mailing list