[Newbies] Re: Another extension proposal -> subsets

nicolas cellier ncellier at ifrance.com
Fri Jul 25 10:32:07 UTC 2008


nicolas cellier <ncellier <at> ifrance.com> writes:
> 
> He, press ALT-v to get versions of #combinations:atATimeDo: and thanks tk!
> 
> I like bit sift solution too for it's simplicity.
> The problem is that it will iterate p times for creating subset of size p.
> #combinations:atATimeDo: does not. It is building subsets in parallel.
> The required copy does an iteration though, but in a primitive!
> Hence the difference...
> 
> Nicolas
> 

What i wrote is true only if you collect: Arrays.
It is not true if you use #asSet instead of #copy. #asSet is not primitive.

The cost of bit sift can be greatly reduced: factor the reverse operations:

Set>>subsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
      subsets add:
        (workArray maskedBy: (sift printStringBase: 2) reverse) asSet].
   ^subsets

ArrayedCollection>>maskedBy: aBitString
   | maxSize result |
   maxSize := self size min: aBitString size.
   result := (self species new: maxSize) writeStream.
   1 to: maxSize do: [:index |
      (aBitString  at: index) = $1 ifTrue: [result nextPut: (self at: index)]].
   ^result contents

While at it, anyway, you do not need to reverse at all...
Ordering does not matter.
This eliminates one loop.
It still costs 3 loops per partition:
 (1 to: maxSize) and asSet plus printStringBase:...
Thus 2P+Q loops for a subset of size P out of N with P<=Q<=N

You can eliminate the printString and send directly #bitAt: to integers.
See http://bugs.squeak.org/view.php?id=6985 and load as pre-requisite:
    Installer mantis ensureFix: 6985.

Set>>subsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
      subsets add:
        (workArray maskedByIntegerBits: sift) asSet].
   ^subsets

ArrayedCollection>>maskedByIntegerBits: anInteger
   | maxSize result |
   maxSize := self size min: anInteger highBit.
   result := (self species new: maxSize) writeStream.
   1 to: maxSize do: [:index |
      (anInteger bitAt: index) = 1 ifTrue: [result nextPut: (self at: index)]].
   ^result contents

You still have P+Q loops, P<= Q<=N
You can now eliminate the asSet loop using collect:into: / select:into:
(I think we can call this Klaus trick).

Set>>subsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
      subsets add:
        (workArray selectIntegerMask: sift into: (Set new: sift highBit)].
   ^subsets

ArrayedCollection>>selectIntegerMask: anInteger into: aCollection
   1 to: (self size min: anInteger highBit) do: [:index |
      (anInteger bitAt: index) = 1 ifTrue: [aCollection add: (self at: index)]].
   ^aCollection

Only costs Q loops per partition...
Maybe you can now approach #combinations:atAtTimeDo:
I let you try.

Nicolas



More information about the Beginners mailing list