Hi All, SparseLargeTable appears not to be sparse at all. If you look at SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated instances and then make them sparse by sending zapDefaultOnlyEntries.
It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the existing approach won't work.
I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize.
Best Eliot
At Sat, 1 Nov 2008 12:12:21 -0700, Eliot Miranda wrote:
[1 <multipart/alternative (7bit)>] [1.1 <text/plain; ISO-8859-1 (7bit)>]
[1.2 <text/html; ISO-8859-1 (7bit)>] Hi All,
SparseLargeTable appears not to be sparse at all. If you look at SparseLargeTable>>
initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated instances and then make them sparse by sending zapDefaultOnlyEntries.
Correct. As you saw, it is designed for the usage pattern of creating a read-only table once.
It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the existing approach won't work.
Ah ha.
I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize.
I'm not aware of anybody who fixed. Well, if you need to hold 2^32 entries, you should do whatever it takes and doing the lazy initialization seems to be good...
-- Yoshiki
On Sat, Nov 1, 2008 at 2:49 PM, Yoshiki Ohshima yoshiki@vpri.org wrote:
At Sat, 1 Nov 2008 12:12:21 -0700, Eliot Miranda wrote:
[1 <multipart/alternative (7bit)>] [1.1 <text/plain; ISO-8859-1 (7bit)>]
[1.2 <text/html; ISO-8859-1 (7bit)>] Hi All,
SparseLargeTable appears not to be sparse at all. If you look at
SparseLargeTable>>
initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin
is initialized with an instance of the base
class rather than being filled lazily. Further, instead of
pvtNoCheckAt:put: lazily creating an entry in each bin as
needed it simply throws away the write if the bin is empty. Instead the
usage pattern is to create fully-populated
instances and then make them sparse by sending zapDefaultOnlyEntries.
Correct. As you saw, it is designed for the usage pattern of creating a read-only table once.
It seems to me that
SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty
until pvtNoCheckAt:put: puts other than the default value. Since I want
to create a sparse table with 2^32 entries the
existing approach won't work.
Ah ha.
I wondered whether anyone else had already fixed this or whether there's
a good reason not to lazily initialize.
I'm not aware of anybody who fixed. Well, if you need to hold 2^32 entries, you should do whatever it takes and doing the lazy initialization seems to be good...
Here's what I came up with. Note the use of \ in noCheckAt:[put:] which avoids an extra arithmetic op (a multiplication). You might consider replacing the relevant parts of SparseLargeTable with this as I don't see the utility of filling the bins eagerly.
-- Yoshiki
Yes, thank you, this is a good idea to restore lazy initialization. Current strategy is strange. Reading current code, I wonder if the intention was to make an access (self noCheckAt: size + 1) fail when (size \ chunkSize ~= 0) ? but this should already be handled by pvtCheckIndex: ...
Note that using \ is preferable for code readability, but not for performance if you use LargeInteger:
| int32 q r | int32 := 1 << 31. { [q := int32 - 1 // 1024 + 1. r := int32 - 1 \ 1024 + 1] bench.
[q := int32 - 1 // 1024 + 1. r := int32 - 1 - (q-1*1024) + 1] bench. }.
#('30482.50349930014 per second.' '42663.0673865227 per second.')
Yes because, primitive 31 fails and super revert to the second form which was inlined in SparseLargeTable...
Sure this does not really matter, I doubt this access be the bottleneck of any code, and maybe you already hacked primitive 31...
Nicolas
Eliot Miranda a écrit :
On Sat, Nov 1, 2008 at 2:49 PM, Yoshiki Ohshima <yoshiki@vpri.org mailto:yoshiki@vpri.org> wrote:
At Sat, 1 Nov 2008 12:12:21 -0700, Eliot Miranda wrote: > > [1 <multipart/alternative (7bit)>] > [1.1 <text/plain; ISO-8859-1 (7bit)>] > > [1.2 <text/html; ISO-8859-1 (7bit)>] > Hi All, > > SparseLargeTable appears not to be sparse at all. If you look at SparseLargeTable>> > initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base > class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as > needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated > instances and then make them sparse by sending zapDefaultOnlyEntries. Correct. As you saw, it is designed for the usage pattern of creating a read-only table once. > It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty > until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the > existing approach won't work. Ah ha. > I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize. I'm not aware of anybody who fixed. Well, if you need to hold 2^32 entries, you should do whatever it takes and doing the lazy initialization seems to be good...
Here's what I came up with. Note the use of \ in noCheckAt:[put:] which avoids an extra arithmetic op (a multiplication). You might consider replacing the relevant parts of SparseLargeTable with this as I don't see the utility of filling the bins eagerly.
-- Yoshiki
There was a bug in the code I posted. I forgot to initialize new chunks with the default value. Also, atAllPut: has an obvious and convenient implementation. So the attached...
On Mon, Nov 3, 2008 at 3:12 PM, nicolas cellier ncellier@ifrance.comwrote:
Yes, thank you, this is a good idea to restore lazy initialization. Current strategy is strange. Reading current code, I wonder if the intention was to make an access (self noCheckAt: size + 1) fail when (size \ chunkSize ~= 0) ? but this should already be handled by pvtCheckIndex: ...
Note that using \ is preferable for code readability, but not for performance if you use LargeInteger:
| int32 q r | int32 := 1 << 31. { [q := int32 - 1 // 1024 + 1. r := int32 - 1 \ 1024 + 1] bench.
[q := int32 - 1 // 1024 + 1. r := int32 - 1 - (q-1*1024) + 1] bench.
}.
#('30482.50349930014 per second.' '42663.0673865227 per second.')
Yes because, primitive 31 fails and super revert to the second form which was inlined in SparseLargeTable...
Sure this does not really matter, I doubt this access be the bottleneck of any code, and maybe you already hacked primitive 31...
Nicolas
Eliot Miranda a écrit :
On Sat, Nov 1, 2008 at 2:49 PM, Yoshiki Ohshima <yoshiki@vpri.orgmailto: yoshiki@vpri.org> wrote:
At Sat, 1 Nov 2008 12:12:21 -0700, Eliot Miranda wrote: > > [1 <multipart/alternative (7bit)>] > [1.1 <text/plain; ISO-8859-1 (7bit)>] > > [1.2 <text/html; ISO-8859-1 (7bit)>] > Hi All, > > SparseLargeTable appears not to be sparse at all. If you look at SparseLargeTable>> > initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base > class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as > needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated > instances and then make them sparse by sending zapDefaultOnlyEntries.
Correct. As you saw, it is designed for the usage pattern of
creating a read-only table once.
> It seems to me that
SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty > until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the > existing approach won't work.
Ah ha. > I wondered whether anyone else had already fixed this or whether
there's a good reason not to lazily initialize.
I'm not aware of anybody who fixed. Well, if you need to hold 2^32
entries, you should do whatever it takes and doing the lazy initialization seems to be good...
Here's what I came up with. Note the use of \ in noCheckAt:[put:] which avoids an extra arithmetic op (a multiplication). You might consider replacing the relevant parts of SparseLargeTable with this as I don't see the utility of filling the bins eagerly.
-- Yoshiki
Eliot Miranda a écrit :
Sure this does not really matter, I doubt this access be the bottleneck of any code, and maybe you already hacked primitive 31...
No, but if you have, I'm all ears :)
Hehe, I don't even know the man-hour cost of such a hack... The day when identified as a real bottleneck i will have a look.
Concerning SparseLargeTable, I can understand the utility of base as: any element of index < base is known to be = defaultValue. This should economize some space by reducing basicSize requirement... But it does not ! Browse #new:chunkSize:arrayClass:base:defaultValue:
Also, with base ~= 1, things like #printString become strange:
(SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1) printString. -> 'a SparseLargeTable(-1 -1)'
Though its size is 3, and: | tmp | tmp := SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1. (1 to: 3) collect: [:i | tmp at: i] -> #(-1 -1 -1)
Also, in #analyzeSpaceSaving total should be either total := size - base + 1. or: total := size. but not: total := size - base. as you proposed.
Also, lastChunkSize is not set to a correct size. It should be: (lastChunkSize := size - base + 1 \ chunkSize) = 0 ifTrue: [lastChunkSize := chunkSize]. which can be more efficiently written: lastChunkSize := size - base \ chunkSize + 1.
Well, I guess every one use with base = 1...
Nicolas
Hmm, you must also define
arrayClass ^arrayClass
otherwise, printString might fail.
nicolas cellier a écrit :
Eliot Miranda a écrit :
Sure this does not really matter, I doubt this access be the bottleneck of any code, and maybe you already hacked primitive 31...
No, but if you have, I'm all ears :)
Hehe, I don't even know the man-hour cost of such a hack... The day when identified as a real bottleneck i will have a look.
Concerning SparseLargeTable, I can understand the utility of base as: any element of index < base is known to be = defaultValue. This should economize some space by reducing basicSize requirement... But it does not ! Browse #new:chunkSize:arrayClass:base:defaultValue:
Also, with base ~= 1, things like #printString become strange:
(SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1) printString. -> 'a SparseLargeTable(-1 -1)'
Though its size is 3, and: | tmp | tmp := SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1. (1 to: 3) collect: [:i | tmp at: i] -> #(-1 -1 -1)
Also, in #analyzeSpaceSaving total should be either total := size - base + 1. or: total := size. but not: total := size - base. as you proposed.
Also, lastChunkSize is not set to a correct size. It should be: (lastChunkSize := size - base + 1 \ chunkSize) = 0 ifTrue: [lastChunkSize := chunkSize]. which can be more efficiently written: lastChunkSize := size - base \ chunkSize + 1.
Well, I guess every one use with base = 1...
Nicolas
Nicholas,
thanks for this checking! See below...
On Mon, Nov 3, 2008 at 4:44 PM, nicolas cellier ncellier@ifrance.comwrote:
Eliot Miranda a écrit :
Sure this does not really matter, I doubt this access be the bottleneck of any code, and maybe you already hacked primitive 31...
No, but if you have, I'm all ears :)
Hehe, I don't even know the man-hour cost of such a hack... The day when identified as a real bottleneck i will have a look.
Concerning SparseLargeTable, I can understand the utility of base as: any element of index < base is known to be = defaultValue. This should economize some space by reducing basicSize requirement... But it does not ! Browse #new:chunkSize:arrayClass:base:defaultValue:
Also, with base ~= 1, things like #printString become strange:
(SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1) printString. -> 'a SparseLargeTable(-1 -1)'
The old printOn: was wrong. It should read
printElementsOn: aStream aStream nextPut: $(. self isEmpty ifFalse: [base to: size + base - 2 do: [:index | aStream print: (self at: index); space]. aStream print: (self at: size + base - 1)]. aStream nextPut: $)
Though its size is 3, and: | tmp | tmp := SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1. (1 to: 3) collect: [:i | tmp at: i] -> #(-1 -1 -1)
Also, in #analyzeSpaceSaving total should be either total := size - base + 1. or: total := size. but not: total := size - base. as you proposed.
Yes.
Also, lastChunkSize is not set to a correct size. It should be: (lastChunkSize := size - base + 1 \ chunkSize) = 0 ifTrue: [lastChunkSize := chunkSize]. which can be more efficiently written: lastChunkSize := size - base \ chunkSize + 1.
Well, I guess every one use with base = 1...
No that's not right. base determines the first index, so that e.g. SparseLargeArray new: (2 raisedToInteger: 32) chunkSize: 32 * 1024 arrayClass: ByteArray base: 0 defaultValue: 0
provides a bye-addressible array of 2^32 elements with indices 0 through (2 raisedTo: 32) - 1.
I think my code is correct. e.g.
{ ((SparseLargeTable new: 15 chunkSize: 8 arrayClass: Array base: 2 defaultValue: -1) basicAt: 2) size. ((SparseLargeTable new: 15 chunkSize: 8 arrayClass: Array base: 3 defaultValue: -1) basicAt: 2) size. ((SparseLargeTable new: 16 chunkSize: 8 arrayClass: Array base: 2 defaultValue: -1) basicAt: 2) size. ((SparseLargeTable new: 16 chunkSize: 8 arrayClass: Array base: 3 defaultValue: -1) basicAt: 2) size } #(7 7 8 8)
I've attached the fixed code.
Eliot Miranda a écrit :
Also, lastChunkSize is not set to a correct size. It should be: (lastChunkSize := size - base + 1 \\ chunkSize) = 0 ifTrue: [lastChunkSize := chunkSize]. which can be more efficiently written: lastChunkSize := size - base \\ chunkSize + 1. Well, I guess every one use with base = 1...
No that's not right. base determines the first index, so that e.g. SparseLargeArray new: (2 raisedToInteger: 32) chunkSize: 32 * 1024 arrayClass: ByteArray base: 0 defaultValue: 0
provides a bye-addressible array of 2^32 elements with indices 0 through (2 raisedTo: 32) - 1.
That's interesting because that's exactly how I understood 'base' first. But that's not how it did work before your patches. SparseLargeTable inherits from ArrayedCollection for which the whole protocol require indices between 1 and self size...
And that is effectively how SparseLargeTable at: and at:put: did handle the indices... (mySparseTable at: 0) did not work.
A logical explanation was that base was here for saving bytes, but that was only a deduction, i did not discuss that with Yoshiki.
That's also why the (self basicAt: anIndex) were protected in #noChekAt: the first elements (before base) and also the last ones (see how #zapDefaultOnlyEntries use #privateSize: ) were potentially out of self "basicBounds".
Now, the modifications you made are appealing, but beware not using one of the 1-based inherited message on self. That's rather not bug-proof.
Nicolas
On Wed, Nov 5, 2008 at 11:41 AM, nicolas cellier ncellier@ifrance.comwrote:
Eliot Miranda a écrit :
Also, lastChunkSize is not set to a correct size. It should be: (lastChunkSize := size - base + 1 \ chunkSize) = 0 ifTrue: [lastChunkSize := chunkSize]. which can be more efficiently written: lastChunkSize := size - base \ chunkSize + 1.
Well, I guess every one use with base = 1...
No that's not right. base determines the first index, so that e.g. SparseLargeArray new: (2 raisedToInteger: 32) chunkSize: 32 * 1024 arrayClass: ByteArray base: 0 defaultValue: 0
provides a bye-addressible array of 2^32 elements with indices 0 through (2 raisedTo: 32) - 1.
That's interesting because that's exactly how I understood 'base' first. But that's not how it did work before your patches. SparseLargeTable inherits from ArrayedCollection for which the whole protocol require indices between 1 and self size...
Can you say which methods require this and why?
And that is effectively how SparseLargeTable at: and at:put: did handle the indices... (mySparseTable at: 0) did not work.
A logical explanation was that base was here for saving bytes, but that was only a deduction, i did not discuss that with Yoshiki.
That's also why the (self basicAt: anIndex) were protected in #noChekAt: the first elements (before base) and also the last ones (see how #zapDefaultOnlyEntries use #privateSize: ) were potentially out of self "basicBounds".
Now, the modifications you made are appealing, but beware not using one of the 1-based inherited message on self. That's rather not bug-proof.
I'd like to plug those holes. I absolutely need 0-based sparse arrays :)
Nicolas
Eliot Miranda a écrit :
Can you say which methods require this and why?
Well there are just too many forms of 1 to: self size do: [:i | (self at:i ) doSomething] to fit in this list.
Of course, I doubt you want to iterate over 2**32 elements, so maybe this is a false probem. Though you can legitimately attempt a:
^aSparseTable findFirst: [:element | element > 0].
I'd like to plug those holes. I absolutely need 0-based sparse arrays :)
Maybe you won't be trapped by any hole for your private usage...
That's different if the class is supposed to be generic. But since an anythingBut-1-based sequenceable collection would not fit in existing code base without pain, that's probably a false problem too.
I try to imagine how to:
It would be safer to hook the class in a new hierarchy... Beware, even SequenceableCollection is crippled with 1 to: self size do: Only Collection implements all loops with do:
Personnally, I would like this do: self error: 'you are trying to iterate over a huge collection'.
and implement some nonDefaultDo: aBlock | chunk index | 1 to: self basicSize do: [:i | (chunk := self basicAt: i) isNil ifFalse: [ chunk do: [:element | element = defaultValue ifFalse: [ aBlock value: element]]]]
Then I will trap myself with 0-based indices... For example, i would write inner loop of nonDefaultWithIndexDo: 1 to: chunk size do: [:j | (chunk at: j) = defaultValue... or chunk withIndexDo: [:element :j | element = defaultValue ifFalse: [ aBlock value: i - 1 * chunkSize + j + base - 1 value: element
But wait, who told me chunk was 1-based? I could for example populate a SparseLargeTable with 0-based SparseLargeTable as arrayClass just for making my own 2**64 collection (i can't help challenging your 2**32)... So I would need a (chunk withRankDo: ) or just (chunk atRank: ), and to cover my sparse case, probably generalize a nonDefaultWithRankDo:... ... or #nonDefaultWithOffsetDo: just to remove the - 1.
Yes, 0-based pain startingAt: second message (messageList atOffset: 1).
Nicolas
At Tue, 04 Nov 2008 01:44:39 +0100, nicolas cellier wrote:
Well, I guess every one use with base = 1...
The "base" was of course set differently if you browse the users. And if you browse the users, you can simply tell that it is exclusively used for the multilingualization part of the system, and I didn't really thought about printing a big table...
-- Yoshiki
Yoshiki Ohshima a écrit :
At Tue, 04 Nov 2008 01:44:39 +0100, nicolas cellier wrote:
Well, I guess every one use with base = 1...
The "base" was of course set differently if you browse the users. And if you browse the users, you can simply tell that it is exclusively used for the multilingualization part of the system, and I didn't really thought about printing a big table...
-- Yoshiki
Ah, the two instances in my 3.10.2 occidental image are 1-based... SparseLargeTable allInstances collect: [:e | e base]. -> #(1 1)
The problem is more with accessing than printing. I'm curious, how do you use base ~= 1 ?
Anyway, maybe it won't do what you think... because I have other griefs with creation messages ignoring their arguments :)
new: size chunkSize: chunkSize arrayClass: aClass base: b
^self new: size chunkSize: chunkSize arrayClass: Array base: 1 defaultValue: nil.
Nicolas
At Fri, 07 Nov 2008 21:08:22 +0100, nicolas cellier wrote:
Yoshiki Ohshima a écrit :
At Tue, 04 Nov 2008 01:44:39 +0100, nicolas cellier wrote:
Well, I guess every one use with base = 1...
The "base" was of course set differently if you browse the users. And if you browse the users, you can simply tell that it is exclusively used for the multilingualization part of the system, and I didn't really thought about printing a big table...
-- Yoshiki
Ah, the two instances in my 3.10.2 occidental image are 1-based... SparseLargeTable allInstances collect: [:e | e base]. -> #(1 1)
Your image probably doesn't contain any of the CJK-ish langauges, I would guess.
The problem is more with accessing than printing. I'm curious, how do you use base ~= 1 ?
Take a look for the references in the OLPC Etoys image.
Anyway, maybe it won't do what you think... because I have other griefs with creation messages ignoring their arguments :)
new: size chunkSize: chunkSize arrayClass: aClass base: b
^self new: size chunkSize: chunkSize arrayClass: Array base: 1 defaultValue: nil.
That is funny.
The use I know was always with the "fully qualified" version in the image. (Thank you for spotting!)
-- Yoshiki
Eliot
if /when you have something, I would be more than happy to harvest the fixes.
stef
On Nov 1, 2008, at 8:12 PM, Eliot Miranda wrote:
Hi All,
SparseLargeTable appears not to be sparse at all. If you look
at SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated instances and then make them sparse by sending zapDefaultOnlyEntries.
It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the existing approach won't work.
I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize.
Best Eliot
Hi Stef, I already experimented Eliot's SparseTable changes in Pharo, but I'm waiting for stabilization before publishing.
stephane ducasse a écrit :
Eliot
if /when you have something, I would be more than happy to harvest the fixes.
stef
On Nov 1, 2008, at 8:12 PM, Eliot Miranda wrote:
Hi All,
SparseLargeTable appears not to be sparse at all. If you look at
SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated instances and then make them sparse by sending zapDefaultOnlyEntries.
It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the existing approach won't work.
I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize.
Best Eliot
squeak-dev@lists.squeakfoundation.org