[squeak-dev] The Inbox: Collections-nice.891.mcz

David T. Lewis lewis at mail.msen.com
Sun May 3 14:46:49 UTC 2020


I like the Collections-nice.891 proposal a lot. It is a big improvement
for readability and comprehension.

The name 'FixedBitWidthArray' seems good to me. It might sound awkward
at first, but it clearly indicates the nature of this kind of collection,
and helps the reader understand the difference between these collections
compared to collections of object pointers and immediates.

The new class comments are also helpful, because they explain the basic
at:put: protocol and the interpretation of the array elements, which is
quite different from other kinds of collection.
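
As a rough illustration of what "interpretation of the array elements"
means, here is a small sketch. It is written in Python purely as a neutral
stand-in: it is not Squeak code, and the `array` type codes 'B' and 'b' are
Python's own, only assumed here to play the roles of ByteArray and
SignedByteArray. The same 8 raw bits read back as different values depending
on how the collection interprets its slots, and a store is checked against
the fixed slot width.

```python
from array import array

# One raw byte with all bits set.
raw = bytes([0xFF])

unsigned = array('B', raw)   # unsigned 8-bit slots (stand-in for ByteArray)
signed = array('b', raw)     # signed 8-bit slots (stand-in for SignedByteArray)

print(unsigned[0])           # 255
print(signed[0])             # -1

# Storing goes through the same fixed width: a value that does not fit
# into a signed 8-bit slot is rejected instead of being silently stored.
try:
    signed[0] = 200
    rejected = False
except OverflowError:
    rejected = True
print(rejected)              # True
```

Both arrays hold identical bits; only the element interpretation differs.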

There is a minor typo in the SignedByteArray class comment, which
should say 'SignedByteArrays store...'

Dave

On Sun, May 03, 2020 at 03:52:38PM +0200, Nicolas Cellier wrote:
> Hi Subbu,
> Yes those raw bits are somehow like immediates, but not exactly...
> 
> Immediates are objects having their value encoded into the pointer slot
> (either 4 or 8 bytes, depending on the 32-bit or 64-bit VM word size).
> Currently, this covers only SmallInteger, Character and, on 64 bits,
> SmallFloat.
> 
> Here we have values encoded into slots of 1, 2, 4 or 8 bytes, but not into
> an object oriented pointer slot.
> Technically, #(1 2.0 $3) is an Array of immediates, while ((ColorArray
> with: Color black) first) is not an immediate...
> So even if it is the same notion of encoded value, it's not an exact
> match...
> 
> Concerning the use cases, I do indeed want to use such bit arrays for
> fast data transfer.
> For example, it is useful for FFI: I use this kind of array exclusively for
> Smallapack...
> But also when reading big files in Matlab, National Instruments TDMS or HDF5
> format.
> It really helps to have all the possible flavours for common elementary
> types of values.
> Otherwise, I have to use an intermediate ByteArray, or pointers to external
> heap via FFI (like I did in Smallapack).
> 
> More often than not, the data transfer can handle offset and stride via
> BitBlt tricks (unless we have an odd layout).
> This enables extracting a single "column" or block of data from a big file
> with a single copy.
> I may need to extend BitBlt to cope with all the available bit-widths, not
> just 8 (byte) or 32 (word) though.
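
The offset-and-stride idea can be sketched as follows. This is a Python
stand-in, not BitBlt and not Squeak code; the helper name `extract_column`
and the 3x3 sample table are hypothetical, and the data is assumed to be
native float64 values stored row by row.

```python
import struct

def extract_column(raw, col, ncols):
    """Return column `col` of row-major float64 data packed in `raw`."""
    # Reinterpret the bytes as native doubles without copying them.
    view = memoryview(raw).cast('d')
    # offset = col, stride = ncols; tolist() performs the one and only copy.
    return view[col::ncols].tolist()

# A hypothetical 3x3 table stored row by row, as it might sit in a file.
rows = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)]
raw = b''.join(struct.pack('3d', *row) for row in rows)

middle = extract_column(raw, 1, 3)
print(middle)   # [2.0, 5.0, 8.0]
```

The raw buffer is never unpacked element by element into intermediate
objects; only the selected column is materialized.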
> 
> Also, those formats offer packed and contiguous memory layout which is an
> advantage too when dealing with large chunks of data.
> Especially if we have vectorized primitives operating on the arrays.
> 
> Also, creating non-immediate objects on the fly through #at: and #at:put: is
> very efficient if the VM has a generation scavenger, because those objects
> are generally short-lived.
> By contrast, retaining all the pointers to a whole collection of
> non-immediate objects puts a lot of pressure on the garbage collector.
> 
> The advantage diminishes somewhat with the advent of 64-bit VMs: most values
> can be immediates, so we have quasi-contiguous data with a few exceptions,
> and not so much GC pressure.
> But still, the primitives can operate on raw bits, without having to handle
> the immediate tag or exceptional (non-immediate) values.
> 
> As an anecdote, in the 90s, I started to experience some crashes in
> Objectworks/VisualWorks when handling large Arrays of Float.
> The console would only report: *out of memory*.
> With increasing processor speed, the memory was exhausted before the low
> space monitoring process had a chance to handle the situation.
> I then decided to handle all my Arrays of Float (Double) through some
> UninterpretedBytes and ad-hoc primitives for at: and at:put:.
> Since then, I never came back to pointer-oriented arrays: if we want
> Smalltalk to scale, we need those basic objects  :)
> 
> 
> On Sun, 3 May 2020 at 06:50, K K Subbu <kksubbu.ml at gmail.com> wrote:
> 
> > On 02/05/20 5:41 pm, commits at source.squeak.org wrote:
> > > Nicolas Cellier uploaded a new version of Collections to project The
> > Inbox:
> > > http://source.squeak.org/inbox/Collections-nice.891.mcz
> > >
> > > ==================== Summary ====================
> > >
> > > Name: Collections-nice.891
> > > Author: nice
> > > Time: 2 May 2020, 7:40:45.298967 pm
> > > UUID: 08510be0-8293-6744-959d-c1d41bc13ae1
> > > Ancestors: Collections-nice.890
> > >
> > > Experimental - For discussion
> > >
> > > Group some (most) non-pointer collections under an abstract
> > FixedBitWidthArray.
> > > I know, the name is hard to pronounce and thus ugly: it's open to
> > discussion.
> > >
> > > This enables factorization of some methods, for example the trick for
> > atAllPut:
> > > Also notice that most methods are shared between FloatArray and
> > Float64Array.
> >
> > How about ImmediateWord/ImmediateObject and an ImmediateArray (an array
> > consisting only of Immediate elements)? It would be consistent with
> > isImmediateClass method.
> >
> > An object chunk could be checked at loading time to see if it needs to
> > be converted from immediate to pointers or vice versa. In the typical
> > case, this will be a nop. But if the image is moved to a different host
> > type (say from 64b to 32b or from x86 to ARM), then some immediate
> > numbers may be converted into pointers or vice versa. If this increases
> > loading time for large images, then the image may be saved locally.
> >
> > This is just a strawman. I haven't really thought through all its
> > implications.
> >
> > Regards .. Subbu
> >
> >
