[squeak-dev] The Inbox: Collections-nice.891.mcz

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sun May 3 13:52:38 UTC 2020


Hi Subbu,
Yes, those raw bits are somewhat like immediates, but not exactly...

Immediates are objects whose value is encoded directly into the pointer slot
(4 or 8 bytes, depending on whether the VM word size is 32 or 64 bits).
Currently, this covers only SmallInteger, Character and, on 64-bit VMs,
SmallFloat.
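To make the distinction concrete, here is a minimal C sketch of a tagged slot. It assumes a simplified 64-bit layout in which the low 3 bits are a tag and tag 1 marks a SmallInteger. The names (`oop`, `encode_smallint`, `decode_smallint`, `is_immediate`) and the tag values are illustrative assumptions, not the OpenSmalltalk VM's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a 64-bit object slot ("oop").
   The low 3 bits are a tag; the tag values are illustrative assumptions. */
typedef uint64_t oop;

#define TAG_BITS     3
#define TAG_MASK     ((oop)7)
#define TAG_POINTER  ((oop)0)   /* 8-byte aligned address of a heap object */
#define TAG_SMALLINT ((oop)1)   /* value stored directly in the slot */

/* Range of a 61-bit signed integer: what fits beside a 3-bit tag. */
#define SMALLINT_MIN (-(INT64_C(1) << 60))
#define SMALLINT_MAX ((INT64_C(1) << 60) - 1)

/* Any non-zero tag means the slot holds a value, not an address. */
static bool is_immediate(oop o) {
    return (o & TAG_MASK) != TAG_POINTER;
}

static oop encode_smallint(int64_t v) {
    assert(v >= SMALLINT_MIN && v <= SMALLINT_MAX);
    /* shift the value up and put the tag into the freed low bits */
    return ((oop)v << TAG_BITS) | TAG_SMALLINT;
}

static int64_t decode_smallint(oop o) {
    assert((o & TAG_MASK) == TAG_SMALLINT);
    /* assumes two's complement and an arithmetic right shift of signed
       values, as provided by mainstream C compilers */
    return (int64_t)o >> TAG_BITS;
}
```

The value travels inside the slot itself; no heap object exists for it, which is exactly what a raw-bits array element lacks once it has been read out.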

Here we have values encoded into slots of 1, 2, 4 or 8 bytes, but not into
an object pointer (oop) slot.
Technically, #(1 2.0 $3) is an Array of immediates (on a 64-bit VM), while
((ColorArray with: Color black) first) is not an immediate...
So even though it is the same notion of an encoded value, it's not an exact
match...
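By contrast, a raw-bits array stores untagged values in fixed-width slots. The hypothetical C sketch below (the `RawArray` type and its functions are invented for illustration) reads 1, 2, 4 or 8-byte slots and reinterprets a 4-byte slot as a float, roughly what #at: on a FloatArray has to do before it can hand back an object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "fixed bit width" array: every element lives in a slot of
   1, 2, 4 or 8 bytes holding raw bits only, with no tag. */
typedef struct {
    unsigned char *bytes;  /* contiguous storage, size * width bytes */
    size_t width;          /* element width in bytes: 1, 2, 4 or 8 */
    size_t size;           /* number of elements */
} RawArray;

/* Read element i as raw bits, zero-extended to 64 bits. Each case copies
   exactly `width` bytes into a variable of that width, so the host byte
   order is respected. */
static uint64_t raw_at(const RawArray *a, size_t i) {
    assert(i < a->size);
    const unsigned char *p = a->bytes + i * a->width;
    switch (a->width) {
    case 1: { uint8_t x;  memcpy(&x, p, 1); return x; }
    case 2: { uint16_t x; memcpy(&x, p, 2); return x; }
    case 4: { uint32_t x; memcpy(&x, p, 4); return x; }
    case 8: { uint64_t x; memcpy(&x, p, 8); return x; }
    default: assert(0 && "unsupported width"); return 0;
    }
}

/* Interpretation is left to the concrete array: a 32-bit float array
   reinterprets the 4 raw bytes as an IEEE 754 single. The result is a plain
   C value; a VM would still have to wrap it in an object for #at:. */
static float float32_at(const RawArray *a, size_t i) {
    assert(a->width == 4);
    uint32_t bits = (uint32_t)raw_at(a, i);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static void float32_at_put(RawArray *a, size_t i, float f) {
    assert(a->width == 4 && i < a->size);
    memcpy(a->bytes + i * 4, &f, sizeof f);
}
```

Only the width and the interpretation differ between flavours, which is the kind of sharing an abstract superclass over these arrays could factor out.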

Concerning the use cases, I do indeed want to use such bit arrays for
fast data transfer.
For example, they are useful for FFI: I use this kind of array exclusively
for Smallapack...
But also when reading big files in Matlab, National Instruments TDMS or HDF5
format.
It really helps to have all the possible flavours for common elementary
value types.
Otherwise, I have to use an intermediate ByteArray, or pointers to the
external heap via FFI (as I did in Smallapack).

More often than not, the data transfer can handle offset and stride via
BitBlt tricks (unless we have an odd layout).
This enables extracting a single "column" or block of data from a big file
with a single copy.
I may need to extend BitBlt to cope with all the available bit widths,
not just 8 (byte) or 32 (word), though.
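The offset/stride idea can be sketched as a plain C loop. This is not BitBlt, only the same access pattern; `strided_copy` is a hypothetical name. With a row-major table of `ncols` columns, offset = column index and stride = `ncols` pick out one column in a single pass.

```c
#include <assert.h>
#include <stddef.h>

/* Gather `count` doubles from `src`, starting at element `offset` and taking
   every `stride`-th element, into the contiguous buffer `dst`. */
static void strided_copy(double *dst, const double *src,
                         size_t offset, size_t stride, size_t count) {
    assert(stride > 0);
    for (size_t k = 0; k < count; k++)
        dst[k] = src[offset + k * stride];
}
```

For example, on a 3x4 row-major table, `strided_copy(col, table, 2, 4, 3)` copies the third column. Extending the pattern to a sub-block is just a second loop over rows.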

Also, those formats offer a packed and contiguous memory layout, which is
another advantage when dealing with large chunks of data,
especially if we have vectorized primitives operating on the arrays.

Also, creating non-immediate objects on the fly through #at: and #at:put:
is very efficient if the VM has a generational scavenger, because those
objects are generally short-lived.
By contrast, retaining pointers to a whole collection of non-immediate
objects puts a lot of pressure on the garbage collector.

The advantage diminishes somewhat with the advent of 64-bit VMs: most
values can be immediates, so the data is quasi-contiguous with a few
exceptions, and there is not so much GC pressure.
Still, the primitives can operate on raw bits without having to handle
the immediate tag or exceptional (non-immediate) values.
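The difference in primitive code can be sketched in C. The tag convention (low 3 bits, tag 1 for SmallInteger) is the same hypothetical assumption as a simplified 64-bit VM, and both function names are invented. The raw version is a branch-free loop over contiguous memory; the tagged version must test and untag every slot and give up on any non-immediate one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag convention: low 3 bits are the tag, 1 = SmallInteger. */
#define TAG_MASK     UINT64_C(7)
#define TAG_SMALLINT UINT64_C(1)

/* Primitive over raw bits: no tag test, no exceptional case; the kind of
   loop a vectorized primitive would implement with SIMD instructions. */
static double sum_raw(const double *xs, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += xs[i];
    return acc;
}

/* Primitive over pointer slots: every element is tag-checked and untagged.
   A non-immediate slot (e.g. a boxed LargeInteger) makes this simplified
   primitive fail so that Smalltalk code can take over. Accumulator
   overflow is ignored in this sketch. */
static bool sum_tagged(const uint64_t *slots, size_t n, int64_t *out) {
    assert(out != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if ((slots[i] & TAG_MASK) != TAG_SMALLINT)
            return false;              /* primitive failure */
        acc += (int64_t)slots[i] >> 3; /* assumes arithmetic shift */
    }
    *out = acc;
    return true;
}
```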

As an anecdote, in the 90s I started to experience some crashes in
ObjectWorks/VisualWorks when handling large Arrays of Float.
The console would only report: *out of memory*.
With increasing processor speed, memory was exhausted before the low-space
monitoring process had a chance to handle the situation.
I then decided to handle all my Arrays of Float (Double) through some
UninterpretedBytes and ad-hoc primitives for at: and at:put:.
Since then, I have never gone back to pointer-oriented arrays: if we want
Smalltalk to scale, we need those basic objects :)


On Sun, 3 May 2020 at 06:50, K K Subbu <kksubbu.ml at gmail.com> wrote:

> On 02/05/20 5:41 pm, commits at source.squeak.org wrote:
> > Nicolas Cellier uploaded a new version of Collections to project The Inbox:
> > http://source.squeak.org/inbox/Collections-nice.891.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Collections-nice.891
> > Author: nice
> > Time: 2 May 2020, 7:40:45.298967 pm
> > UUID: 08510be0-8293-6744-959d-c1d41bc13ae1
> > Ancestors: Collections-nice.890
> >
> > Experimental - For discussion
> >
> > Group some (most) non-pointers collections under an abstract
> > FixedBitWidthArray.
> > I know, the name is hard to pronounce and thus ugly: it's open to
> > discussion.
> >
> > This enables factorization of some methods, for example the trick for
> > atAllPut:
> > Also notice that most methods are shared between FloatArray and
> > Float64Array.
>
> How about ImmediateWord/ImmediateObject and an ImmediateArray (an array
> consisting only of Immediate elements)? It would be consistent with
> isImmediateClass method.
>
> An object chunk could be checked at loading time to see if it needs to
> be converted from immediate to pointers or vice versa. In the typical
> case, this will be a nop. But if the image is moved to a different host
> type (say from 64b to 32b or from x86 to ARM), then some immediate
> numbers may be converted into pointers or vice versa. If this increases
> loading time for large images, then the image may be saved locally.
>
> This is just a strawman. I haven't really thought through all its
> implications.
>
> Regards .. Subbu
>
>
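Subbu's load-time conversion idea implies a per-slot range check when an image moves from a 64-bit to a 32-bit host: a 32-bit Squeak VM has 31-bit SmallIntegers (-2^30 to 2^30-1), while a 64-bit Spur VM has 61-bit ones, and the 32-bit VM has no SmallFloat at all, so every such float would have to become a heap object. A hypothetical C sketch of the integer decision (the enum and function names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* SmallInteger range on a 32-bit VM: 31-bit signed. */
#define SMALLINT32_MIN (-(INT64_C(1) << 30))
#define SMALLINT32_MAX ((INT64_C(1) << 30) - 1)

typedef enum {
    KEEP_IMMEDIATE,   /* value can be re-tagged as a 32-bit immediate */
    BOX_AS_LARGEINT   /* value must become a heap object (LargeInteger) */
} Conversion;

/* Hypothetical load-time decision for one 64-bit SmallInteger slot. */
static Conversion convert_smallint_64_to_32(int64_t v) {
    /* input must be a valid 61-bit SmallInteger from the 64-bit image */
    assert(v >= -(INT64_C(1) << 60) && v <= (INT64_C(1) << 60) - 1);
    return (v >= SMALLINT32_MIN && v <= SMALLINT32_MAX) ? KEEP_IMMEDIATE
                                                        : BOX_AS_LARGEINT;
}
```

In the typical case (same host type) this check never runs, which matches the "nop" expectation in the strawman.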