[squeak-dev] The Inbox: Collections-nice.891.mcz

Tobias Pape Das.Linux at gmx.de
Sun May 3 15:13:49 UTC 2020


> On 03.05.2020, at 15:52, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> 
> Hi Subbu,
> Yes, those raw bits are somewhat like immediates, but not exactly...

So maybe the name should include "raw"?
:D
-t
> 
> Immediates are objects having their value encoded into the pointer slot (either in 4 or 8 bytes, according to the 32-bit or 64-bit VM word size).
> Currently, this covers only SmallInteger and Character, plus SmallFloat on 64-bit VMs.
> 
> Here we have values encoded into slots of 1, 2, 4 or 8 bytes, but not into an object-oriented pointer slot.
> Technically, #(1 2.0 $3) is an Array of immediates, while ((ColorArray with: Color black) first) is not an immediate...
> So even if it is the same notion of encoded value, it's not an exact match...
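
The distinction above can be checked in a Workspace. This is only a minimal sketch, assuming a recent Squeak image; it uses FloatArray as the raw-bits example rather than the ColorArray mentioned above, and the variable names are purely illustrative:

```smalltalk
"Sketch only: evaluate the expressions one by one in a Workspace."
| pointers raw |
pointers := Array with: 1 with: 2.0 with: $3.   "slots hold object pointers"
raw := FloatArray new: 3.                        "slots hold 32-bit raw bits"
raw at: 1 put: 2.0.

pointers class isPointers.                 "true: pointer slots"
raw class isBits.                          "true: raw, non-pointer slots"
(pointers at: 1) class isImmediateClass.   "true: SmallInteger is an immediate"
(raw at: 1) isFloat.                       "true: at: answers a Float built from the raw bits"
```

So the element stored in the raw array is not an object at all until #at: builds one from the bits.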
> 
> Concerning the use cases, I do indeed want to use such bit arrays for fast data transfer.
> For example, it is useful for FFI: I use this kind of array exclusively for Smallapack...
> It is also useful when reading big files in Matlab, National Instruments TDMS or HDF5 format:
> it really helps to have all the possible flavours for common elementary types of values.
> Otherwise, I have to use an intermediate ByteArray, or pointers to external heap via FFI (like I did in Smallapack).
> 
> More often than not, the data transfer can handle offset and stride via BitBlt tricks (unless we have an odd layout).
> This enables extracting a single "column" or block of data from a big file with a single copy.
> I may need to extend BitBlt to cope with all the available bit-widths, not just 8 (byte) or 32 (word) though.
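
To make the offset/stride idea concrete, here is a rough sketch that pulls one "column" out of a packed buffer. It deliberately uses a plain loop instead of the BitBlt trick described above, and the layout (rows of three little-endian doubles) is a hypothetical one chosen only for illustration. It assumes ByteArray>>doubleAt:bigEndian: and Float64Array are available in the image:

```smalltalk
"Sketch only: hypothetical layout, 4 rows of 3 little-endian doubles."
| rows cols bytes column |
rows := 4.
cols := 3.
bytes := ByteArray new: rows * cols * 8.
"Fill the buffer with 1.0, 2.0, ... 12.0 in row-major order."
1 to: rows * cols do: [:i |
	bytes doubleAt: (i - 1) * 8 + 1 put: i asFloat bigEndian: false].

"Extract the second column: offset = 1 element, stride = cols elements."
column := Float64Array new: rows.
1 to: rows do: [:r |
	column
		at: r
		put: (bytes doubleAt: ((r - 1) * cols + 1) * 8 + 1 bigEndian: false)].
column   "holds 2.0, 5.0, 8.0 and 11.0"
```

Each iteration here boxes and unboxes a Float; a BitBlt-style copy would move the 8-byte slots directly, which is why support for more bit-widths matters.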
> 
> Also, those formats offer packed and contiguous memory layout which is an advantage too when dealing with large chunks of data.
> Especially if we have vectorized primitives operating on the arrays.
> 
> Also, creating non-immediate objects on the fly through #at: and #at:put: is very efficient if the VM has a generational scavenger, because those objects are generally short-lived.
> Retaining all the pointers to a whole collection of non-immediate objects, on the other hand, puts a lot of pressure on the garbage collector.
> 
> The advantage diminishes somewhat with the advent of 64-bit VMs: most values can be immediates, so we have quasi-contiguous data with a few exceptions, and not so much GC pressure.
> But still, the primitives can operate on raw bits, without having to handle the immediate tag or exceptional (non-immediate) values.
> 
> As an anecdote, in the 90s, I started to experience crashes in ObjectWorks/VisualWorks when handling large Arrays of Float.
> The console would only report: *out of memory*.
> With increasing processor speed, the memory was exhausted before the low space monitoring process had a chance to handle the situation.
> I then decided to handle all my Arrays of Float (Double) through some UninterpretedBytes and ad-hoc primitives for at: and at:put:
> Since then, I have never gone back to pointer-oriented arrays: if we want Smalltalk to scale, we need those basic objects  :)
> 
> 
> On Sun, 3 May 2020 at 06:50, K K Subbu <kksubbu.ml at gmail.com> wrote:
> On 02/05/20 5:41 pm, commits at source.squeak.org wrote:
> > Nicolas Cellier uploaded a new version of Collections to project The Inbox:
> > http://source.squeak.org/inbox/Collections-nice.891.mcz
> > 
> > ==================== Summary ====================
> > 
> > Name: Collections-nice.891
> > Author: nice
> > Time: 2 May 2020, 7:40:45.298967 pm
> > UUID: 08510be0-8293-6744-959d-c1d41bc13ae1
> > Ancestors: Collections-nice.890
> > 
> > Experimental - For discussion
> > 
> > Group some (most) non-pointer collections under an abstract FixedBitWidthArray.
> > I know, the name is hard to pronounce and thus ugly: it's open to discussion.
> > 
> > This enables factorization of some methods, for example the trick for atAllPut:
> > Also notice that most methods are shared between FloatArray and Float64Array.
> 
> How about ImmediateWord/ImmediateObject and an ImmediateArray (an array 
> consisting only of Immediate elements)? It would be consistent with 
> the isImmediateClass method.
> 
> An object chunk could be checked at loading time to see if it needs to 
> be converted from immediate to pointers or vice versa. In the typical 
> case, this will be a nop. But if the image is moved to a different host 
> type (say from 64b to 32b or from x86 to ARM), then some immediate 
> numbers may be converted into pointers or vice versa. If this increases 
> loading time for large images, then the image may be saved locally.
> 
> This is just a strawman. I haven't really thought through all its 
> implications.
> 
> Regards .. Subbu
> 
> 