What is a "word"? (was: [squeak-dev] The Trunk: Kernel-nice.508.mcz)

Thu Oct 28 02:32:37 UTC 2010

I should apologize for driving this thread off topic. The original topic
pertains to #isWords, which, as Bert and Eliot both point out, is related
to "words" in the sense of 32-bit words in a variableWordSubclass. Whatever
one chooses to call them, they are 32 bits wide regardless of the image
format and the Smalltalk wordSize. Perhaps some different terminology would
help to distinguish a "32-bit word in a variableWordSubclass" from a "32- or
64-bit object memory word". Suggestions welcome.
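
To make the two meanings concrete, here is a rough sketch in Python (not
Smalltalk, and not how the VM actually computes anything). The names
ELEMENT_BYTES and slots_needed are made up for illustration, and rounding
the data up to whole object memory words is an assumption. It only shows
that the element width stays at 4 bytes while the number of object memory
words holding those elements depends on the word size.

```python
import struct

# Assumption: elements of a variableWordSubclass are always 4 bytes wide,
# independent of the object memory word size (4 or 8 bytes).
ELEMENT_BYTES = 4


def slots_needed(n_elements, word_size):
    """Object memory words needed for n 32-bit elements (header excluded).

    Hypothetical helper: rounds the data size up to whole words.
    """
    total_bytes = n_elements * ELEMENT_BYTES
    return -(-total_bytes // word_size)  # ceiling division


# Two 32-bit floats, loosely mirroring
# (FloatArray with: Float pi with: Float pi): 8 bytes of data in total.
payload = struct.pack("<2f", 3.141592653589793, 3.141592653589793)

words_in_32_bit_image = slots_needed(2, 4)  # two 4-byte words
words_in_64_bit_image = slots_needed(2, 8)  # one 8-byte word
```

By contrast, a pointer object such as (Array with: Float pi with: Float pi)
needs one object memory word per element whatever the word size, because
each slot holds an oop.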

Dave

On Wed, Oct 27, 2010 at 07:34:32PM -0400, David T. Lewis wrote:
> On Wed, Oct 27, 2010 at 10:13:32PM +0200, Nicolas Cellier wrote:
> > 2010/10/27 Eliot Miranda <eliot.miranda at gmail.com>:
> > >
> > >
> > > On Wed, Oct 27, 2010 at 7:43 AM, Bert Freudenberg <bert at freudenbergs.de>
> > > wrote:
> > >>
> > >> On 27.10.2010, at 16:09, David T. Lewis wrote:
> > >>
> > >> > On Wed, Oct 27, 2010 at 03:15:05PM +0200, Bert Freudenberg wrote:
> > >> >>
> > >> >> On 27.10.2010, at 13:30, David T. Lewis wrote:
> > >> >>
> > >> >>> On Tue, Oct 26, 2010 at 07:17:16PM +0000, commits at source.squeak.org
> > >> >>> wrote:
> > >> >>>> Nicolas Cellier uploaded a new version of Kernel to project The
> > >> >>>> Trunk:
> > >> >>>> http://source.squeak.org/trunk/Kernel-nice.508.mcz
> > >> >>>>
> > >> >>>> ==================== Summary ====================
> > >> >>>>
> > >> >>>> Name: Kernel-nice.508
> > >> >>>> Author: nice
> > >> >>>> Time: 26 October 2010, 9:17:01.308 pm
> > >> >>>> UUID: f75f55d1-4fc4-4ba7-8790-248dea6d3136
> > >> >>>> Ancestors: Kernel-ul.507
> > >> >>>>
> > >> >>>> Correct #isWords comment. There is no such thing as 16-bit variables.
> > >> >>>> Don't know what would be the answer in a 64 bit image...
> > >> >>>
> > >> >>> On a 64-bit image, we have this:
> > >> >>>
> > >> >>> Smalltalk wordSize ==> 8
> > >> >>> Bitmap isWords ==> true
> > >> >>> String isWords ==> false
> > >> >>> CompiledMethod isWords ==> false
> > >> >>> Array isWords ==> true
> > >> >>> FloatArray isWords ==> true
> > >> >>> IntegerArray isWords ==> true
> > >> >>>
> > > >> >>> The result for String looks like a bug in the 64-bit image or
> > > >> >>> VM (a 32-bit version of the same image answers true). The other
> > > >> >>> results are the same as for a 32-bit image.
> > >> >>>
> > > >> >>> For a 64-bit image, the word size is 8 and the things in the
> > > >> >>> slots after the object header may be either 64 bits (object
> > > >> >>> pointers, float values) or 32 bits (e.g. the elements in a
> > > >> >>> FloatArray).
> > >> >>>
> > > >> >>> For example, if you have (FloatArray with: Float pi with: Float pi),
> > > >> >>> the object has a single 64-bit word containing two 32-bit float
> > > >> >>> values. However, if you have (Array with: Float pi with: Float pi),
> > > >> >>> the object will have two 64-bit words containing the object
> > > >> >>> pointers for two Float objects.
> > >> >>>
> > >> >>> Dave
> > >> >>
> > > >> >> Since in the 64-bit image nothing really has changed except oops
> > > >> >> being 64 bits instead of 32 bits wide, and oops not being directly
> > > >> >> accessible, I wonder if #wordSize shouldn't still answer 4 (to be
> > > >> >> in sync with #variableWordSubclass etc). Maybe we need a new method
> > > >> >> #oopSize which would answer 8.
> > >> >
> > > >> > Actually I like the current use of the term "word" because it conveys
> > > >> > the idea of the object memory being made up of words of uniform size.
> > > >> > The contents of a word might be an oop, or a portion of an object
> > > >> > header, or some data component of the object. But in all cases (in
> > > >> > the current Squeak designs) the words are the same size, and the
> > > >> > location of the word in object memory is an object pointer. If the
> > > >> > object pointer happens to point to an object header, then it is an
> > > >> > oop. Simple, although not terribly obvious at first glance.
> > >> >
> > >> > So in a 32 bit or 64 bit (or 16 bit, etc) object memory the #wordSize
> > >> > is important for various reasons, regardless of whether the word
> > >> > contains an oop or something else. I think that this also helps
> > >> > convey the important idea that the wordSize in object memory has
> > >> > no direct relationship to sizeof(int) or sizeof(void *) or any of
> > >> > that other platform specific stuff. After all, this is the reason
> > >> > that we can run our 64-bit images on 32-bit platforms.
> > >> >
> > >> > Dave
> > >>
> > >> But what do you call the elements in a variableWordSubclass in a 64 bit
> > >> image then? Half-words? Because 64 bit words are not exposed at all in the
> > >> image. Nowhere, AFAIK.
> 
> That's a good question, and I honestly do not have an answer. But what
> I can say for certain is that #wordSize refers very specifically to the
> size of a (4 or 8 byte) slot in the object memory. In the current interpreter,
> it is controlled by the single compile-time SQ_VI_BYTES_PER_WORD macro,
> which propagates throughout all sorts of stuff in interp.c. Thus the
> meaning of the word size is significant, and refers to the word size
> for the object memory itself as opposed to the storage characteristics
> of various types of objects within the object memory.
> 
> FWIW, the notion of an object memory organized as a uniform array of
> "words" of some agreed size independent of the runtime machine platform,
> and independent of the stuff that gets stored in the object memory words,
> makes perfect sense to me. Of course the fact that it makes sense to
> me probably does not stand as much of a recommendation, but there you
> have it ;-)
> 
> > >
> > > > I think the only sensible solution is to call them words and have them be
> > > > 32-bits in both 32-bit and 64-bit images. There's no necessary
> > > > correspondence between the size of an oop and the field-width of a
> > > > variableWordSubclass:. So we have variableByteSubclass: and
> > > > variableWordSubclass: with widths 1 byte and 4 bytes respectively. We could
> > > > conceivably add variableHalfWordSubclass: and variableDoubleWordSubclass:
> > > > for 2 bytes and 8 bytes field widths respectively.
> > > > Varying the width of any of the non-oop subclasses between the 32-bit and
> > > > 64-bit images will simply break applications because when porting code from
> > > > 64-bits to 32-bits values won't fit in the smaller 32-bit interpretation.
> > > > This is analogous to C's float and double datatypes on 32- and 64-bit
> > > > architectures. Irrespective of the system being 32- or 64-bit float is
> > > > 32-bit and double is 64-bit. So byte == 8 bits, halfWord == 16 bits, word
> > > > == 32 bits and doubleWord == 64 bits irrespective of the underlying width of
> > > > the machine.
> > > Hence better names might be
> > > variable8BitSubclass:
> > > variable16BitSubclass:
> > > variable32BitSubclass:
> > > variable64BitSubclass:
> > > because then one doesn't have to think.
> > > 2?
> > > Eliot
> > >
> > 
> > > That's more in line with what I thought it could be...
> > > It would then be easier to have st-source compatibility.
> > > As I understand it, this is currently not the case for FloatArray,
> > > ShortIntegerArray etc...
> > 
> > Nicolas
> 
> Dan and Ian did a pretty good job on the original 64 bit port. It may be
> rough around the edges, but it already works exactly as you suggest that
> it should. A FloatArray behaves identically in the 32 and 64 bit image
> formats, and there is absolutely no incompatibility at the image or
> st-source level. So now we have (since how many years now?) a fully
> working, compatible 64 bit image format just waiting to be turned into
> something more interesting and incompatible. My guess would be that Eliot
> is going to be the first to make some real new advances on this front :)
> 
> Dave
>