[squeak-dev] Float hierarchy for 64-bit Spur

Tobias Pape Das.Linux at gmx.de
Fri Nov 21 19:12:01 UTC 2014


Hi Eliot
On 21.11.2014, at 19:25, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> 
> On Fri, Nov 21, 2014 at 8:01 AM, Tobias Pape <Das.Linux at gmx.de> wrote:
>> 
>> On 21.11.2014, at 15:30, Bert Freudenberg <bert at freudenbergs.de> wrote:
>> 
>> >
>> > On 21.11.2014, at 13:53, Tobias Pape <Das.Linux at gmx.de> wrote:
>> >
>> >> On 21.11.2014, at 13:44, Bert Freudenberg <bert at freudenbergs.de> wrote:
>> >>> Also, with the 64 bit format we get many more immediate objects. There already are immediate integers and characters, floats will be the third, there could be more, like immediate points. For those, the small/large distinction does not make sense.
>> >>>
>> >>> Maybe Eliot's idea of keeping "Float" in the name was best, but instead of "small" use "immediate":
>> >>>
>> >>>     Float - BoxedFloat - ImmediateFloat
>> >>>
>> >>>     A Float is either a BoxedFloat or an ImmediateFloat, depending on the magnitude of its exponent.
>> >>
>> >> I don't like the idea of putting a VM/Storage detail into the Class name.
>> >> The running system itself does not care about whether Floats or Integers are
>> >> boxed or immediate.
>> >
>> > Good point. Do you have a suggestion for names reflecting that?
>> 
>> 
>> First: I think it is possible to have both SmallInteger/Large*Integer as well
>> as all Float stuff combined such that we only have
>>         - Integer
>>         - Float
>> and the VM has to deal with internal stuff, i.e. representing small enough numbers
>> tagged and larger ones as boxed (which could, for example, mean to not be able
>> to access the boxed values from the image side…).
>>   However, this is “Zukunftsmusik” or “ungelegte Eier” (Things to come or not even
>> considered).
>> 
> I don't find this compelling for reasons I've expressed earlier in the thread.  Personally I think the VM shouldn't be in the business of hiding much.  There are advantages to it hiding the machinery that connects contexts to stack frames and methods to machine code because that allows us to use the same system with very different VMs and that's hugely advantageous (see the Stack VM and SqueakJS for examples).  But that doesn't, for example, hide contexts; it just optimizes their use.

Probably it is just a matter of viewpoint whether this would be (not) hiding information
or (not) leaking information. At this point in time, I start to question both my proposal
and the current state…


>> 
>> Second: I think the small/large stuff is semantically correct, because that is what
>> it is, whether immediate or not:
>>         - Integer: SmallInteger, LargeInteger
>>         - Float: SmallFloat, LargeFloat
>> I don't think there's confusion about the single=float thing when you don't have
>> the name double somewhere.
> 
> Agreed.
> 
>>  
>>   Rationale against immediate in the name: Immediate/Non-Immediate is a means to
>> an end, which is, speed for small or “few” things: ints, floats, chars. When you
>> make something different immediate — just for fun: very short ascii strings like
>> "hello" stored as 0x000068656C6C6F04 and 04 being the tag — you shouldn't name it
>> ImmediateString but TinyString, because that's why it is there, an optimization
>> for very tiny things.
>> 
> Agreed.  But note that I will /not/ be pursuing things like immediate strings.  IMO this is a bad idea.  Whereas there are really compelling arguments for immediate integers, characters and floats, there aren't for strings or symbols.  

I did not intend to propose immediate strings; I merely used them as an example, nothing more.
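
Purely to illustrate the encoding in that example (and not as a proposal), here is a workspace sketch. The block names pack and unpack are made up for this mail; the tag byte 4 and the 7-character limit come from the example. The length is not stored, so the sketch assumes the string contains no NUL characters.

```smalltalk
"Sketch only: one 64-bit word, 7 payload bytes, tag in the low byte."
| tag pack unpack word |
tag := 4.
pack := [:aString |
	aString size <= 7 ifFalse: [self error: 'too long for one word'].
	"Fold the characters into an integer, first character most significant."
	((aString inject: 0 into: [:acc :ch | (acc bitShift: 8) + ch asInteger])
		bitShift: 8) bitOr: tag].
unpack := [:aWord | | bits stream |
	bits := aWord bitShift: -8.	"drop the tag byte"
	stream := WriteStream on: String new.
	[bits > 0] whileTrue: [
		stream nextPut: (Character value: (bits bitAnd: 16rFF)).
		bits := bits bitShift: -8].
	stream contents reversed].	"characters were read last to first"

word := pack value: 'hello'.
word = 16r000068656C6C6F04.	"true"
unpack value: word.	"'hello'"
```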

> Most strings and most symbols are longer than 7 bytes
> 
> (ByteSymbol allInstances collect: [:ea| ea size]) sum asFloat / ByteSymbol allInstances size "=> 17.905990063082676"
> (ByteString allInstances collect: [:ea| ea size]) sum asFloat / ByteString allInstances size "=> 192.12565808504485"
> 
> So choosing this representation doesn't save much space and loses time because the more complex mixed representation is involved in many operations (e.g. replaceFrom:to:with:startingAt: is now way more complex).
> 
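The averages do not say directly how many instances would fit into an immediate. A sketch that counts them, assuming a payload limit of 7 bytes (the variable names are mine, and the result depends on the image it is run in):

```smalltalk
"Sketch only: fraction of instances no longer than the assumed payload limit."
| limit symbols strings |
limit := 7.
symbols := ByteSymbol allInstances.
strings := ByteString allInstances.
{ 'ByteSymbol' -> ((symbols count: [:ea | ea size <= limit]) / symbols size) asFloat.
  'ByteString' -> ((strings count: [:ea | ea size <= limit]) / strings size) asFloat }
```
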
> In fact, I'm thinking that a 2 bit tag is probably better.  AFAIA, since I implemented 64-bit VisualWorks with a 3 bit tag no one has added any new immediate types.  Points don't have the necessary dynamic frequency and indeed points with floats may be very common in newer UI architectures.  Making nil, true and false immediates doesn't have much benefit either; they're unique values, and unique addresses work just as well as immediates.  Essentially, expanding the number of tagged types, and especially making the tagged type organization non-uniform (see e.g. Eliot Moss's VMs where nil, true, false have one organization, character has another and SmallInteger another one still) bloats the decode, which slows down message sends.  So I think for the moment I'll go with a 2 bit tag, giving us an even larger range for SmallDouble and SmallInteger, and keep the simple representation:
> 
> immediates
> [62 bit value][2 bit tag]
> non-immediates
> [64 bit pointer (least 3 bits 0)] -> [8 bit slot count][2 gc bits][22 bit hash][3 gc bits][5 bit format][2 flag bits][22 bit class index]
> -- 
> best,
> Eliot
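
To check my reading of that layout, a workspace sketch with 64-bit words modelled as plain Integers. Assumptions that are not in the mail above: the header fields are listed from most to least significant bit, the field names are invented, and no particular tag value is assigned to a particular class.

```smalltalk
"Sketch only, under the assumptions stated above."
| tagOf payloadOf names widths decode header |
tagOf := [:word | word bitAnd: 2r11].	"low 2 bits"
payloadOf := [:word | word bitShift: -2].	"remaining 62 bits, read as unsigned"

names := #(slotCount gcBitsA hash gcBitsB format flags classIndex).
widths := #(8 2 22 3 5 2 22).	"adds up to 64"
decode := [:word | | shift fields |
	shift := 64.
	fields := Dictionary new.
	names with: widths do: [:name :width |
		shift := shift - width.
		fields at: name put:
			((word bitShift: shift negated) bitAnd: (1 bitShift: width) - 1)].
	fields].

tagOf value: 2r1101.	"1"
payloadOf value: 2r1101.	"3"
header := (5 bitShift: 56) bitOr: 7.
(decode value: header) at: #slotCount.	"5"
(decode value: header) at: #classIndex.	"7"
```

A tag of 0 is what an aligned pointer has in its low bits, so three tag values remain for immediates.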

Best
	-Tobias

