[Vm-dev] Spur Object 64 bit format

Eliot Miranda eliot.miranda at gmail.com
Fri Nov 15 17:26:53 UTC 2013


Hi Igor,

On Fri, Nov 15, 2013 at 5:04 AM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> On 14 November 2013 17:44, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
>>
>> On Thu, Nov 14, 2013 at 3:43 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>>>
>>> On 12 November 2013 19:23, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>>
>>>>
>>>> Hi Igor,
>>>>
>>>> On Tue, Nov 12, 2013 at 3:37 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>>>
>>>>>
>>>>> And let me remind you what I proposed back then:
>>>>>
>>>>> :)
>>>>> ===========
>>>>> About immediates zoo.
>>>>>
>>>>> Keep in mind that the more immediates we have, the more complex the
>>>>> implementation tends to be.
>>>>>
>>>>> I would just keep 2 data types:
>>>>>  - integers
>>>>>  - floats
>>>>>
>>>>> and a third, special 'arbitrary' immediate, which the VM sees as a
>>>>> 60-bit value.
>>>>> The interpretation of this value depends on a lookup in a range table,
>>>>> where the developer specifies the correspondence between value
>>>>> intervals and classes:
>>>>> [min .. max] -> class
>>>>>
>>>>
>>>> for this idea to go anywhere you'd have to show at least the
>>>> pseudo-code for the inline cache test in machine code methods.  These
>>>> schemes seem great in theory but in practice end up with a long, complex
>>>> and slow fetchClassOf: and/or inline cache test.  To remind you, you have
>>>> to compete with the following in Spur Cog:
>>>>
>>>> Limm:
>>>>     andl $0x1, %eax
>>>>     j Lcmp
>>>> Lentry:
>>>>     movl %edx, %eax
>>>>     andl $0x3, %eax
>>>>     jnz Limm
>>>>     movl 0(%edx), %eax
>>>>     andl $0x3fffff, %eax
>>>> Lcmp:
>>>>
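[The inline-cache test above can be sketched in C for readers not fluent in x86 assembly. This is a sketch only, not Spur's actual code; the 2-bit tag and the 22-bit class-index mask (0x3fffff) are taken from the assembly above, and all names are illustrative.]

```c
#include <stdint.h>

typedef uintptr_t oop;

/* Sketch of the class-tag fetch performed at a Cog send site.  For an
   immediate (low two tag bits non-zero) the low bit of the oop, after
   masking, serves as the cache tag; for a pointer object the low 22
   bits of the first header word hold the class index. */
static uint32_t cache_tag_of(oop receiver)
{
    uint32_t tag = (uint32_t)(receiver & 0x3);   /* andl $0x3, %eax   */
    if (tag != 0)                                /* jnz Limm          */
        return tag & 0x1;                        /* Limm: andl $0x1   */
    uintptr_t header = *(uintptr_t *)receiver;   /* movl 0(%edx),%eax */
    return (uint32_t)(header & 0x3fffff);        /* andl $0x3fffff    */
}
```

[The send site then compares this value against the tag cached at the call site: equal is a hit, unequal is a miss.]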
>>>
>>> It is an extra comparison for the immediate case:
>>>
>>> cmp eax, LowRange
>>> jl miss
>>> cmp eax, HighRange
>>> jg miss
>>>
>>
>> I don't understand.  Does that mean there are two cache entries, one for
>> HighRange one for LowRange?  Spell it out please.  Give the pseudo-code.
>>  Right now it is too vague for me to see how it is supposed to work.
>>
>>
> Two entries where? If I understand correctly, in Cog you use generated
> code for the inline cache, jumping into a PIC entry to check whether it
> hits or misses. So, you just generate different code for the immediate
> case(s), to check that the receiver oop value is tagged and within a
> certain range (cache hit), or not (cache miss).
>

That's not right.  PICs are only used at send sites that prove to be
polymorphic, i.e. become PICs after a send failure.  At first they are
simply monomorphic inline caches with a single entry.  Only 9% of active
send sites fail to be rebound to PICs.


>
> A pseudo-code for bit-pattern matching is this:
>
> "presuming least significant bit is tag"
> (oop bitAnd: 2r101) == 2r101 ifTrue: [ hit ] ifFalse: [ miss ]
>
> And code for range matching is this:
>
> "presuming least significant bit is tag"
> (oop odd and: [ oop between: lowValue and: highValue]) ifTrue: [ hit ]
> ifFalse: [ miss ]
>
> "or presuming most significant bit is tag"
> (oop between: lowValue and: highValue) ifTrue: [ hit ] ifFalse: [ miss ]
>
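[For comparison, the two matching schemes in the pseudocode above read roughly as follows in C. This is a sketch with illustrative tag conventions and names; the bit pattern 2r101 and the closed [low, high] range are taken from the pseudocode.]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t oop;

/* Bit-pattern matching, least-significant-bit tagging: hit iff the
   oop carries the specific tag pattern 2r101 in its low bits. */
static bool pattern_hit(oop receiver)
{
    return (receiver & 0x5) == 0x5;
}

/* Range matching, least-significant-bit tagging: hit iff the oop is
   tagged (odd) and falls within the cached closed range. */
static bool range_hit(oop receiver, oop low, oop high)
{
    return (receiver & 1) && receiver >= low && receiver <= high;
}

/* With the MOST significant bit as tag, the oddness test disappears:
   the tag bit is part of the value, so one range check suffices. */
static bool msb_range_hit(oop receiver, oop low, oop high)
{
    return receiver >= low && receiver <= high;
}
```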

and how are these two pieces of code related?


>>> So, if the value is between the low and high bounds, it is already
>>> known what class it is; otherwise you get a cache miss.
>>>
>>> And this scheme works even better if you pick the highest bit for
>>> tagging; that way you can get rid of the tag-bit test
>>>     andl $0x1, %eax
>>>     j Lcmp
>>> completely, and start straight from comparing against the high and low
>>> values. That, of course, only if we can use the highest bit for tagging.
>>>
>>> The class lookup then will be a binary search in the table of ranges,
>>> which is O(log n).
>>>
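[The fetchClassOf: side of this scheme, the O(log n) binary search over a sorted table of [min .. max] -> class ranges, could be sketched like this. The table layout and names are hypothetical, not anything in the VM.]

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the hypothetical range table: a closed interval of
   immediate values mapped to a class index. */
typedef struct {
    uint64_t min, max;
    uint32_t class_index;
} RangeEntry;

/* Binary search over entries sorted by min, assumed non-overlapping.
   Returns the matching class index, or 0 if no range covers the value
   (i.e. the equivalent of a lookup failure). */
static uint32_t class_for_immediate(const RangeEntry *table, size_t n,
                                    uint64_t value)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (value < table[mid].min)
            hi = mid;
        else if (value > table[mid].max)
            lo = mid + 1;
        else
            return table[mid].class_index;
    }
    return 0;
}
```

[Because the table is data, the language side could grow or reshuffle it without regenerating VM code, which is the redefinability Igor argues for below.]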
>>
>> So the inline cache becomes a range of entries?  That will cause a
>> significant code bloat.
>>
>
> I'm not sure I understood; what extra inline cache are you talking about?
> Presuming that each PIC is generated code, you just generate different
> code for the entry, specialized to detect that a given oop is of a
> certain immediate class.
>
> But if it's not, then yes, you need to hold these two values somewhere to
> be able to compare them with the input oop.
> But even in that case, I fail to see how a single extra comparison could
> cause so much bloat.
>
> And it is always good to compare how much bloat the bit-pattern matching
> code would cause.
>
> What is clearly beneficial about immediate ranges is that you can
> redefine them at any moment by introducing new kinds of immediates,
> without needing to even touch the VM: the language can easily control
> that.
> And the potential number of immediate classes is much bigger if you
> encode them in ranges, because:
>
> 2^31 (smallint) + 2^32 (short float) + 2^24 (character unicode)  =
> 6459228160
>
> which leaves us:
>
> 2^63 - 6459228160 = 9223372030395547648
> of space for other kinds of immediates, which we can introduce later
> !!without!! needing to ever bother the VM again.
>
> --
> Best regards,
> Igor Stasenko.
>
>


-- 
best,
Eliot