[squeak-dev] The Trunk: Collections-eem.792.mcz

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri May 4 21:36:46 UTC 2018


2018-05-04 22:10 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:

>
>
> On Fri, May 4, 2018 at 12:44 PM, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>>
>>
>> 2018-05-04 0:50 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>
>>> Hi Tobias, Hi All,
>>>
>>>
>>> > On May 3, 2018, at 3:08 PM, Levente Uzonyi <leves at caesar.elte.hu>
>>> wrote:
>>> >
>>> >> On Thu, 3 May 2018, Tobias Pape wrote:
>>> >>
>>> >>
>>> >>> On 03.05.2018, at 22:48, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>> >>> But WideString requires another hack...
>>> >>
>>> >> Like
>>> >>
>>> >>    ^false
>>> >>
>>> >> ? :D
>>> >
>>> > Not really: ((WideString new: 2) first: 1) isAsciiString
>>> >
>>> > Levente
>>>
>>> Note that this is a common issue in Smalltalk, where we can have
>>> different implementations (classes) with the same interface.  Take
>>> LargeInteger and SmallInteger.  The arithmetic system and the VM are both
>>> implemented to almost never represent something in the SmallInteger range
>>> as a LargeInteger (there are rare circumstances but it's safe to assume
>>> that the invariant is always maintained, and the invariant is depended
>>> upon).  This allows the VM to only ever check for SmallIntegers for things
>>> like indices, never having to waste code bloat or cycles checking for
>>> denormalised LargeIntegers.
>>>
>>> Why can we do this with SmallInteger & LargeInteger, but not with
>>> ByteString and WideString?  Because ByteString and WideString are mutable
>>> (and because of the FFI).  Were the system to maintain the invariant that
>>> strings containing characters in the range 0 to 255 were always represented
>>> by ByteString, then, Tobias, your WideString>>isAsciiString ^false would
>>> work.  But the cost of maintaining that invariant would be scanning the rest
>>> of the string every time at:put: deposited a byte character, to see if we
>>> had just replaced the last wide character by a byte one and hence needed to
>>> do a become:.  We'd also potentially spend a lot of time doing becomes, and
>>> we'd also have to allow for denormalisation when passing an ascii string
>>> through the FFI to code requiring a wide string.  And even worse we'd have
>>> to avoid WideString new: n like the plague since new strings are always
>>> ascii, being full of nuls.  So only WideString with:... forms would make
>>> sense.
>>>
>> In such a case we would maintain a count of non-byte characters and avoid
>> scanning...
>>
>
> Only possible if the representation makes room for a count, which could
> easily require more than 24 bits.  It is non-trivial to implement, and of
> course slows down access.
>
>
For example, just allocate one more character code and use it to store the
count...
Most could be done at image side.
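
A rough sketch of the idea, as a toy model in Python rather than real
Squeak code. Everything here is hypothetical (the class name
CountedWideString, the selectors at_put / fits_in_bytes, the 0-based
indices); it only shows that one hidden trailing slot holding the count
of non-byte characters lets at:put: keep the count up to date without
rescanning the string.

```python
class CountedWideString:
    """Toy model: a wide string whose last (hidden) slot stores how many
    characters have a code above 255. Not the Squeak implementation."""

    BYTE_MAX = 255

    def __init__(self, size):
        # 'size' visible slots, all NUL, plus one extra slot for the count.
        self._words = [0] * (size + 1)

    def __len__(self):
        # The hidden count slot is not part of the visible string.
        return len(self._words) - 1

    def _check(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)

    def at(self, index):
        self._check(index)
        return self._words[index]

    def at_put(self, index, code):
        self._check(index)
        old = self._words[index]
        self._words[index] = code
        # Constant-time update: +1 if a wide character arrived,
        # -1 if one was overwritten, 0 otherwise.
        self._words[-1] += (code > self.BYTE_MAX) - (old > self.BYTE_MAX)
        return code

    def wide_count(self):
        return self._words[-1]

    def fits_in_bytes(self):
        # True when every character would fit in a ByteString.
        return self._words[-1] == 0
```

In this model the moment fits_in_bytes() flips back to true is where the
image could decide to become: a ByteString; whether that is worth doing on
every store is a separate question.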



>>> So this kind of multiple implementation approach only works well with
>>> certain types and access patterns.  Interesting, no?
>>>
>>
> _,,,^..^,,,_
> best, Eliot
>
>
>
>