The Mosner bit

Mats Nygren nygren at sics.se
Sat Sep 2 10:14:34 UTC 2000


Tim,

A strong argument: if you don't do (something similar to) this, how are
you going to deal with large character sets like Unicode? Or even with
several different smaller character sets? The fun of having a table
of all instances decreases with the number of such instances.

One reason I failed to get Squeak accepted for a language education
program in Uppsala was that they have an immediate need for interesting
character sets, and I couldn't guarantee that support would appear soon.
I'm still working on it. The program would produce 40 people a year with
special skills in using computers for (natural) language processing, each
with extensive experience of Squeak. That would give me a nice teaching
job, the Squeak community a supply of language applications, and the
students a good time.

Tim Rowledge <tim at sumeru.stanford.edu> wrote:
> Bijan Parsia <bparsia at email.unc.edu> is widely believed to have written:
> 
> > Cool!
> > 
> > I had been wondering for a while if the extended tag scheme would
> > work. Any VM implementers/object memory specialists want to weigh in?
> VW has used two tag bits for years. IIRC it uses three of the patterns:-
> smallinteger
> oop
> character
> 
> The cost is not just the reduction in range of SmallInteger - after all
> the difference is pretty small - but the complexity of the tag checking.
> at the moment, checking for SmallInteger/oop is a single test. Two tag
> bits makes deriving the class more complex (check for SI, check for
> other option, remainder is normal object) and affects any code that
> needs to understand the class of the objects involved.

If it is important to check with a single test then the following:

00 - oop
01 - small integer
11 - the new stuff

can be used. You still have a single test to know if something is an
oop, so there is no extra cost for those. For smallint at least one extra
instruction will probably be needed. I don't think that is intolerable.
At the price of sacrificing one bit combination there are then only 64
new tags, but that is a lot.

Another possibility:

0 - oop
1 - immediate

This needs a single test, and smallint then takes one of the 128 tags.
That leaves only 16M smallints. I considered this too small to be
acceptable, but I could of course be wrong. Since this is the simplest
and most elegant scheme I would prefer it, if the major decrease from 2G
smallints to 16M is tolerable. There are probably places where this
creates problems, but solvable ones.
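
The sizes quoted for the two layouts can be checked with a little
arithmetic. This is a sketch only, assuming 32-bit object words and a tag
number taken from the low byte (byte0); the variable names are made up for
illustration. The 1G figure for the first layout is not stated above, but
follows from reserving two tag bits.

```smalltalk
"Assumption: 32-bit words, tag number stored in the low 8 bits (byte0)."
| wordBits currentSmallInts firstTags firstSmallInts secondTags secondSmallInts |
wordBits := 32.

"Present scheme: one tag bit, 31 bits of integer."
currentSmallInts := 2 raisedTo: wordBits - 1.      "2147483648, i.e. 2G"

"First alternative: two tag bits, 6 bits of byte0 left for the tag number."
firstTags := 2 raisedTo: 8 - 2.                    "64"
firstSmallInts := 2 raisedTo: wordBits - 2.        "1073741824, i.e. 1G"

"Second alternative: one tag bit, 7 bits of byte0 left for the tag number,
 and smallint is just one of those tags with 24 bits of data."
secondTags := 2 raisedTo: 8 - 1.                   "128"
secondSmallInts := 2 raisedTo: wordBits - 8.       "16777216, i.e. 16M"

Transcript show: secondTags printString; cr.
Transcript show: secondSmallInts printString; cr.
```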

> A restriction is that this is only really useful for manifest constant
> objects; yes, you could use the data bits to index a list of classes for
> example, but how is that an advantage over having the oop of the class?

There are in fact advantages:

- things sent over a network can be used immediately, provided the same
list of classes exists in the other image. An oop has local
significance only.
Compare with the current image segments, which make up new tables of
outpointers for each image segment and then use a completely different
mechanism when making segments for export to another image.
Having a fixed list as described here will make the difference between
own/export disappear for those well-known standard things that have
been given this extra status. Similar issues exist with ReferenceStream,
which can be made more efficient with the special tags. This can make a
difference for the current interest in communication between different
Squeaks. Note also that, since calling plugins can be done in a similar
way, the difference between a plugin call and an RMI decreases.

(1) The inter-image significance of these things is one of the primary
motivations.

- you can build alternative lists:
{ButtonMorph, ListMorph, <X>Morph, ..}
{MvcButton, MvcList, Mvc<X>, ..}
with these kinds of lists you can reinterpret major structures
immediately without even traversing them. The list is applied at
the latest moment, when the relevant "new tag" is accessed.

- another similar point:
if you store the names of classes (or other things) in a table on the
side, then it is possible to have another table with class names in
another language. The switch is immediate. And this scales well.

(2) This ability to swiftly switch tables and reinterpret major
structures without traversal is another primary motivation.
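
Both motivations can be illustrated with ordinary arrays standing in for
the tables. This is a minimal sketch, not a description of how an image
would actually store them: the table contents and the names senderTable,
receiverTable, english, swedish and structure are all hypothetical, and
plain small integers play the role of the new tagged values.

```smalltalk
| senderTable receiverTable wireValue resolved english swedish structure render |

"(1) An index into a fixed, agreed list means the same thing in every image.
 Two arrays stand in for the lists held by two different images."
senderTable := Array with: String with: Symbol with: Array.
receiverTable := Array with: String with: Symbol with: Array.
wireValue := senderTable indexOf: Symbol.      "2, no oop is sent"
resolved := receiverTable at: wireValue.       "Symbol, looked up locally"

"(2) The structure stores only indices, so switching tables reinterprets
 it without traversing or rewriting the structure itself."
english := #('Button' 'List' 'Menu').
swedish := #('Knapp' 'Lista' 'Meny').
structure := #(3 1 1 2).
render := [:table | structure collect: [:index | table at: index]].

render value: english.    "#('Menu' 'Button' 'Button' 'List')"
render value: swedish.    "#('Meny' 'Knapp' 'Knapp' 'Lista')"
```

Note that indexOf: answers 0 for a class that is not in the list, so a
real scheme would need a fallback for things without the "well known"
status.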

Imagine a graphical scene with a bunch of the new type of codes in it.
By changing the tables referred to, astonishing effects can be achieved.
Similarly with documents and many other things.

> Immediate Points, restricted range floats, colour values, anything where
> the bits is the data, would all be plausible. Remember that such
> manifest objects cannot be altered any more than a SmallInteger can, so
> quite a bit of code in the image would be affected; for example you would
> have to write
> pt1 := pt1 x @ (pt1 y *2)
> instead of
> pt1 y: (pt1 y * 2)
> ... which is a poor example since I think I would prefer it anyway, but
> you're smart enough to see what I mean.

I agree, this is for immutable value objects only. I don't agree that
any code in the image would be affected. It will be only if one decides to
use the new stuff for changeable objects. Either don't use it for things
like that, or clearly document that certain classes are immutable and
rewrite the code.

Having Points documented as immutable can be a good thing but isn't
necessary; Points can be left out of this. "I think I would prefer
it anyway" could be your intuition saying: perhaps Points should be
considered immutable (whether they fit into a single word or not).

> So, yes it might be useful in some sense but I rather suspect the
> runtime costs are unpleasant, especially once you go past a simple VW
> like form.

the present:

bit0 = 0
  ifTrue:
    [normal oop]
  ifFalse:
    [Smallint]

first alternative:

bit0 = 0
  ifTrue:
    [normal oop]
  ifFalse:
    [bit1 = 0
      ifTrue: [Smallint]
      ifFalse: [Table at: (byte0 bitShift: -2)]]

second alternative:

bit0 = 0
  ifTrue:
    [normal oop]
  ifFalse:
    [Table at: (byte0 bitShift: -1)]
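
The three variants can be tried out in a workspace. The sketch below
assumes that a word is represented as a plain integer and that byte0 is
its low eight bits. It adds 1 to the tag number because Smalltalk arrays
are indexed from 1. The table contents (#smallInt as tag 0, #character as
tag 3) are invented for illustration.

```smalltalk
| table64 table128 present first second |

"Hypothetical tag tables; slot (tag + 1) holds what the tag stands for."
table64 := Array new: 64 withAll: #unassigned.
table64 at: 3 + 1 put: #character.
table128 := Array new: 128 withAll: #unassigned.
table128 at: 0 + 1 put: #smallInt.
table128 at: 3 + 1 put: #character.

"The present: one test on bit0."
present := [:word |
    (word bitAnd: 1) = 0 ifTrue: [#oop] ifFalse: [#smallInt]].

"First alternative: oops still cost one test, smallints a second one."
first := [:word |
    (word bitAnd: 1) = 0
        ifTrue: [#oop]
        ifFalse: [(word bitAnd: 2) = 0
            ifTrue: [#smallInt]
            ifFalse: [table64 at: ((word bitAnd: 16rFF) bitShift: -2) + 1]]].

"Second alternative: one test, then a table lookup for every immediate."
second := [:word |
    (word bitAnd: 1) = 0
        ifTrue: [#oop]
        ifFalse: [table128 at: ((word bitAnd: 16rFF) bitShift: -1) + 1]].

present value: 16r08.    "#oop"
first value: 16r05.      "#smallInt   (low bits 01)"
first value: 16r0F.      "#character  (low bits 11, tag 3)"
second value: 16r01.     "#smallInt   (tag 0)"
second value: 16r07.     "#character  (tag 3)"
```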

I see no unpleasant runtime costs here. The little extra time needed
should be seen in relation to the decrease of allocations in object
memory and the time taken in each and every gc. Note also that the
present Squeak object memory runs slower due to the complexity of
compact classes. A similar idea with similar effects. In fact the compact
class table would be part of one of the new tags.

I would like to hear Hans-Martin Mosner on this. Then a period of
experimentation. Then a decision.

/Mats




