The Mosner bit

Stephen Pair spair at
Fri Sep 1 15:00:10 UTC 2000

This is very interesting!

Has this been tried in other implementations before?

Does this essentially mean that up to 128 immediate classes can be added,
each with up to 16M unique instances?
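
As a rough sketch of how the layout in the quoted message below could be read (two low bits for the kind of word, a 7-bit tag in byte 0, a 24-bit value in bytes 1 to 3), here is some C. The type `oop_t` and all function names are made up for illustration and are not taken from the Squeak VM; a 32-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit object pointer word; not the Squeak VM's own type. */
typedef uint32_t oop_t;

/* Two least significant bits, as in the quoted proposal:
 *   00 pointer, 10 small integer, 01 / 11 special (bit 0 set). */
static inline int is_pointer(oop_t w)       { return (w & 3u) == 0u; }
static inline int is_small_integer(oop_t w) { return (w & 3u) == 2u; }
static inline int is_special(oop_t w)       { return (w & 1u) == 1u; }

/* Build a special word: byte 0 holds (tag << 1) | 1, bytes 1..3 hold
 * the 24-bit value. tag must be below 128, value below 2^24. */
static inline oop_t make_special(unsigned tag, uint32_t value) {
    assert(tag < 128u);
    assert(value < (1u << 24));
    return (value << 8) | ((oop_t)tag << 1) | 1u;
}

/* Recover the 7-bit tag and the 24-bit value from a special word. */
static inline unsigned special_tag(oop_t w)   { return (w >> 1) & 0x7Fu; }
static inline uint32_t special_value(oop_t w) { return w >> 8; }
```

Under these assumptions, `make_special(5, 0x41)` yields the word `0x0000410B`, and `special_tag` and `special_value` give back 5 and 0x41. That would indeed be 128 tags with 16M values each.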

Since Squeak uses direct pointers and addresses are typically aligned on 4
byte boundaries, it seems like that bit is being wasted anyway, except for
the SmallInteger case.  It doesn't seem like it should be that big of an
impact on performance either (but maybe so, have any benchmarks been run?).

With regard to things such as Unicode (and other character sets), if you
tried to create a string out of OOPs (in some sort of collection, for
instance), you would be using up twice as much space as the character set
(in the case of Unicode anyway) required.  But, alternatively, you could
have a variable word class that efficiently stores the Unicode (i.e.
UnicodeString)...when accessing a single character in the string, it would
be a simple process to tack on the extra two (well-known) bytes that would
form the immediate OOP for the corresponding Unicode character.
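
To make the "tack on two bytes" step concrete, here is a minimal C sketch. It assumes the string stores 16-bit code units, that immediate words use the layout from the quoted message (tag byte in byte 0, value in bytes 1 to 3), and that tag number 1 means "Unicode character". The tag number and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit object pointer word. */
typedef uint32_t oop_t;

/* Hypothetical tag number for Unicode characters (any value below 128). */
enum { UNICODE_CHAR_TAG = 1 };

/* Read one 16-bit code unit from compact storage and widen it into an
 * immediate word: the code unit fills bytes 1 and 2, byte 3 is zero,
 * and byte 0 is the well-known tag byte (tag << 1) | 1. */
static inline oop_t unicode_string_at(const uint16_t *units, size_t index) {
    uint32_t tag_byte = ((uint32_t)UNICODE_CHAR_TAG << 1) | 1u;
    return ((oop_t)units[index] << 8) | tag_byte;
}

/* Going the other way: drop the tag byte to get the code unit to store. */
static inline uint16_t char_oop_code(oop_t w) {
    return (uint16_t)(w >> 8);
}
```

With these assumptions the string costs two bytes per character in memory, and each access is a shift and an OR; for example, the code unit 0x20AC becomes the word `0x0020AC03`.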

The ability to efficiently represent just about any encoding scheme is
reason enough in my opinion.

I think it's definitely worth exploring!  You have my vote.

- Stephen

> -----Original Message-----
> From: Mats Nygren [mailto:nygren at]
> Sent: Friday, September 01, 2000 7:35 AM
> To: squeak at
> Cc: nygren at; hm.mosner at; hmm at
> Subject: The Mosner bit
> Hi,
> I found on the swiki reference to Hans-Martin Mosner's extra bit for
> tagging object pointers.
> This must have been discussed in the past. I wish to renew that
> discussion as I think it is an exceptionally good idea. Perhaps H-M M
> himself will give his current position on this. The following is what
> I make of it.
> Description:
> The details can be elaborated in different ways; here's one (showing
> the two least significant bits):
> 10 - small integer
> 00 - pointer
> 01 - special i
> 11 - special ii
> Small integers work as previously but with one bit less. This will
> probably cause some solvable problems.
> The interesting part is special i/ii, I propose that they are used as
> follows:
> byte3 byte2 byte1 - together forms 24 bits giving 16M values.
> byte0 - bit 0 is constant = 1, 7 bits left gives 128 values used as
> tags.
> So we have 128 tags; used wisely, that opens up a lot of possibilities.
> Here are some possibilities:
> Characters
>   ascii/lf, ascii/cr, ascii/crlf, iso-xxxx-1, utf-8, utf-16, ..,
>   home-brew-1, ..
>   consider for example one tag meaning the present character set
>   (ascii/cr) with extra info for font, style, size, color.
> Standard Classes -
>   a well chosen set of essential classes, can be easily accessed
>   and communicated. This should include Object, Symbol and many
>   such and also the ParseNode-hierarchy and similar.
> Ansi (and other well established) protocols -
>   all standard interfaces can be cataloged in this form and
>   easily communicated
> bytecodes -
>   the normal byte code set can be considered numeric
>   code/symbol at the same time
> widget family
> primitive methods
> special (simple) methods (projections, many others)
> a nomenclature for (C-like) types
> tightly packed structures
> html tags
> Prolog-like variables and other things with special "roles in the system"
> other important (closed) coding systems, midi, vrml
> many other possibilities exist
> Mosner gives an example that in this version would give 12-bit
> coordinates for Point.
> These can be considered universal (cross-image) pointers: things that
> are lifted above gc (global tenure). It is a good help in communicating
> with plugins, providing a rich language independent of gc.
> Some of the above I have experience with. No doubt others will find
> interesting uses of the idea if it becomes available.
> The above is a bit cryptic and very incomplete, bottom line:
> This is a good thing, let's reimplement it. If Hans-Martin Mosner
> will do it, good; if not, I will. If it is wanted, that is. An
> immediate gain is for different character sets, including large
> ones: 24 bits (16M characters) is no problem, whereas the current way
> of handling characters doesn't scale.
> Note also that this idea will be even better on 64-bit machines that will
> appear sooner or later. One would then have an immense set of interesting
> values liberated from the burden of gc.
> /Mats
