This is very interesting!
Has this been tried in other implementations before?
Does this essentially mean that up to 128 immediate classes can be added, each with up to 16M unique instances?
Since Squeak uses direct pointers and addresses are typically aligned on 4-byte boundaries, it seems like that bit is being wasted anyway, except for the SmallInteger case. It doesn't seem like it should have that big an impact on performance either (but maybe so; have any benchmarks been run?).
With regard to things such as Unicode (and other character sets): if you built a string out of OOPs (in some sort of collection, for instance), you would use twice as much space as the character set requires (in the case of Unicode, anyway). Alternatively, you could have a variable word class that stores the Unicode efficiently (i.e. UnicodeString). When accessing a single character in the string, it would be a simple process to tack on the extra two (well-known) bytes that form the immediate OOP for the corresponding Unicode character.
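To make the UnicodeString idea concrete, here is a rough sketch in Python. It assumes the layout from the quoted message below (bit 0 set, 7 tag bits in the rest of byte0, 24 value bits above that). The class name, the method `oop_at`, and the tag number `UNICODE_TAG` are all made up for illustration, and it only handles 16-bit code units.

```python
from array import array

# Hypothetical tag number for "Unicode character"; not from any real VM.
UNICODE_TAG = 1


class UnicodeString:
    """Stores characters as 16-bit units instead of full 32-bit OOPs."""

    def __init__(self, text):
        # array("H") holds unsigned shorts (typically 2 bytes each) and
        # raises OverflowError for code points above 0xFFFF.
        self._units = array("H", (ord(ch) for ch in text))

    def __len__(self):
        return len(self._units)

    def oop_at(self, index):
        """Build the immediate OOP for the character at index on the fly."""
        code = self._units[index]
        # value in byte1..byte3, tag in bits 1..7, bit 0 constant 1
        return (code << 8) | (UNICODE_TAG << 1) | 1


s = UnicodeString("A\u00e9")
print(hex(s.oop_at(0)))
```

The stored form stays compact, and the tag byte and the zero high byte are only added when a single character is fetched.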
The ability to efficiently represent just about any encoding scheme is reason enough in my opinion.
I think it's definitely worth exploring! You have my vote.
- Stephen
-----Original Message-----
From: Mats Nygren [mailto:nygren@sics.se]
Sent: Friday, September 01, 2000 7:35 AM
To: squeak@cs.uiuc.edu
Cc: nygren@sics.se; hm.mosner@cww.de; hmm@heeg.de
Subject: The Mosner bit
Hi,
I found on the swiki a reference to Hans-Martin Mosner's extra bit for tagging object pointers.
http://www.heeg.de/~hmm/squeak/2tagbits/
This must have been discussed in the past. I wish to renew that discussion as I think it is an exceptionally good idea. Perhaps H-M M himself will give his current position on this. The following is what I make of it.
Description:
The details can be elaborated in different ways; here is one (showing the two least significant bits):
10 - small integer
00 - pointer
01 - special i
11 - special ii
Small integers work as before, but with one bit less. This will probably cause some solvable problems.
The interesting part is special i/ii, I propose that they are used as follows:
byte3 byte2 byte1 - together form 24 bits, giving 16M values.
byte0 - bit 0 is constant = 1; the 7 bits left give 128 values used as tags.
So we have 128 tags; used wisely, that opens up huge possibilities.
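The layout above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic for a 32-bit word, not how Squeak or Mosner's version implements it; all function names are hypothetical.

```python
TAG_BITS = 7     # bits 1..7 of byte0
VALUE_BITS = 24  # byte1..byte3


def make_immediate(tag, value):
    """Pack a 7-bit tag and a 24-bit value into a 32-bit special OOP."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 7 bits")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("value must fit in 24 bits")
    return (value << 8) | (tag << 1) | 1  # bit 0 is constant 1


def immediate_tag(oop):
    return (oop >> 1) & 0x7F


def immediate_value(oop):
    return (oop >> 8) & 0xFFFFFF


def kind(oop):
    """Classify an OOP by its two least significant bits."""
    low = oop & 0b11
    if low == 0b10:
        return "small integer"
    if low == 0b00:
        return "pointer"
    return "special"  # 01 (special i) or 11 (special ii)


oop = make_immediate(5, 0x41)
print(hex(oop), kind(oop), immediate_tag(oop), immediate_value(oop))
```

In this sketch, even tag numbers end in 01 (special i) and odd ones end in 11 (special ii), so the two "special" patterns together cover all 128 tags.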
Here are some possibilities:
Characters - ascii/lf, ascii/cr, ascii/crlf, iso-xxxx-1, utf-8, utf-16, .., home-brew-1, ..
  Consider for example one tag meaning the present character set (ascii/cr) with extra info for font, style, size, color.
Standard Classes - a well chosen set of essential classes, can be easily accessed and communicated. This should include Object, Symbol and many such, and also the ParseNode hierarchy and similar.
Ansi (and other well established) protocols - all standard interfaces can be cataloged in this form and easily communicated
bytecodes - the normal byte code set can be considered numeric code/symbol at the same time
widget family
primitive methods
special (simple) methods (projections, many others)
a nomenclature for (C-like) types
tightly packed structures
html tags
Prolog-like variables
and other things with special "roles in the system"
other important (closed) coding systems, midi, vrml
many other possibilities exist
Mosner gives an example that in this version would give 12-bit coordinates for Point.
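As a rough sketch of what 12-bit coordinates could look like in this layout: two coordinates share the 24 value bits. The tag number `POINT_TAG`, the function names, and the choice of unsigned coordinates with x in the high half are assumptions made for illustration, not taken from Mosner's page.

```python
# Hypothetical tag number for an immediate Point.
POINT_TAG = 2


def make_point(x, y):
    """Pack two unsigned 12-bit coordinates into one immediate OOP."""
    if not (0 <= x < 4096 and 0 <= y < 4096):
        raise ValueError("coordinates must fit in 12 bits")
    value = (x << 12) | y                      # 24-bit payload
    return (value << 8) | (POINT_TAG << 1) | 1  # tag byte, bit 0 = 1


def point_xy(oop):
    """Recover (x, y) from an immediate Point OOP."""
    value = (oop >> 8) & 0xFFFFFF
    return value >> 12, value & 0xFFF


p = make_point(100, 200)
print(point_xy(p))
```

Signed coordinates would need a sign-extension step on unpacking, which this sketch leaves out.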
These can be considered universal (cross-image) pointers: things that are lifted above gc (global tenure). That is a good help in communicating with plugins, providing a rich language independent of gc.
Some of the above I have experience with. No doubt others will find interesting uses of the idea if it becomes available.
The above is a bit cryptic and very incomplete, bottom line:
This is a good thing, let's reimplement it. If Hans-Martin Mosner will do it, good; if not, I will. If it is wanted, that is. An immediate gain is for different character sets, including large ones (24 bits, i.e. 16M values, is no problem); the current way of handling characters doesn't scale.
Note also that this idea will be even better on 64-bit machines, which will appear sooner or later. One would then have an immense set of interesting values liberated from the burden of gc.
/Mats