64 bit images (was: A plan for 3.8/4.0...)

Yoshiki Ohshima Yoshiki.Ohshima at acm.org
Mon Apr 26 08:22:31 UTC 2004


  Hello,

> - Another very useful one would be Float (with a range of 61 bits, and
> primitives working with 64 bit floats. The mantissa should be completed with
> 3 zeroes on primitive entry, and have the last 3 bits truncated on primitive
> exit.) I believe this one would also be useful in VI 4 32-bit image. I
> believe 32 bits pointers should be aligned by 32 bits, needing 30 bits, and
> leaving space for 2 extra immediate objects. Floats should be one.
> Primitives would work with 32 bits floats, and the mantissa should lose 2
> bits (instead of 3).

  I agree with Prof. Richard O'Keefe's comment here.  Most of the
space/time consuming floats are in Arrays, so optimizing the boxed
representation, or the rare case, wouldn't do me much good.
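
  For reference, the truncation scheme in the quoted proposal can be
sketched roughly as below.  This is only an illustration in Python, not
VM code: the names TAG_BITS, to_immediate_payload and
from_immediate_payload are made up here, and it assumes the 3 tag bits
are taken from the low end of an IEEE 754 double's mantissa, as the
quoted text describes.

```python
import struct

TAG_BITS = 3  # assumption: 3 low bits of the 64-bit word are reserved for the tag


def to_immediate_payload(x: float) -> int:
    """Primitive exit: drop the last 3 mantissa bits, leaving a 61-bit payload."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> TAG_BITS


def from_immediate_payload(payload: int) -> float:
    """Primitive entry: complete the mantissa with 3 zero bits."""
    bits = (payload << TAG_BITS) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def round_trip(x: float) -> float:
    """Value a float would have after being stored as an immediate and read back."""
    return from_immediate_payload(to_immediate_payload(x))
```

  Values such as 1.0 or 0.5 survive the round trip unchanged, while a
value like 0.1 comes back slightly different because its low mantissa
bits are not zero.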

> The following are less clear, but worth considering:
> 
> - ShortSymbol. We could have some short symbols coded in the object pointer.
> This would allow to shrink the symbol dictionary, saving memory and making
> symbol creation faster. Anyway, the only performance improvements would be
> on symbol creation, mostly when compiling methods, but it could be useful
> anyway. ShortSymbols would only be allowed to be made of: A..Z, a..z, 0..9,
> and :. They would use only 6 bits per character, and they can be up to 10
> characters long. Many selectors could be ShortSymbols.

  It sounds like symbol creation will be slower if we pack 10 chars in
60 bits?
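
  To make the per-character work concrete, here is a rough sketch of
the packing in Python.  It is an illustration only: the code
assignment (0 reserved as padding, 1..63 for the 63 allowed
characters) and the names pack_short_symbol and unpack_short_symbol
are assumptions of this sketch, not part of the proposal, and it says
nothing measured about speed.

```python
import string

# The 63 characters allowed by the proposal: A..Z, a..z, 0..9 and ':'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + ":"
# Assumption: code 0 is reserved as padding, so real characters use 1..63.
CODE = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

BITS_PER_CHAR = 6
MAX_CHARS = 10  # 10 characters * 6 bits = 60 bits


def pack_short_symbol(name):
    """Return the 60-bit payload for name, or None if it cannot be a ShortSymbol."""
    if not name or len(name) > MAX_CHARS:
        return None
    value = 0
    for ch in name:
        code = CODE.get(ch)
        if code is None:  # character outside the allowed set
            return None
        value = (value << BITS_PER_CHAR) | code
    return value


def unpack_short_symbol(value):
    """Recover the characters from a payload produced by pack_short_symbol."""
    chars = []
    while value:
        chars.append(ALPHABET[(value & 0x3F) - 1])
        value >>= BITS_PER_CHAR
    return "".join(reversed(chars))
```

  With this encoding a selector like at:put: fits, while a longer one
such as printString: (12 characters) or one containing a character
like + would still need an ordinary Symbol.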

> - Character. I always found strange that a character would use more memory
> space than a SmallInteger. Perhaps a good immediate character representation
> could make multi-lingual and multi-alphabet strings easier. I guess Yoshiki
> could think if this is a good idea. Perhaps in 61 bits we could also have
> space for coding some format information: bold, italic, outline, font, size,
> color, etc. Some of these bits would say to which class the character
> belongs (we could have a Character hierarchy). Others would be indexes to
> tables in the class (i.e. font). Perhaps there would also be a LongCharacter
> or FullCharacter for those that have some property not covered by the
> immediate representation.

  The language info, which is not representable in Unicode, is the
only thing that should go with characters and strings.  Other properties
can go in TextAttributes.

  The Compiler today reads strings and produces objects.  The question
here is whether we would want to get a 'bold' string result, or even a
number result that would be rendered as a 'bold' string when we ask.

  Or, if we concatenate two strings with different properties with the
',' message, what kind of result would we like to get?

  Those questions affect the string/character encoding, and my
conclusion is that the only thing that should go there is the language
info.

  In Squeak 5.0 or whatever, one approach we could take is to get rid of
all characters and to use strings with length = 1 (or the smallest
meaningful length depending on the content) as today's characters.

-- Yoshiki


