MacRoman, Latin1, squeak fonts, and non breaking spaces.
Bert Freudenberg
bert at impara.de
Mon Apr 10 09:35:20 UTC 2006
On 10.04.2006 at 02:19, Peace Jerome wrote:
> Hi Bert and other concerned folk.
>
> In reading Bert's post about fixing fonts to show the
> invisible characters I was reminded of tripping over
> the nonbreaking space (nbsp).
>
> See mantis report:
>
> http://bugs.impara.de/view.php?id=2446
>
>
> I use a Mac and MacRoman defines nbsp as char 202. And
> this can be gotten from Character nbsp.
That doesn't have anything to do with the host operating system. We
switched to Unicode, of which Latin-1 (ISO-8859-1) is the 8-bit
subset (nitpicking aside).
> In the default font in 7021 this appears as the
> British pound sign.
It should be Ê (E circumflex).
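To illustrate the mismatch, here is a minimal sketch in Python (not Squeak). It assumes only Python's built-in `mac_roman` and `latin-1` codecs, which are not part of the original discussion, and shows how the same byte value 202 reads under each encoding:

```python
# Byte value 202 (0xCA) means different things in different 8-bit encodings.
raw = bytes([202])

# MacRoman maps 0xCA to the non-breaking space (U+00A0).
as_mac_roman = raw.decode("mac_roman")

# Latin-1 maps every byte to the Unicode code point with the same number,
# so 0xCA becomes U+00CA (E circumflex).
as_latin1 = raw.decode("latin-1")

# In Latin-1 (and therefore Unicode) the non-breaking space is 160 (0xA0).
nbsp_latin1 = bytes([160]).decode("latin-1")

print(hex(ord(as_mac_roman)), hex(ord(as_latin1)), hex(ord(nbsp_latin1)))
```

If characters are treated as Unicode values, a stored 202 can only mean E circumflex; reading it as nbsp requires knowing the data was MacRoman.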
> There are some squeak fonts
> (atlantis for example) that will show a blank space
> for that character.
Only because Atlantis never had a glyph for "E circumflex". That's
why it was blank, and why my fix now replaces it with a rectangle.
> Now Bert's fix uses char 160. Which is used by
> browsers as nbsp but the Latin1 standard I was pointed
> to has 160 in a range of undefined character values.
Codepoints 128-159 do have a meaning but no glyphs in Unicode. 160 is
indeed the non-breaking space. It's "reserved" in that there is no
actual glyph associated with it; in that respect it's more like a
control character. However, for our particular implementation of
bitmap fonts it's convenient to just use a blank glyph.
See http://www.unicode.org/charts/PDF/U0080.pdf
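The classification in that chart can also be checked programmatically. The following is a rough sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for this example and has nothing to do with Squeak's own classes:

```python
import unicodedata

def describe(code_point):
    """Return (general category, character name) for a code point."""
    ch = chr(code_point)
    # Control characters have no name in the database, so supply a default.
    return unicodedata.category(ch), unicodedata.name(ch, "<no name>")

# Code points 128-159 are the C1 controls: meaningful, but without glyphs.
c1_categories = {unicodedata.category(chr(cp)) for cp in range(128, 160)}

print(c1_categories)   # general categories found in the range 128-159
print(describe(160))   # the non-breaking space
print(describe(202))   # E circumflex
```

Category `Cc` marks control characters and `Zs` marks space separators, which matches the distinction drawn above: 160 is a defined space character, not an undefined value.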
> So the question is there is (at least one) bug in
> this. What is the bug?
>
> 1) Should nbsp be defined as the Latin1 value?
Yes.
> 2) Should squeak fonts have a way of saying what set
> of characters they represent?
I guess so.
> 3) Should the available fonts in squeak be consistent
> in choice of encoding?
In an ideal world, yes. For practical reasons I think we have to deal
with whatever we get.
> 4) Should Character class be refactored to reflect the
> ability to choose different encodings?
No. Characters are not encoded, they represent Unicode values.
Or at least by default they are. We support some non-unicode 16-bit
encodings for asian languages, too, IIRC. Yoshiki would know best.
> 5) Should Character class be debugged to reflect
> Latin1 rather than MacRoman encodings?
Yes.
> If so what do you do about MacRoman?
Use the appropriate converter class.
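As an analogy only, the idea of converting at the boundary looks like this in Python. The function names `mac_roman_to_unicode` and `unicode_to_mac_roman` are hypothetical and stand in for whatever converter class Squeak provides, which is not shown here:

```python
def mac_roman_to_unicode(data: bytes) -> str:
    """Decode MacRoman bytes into Unicode text (hypothetical helper)."""
    return data.decode("mac_roman")

def unicode_to_mac_roman(text: str) -> bytes:
    """Encode Unicode text back into MacRoman bytes (hypothetical helper)."""
    return text.encode("mac_roman")

# Legacy MacRoman data in which byte 0xCA is a non-breaking space.
legacy = b"10\xcakm"

# After conversion the nbsp is the Unicode code point 160 (U+00A0).
text = mac_roman_to_unicode(legacy)
print([hex(ord(ch)) for ch in text])

# Converting back restores the original MacRoman bytes.
round_trip = unicode_to_mac_roman(text)
```

Inside the image, characters stay plain Unicode values; MacRoman only matters when reading or writing external data.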
> I have enough knowledge to know these questions are
> significant to the well being and maintenance of
> squeak. I am out of my depth in trying to suggest
> answers.
>
> It would be good if someone who understands the issue
> more deeply would formulate a mantis issue around it.
Sure. There's a whole lot still to do in that area.
- Bert -