Adding Accufonts to the update stream (was Re: LicencesQuestion : Squeak-L Art 6.)

Tue Feb 25 22:42:21 UTC 2003

I wrote:
    One problem with moving to ISO-8859-1 is that you lose several
    pleasant characters that are useful for writing English, notably
    left and right single and double quotation marks and a couple of dashes.

Todd Blanchard (tblanchard at mac.com) retorted:
	These characters are not at all pleasant unless you happen to run 
	Windows.

Wrong.  I not only _don't_ run Windows, none of the machines in my office
_could_ run Windows.  (This is about to change; I'm going to get one box
with SPARC Solaris, Intel Windows, and Intel Linux in it.  This will happen,
well, a couple of months ago, actually, but Real Soon Now.)

The characters I mention are ALL in the MacRoman character set that
Squeak uses right now and has for ages.  It doesn't matter whether Squeak
was running on a Windows box, on a Mac box, on a SPARC box, on an ARM box,
or whatever, people using *Squeak* had those characters on all platforms.
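
A quick way to check the claim, sketched in Python rather than Squeak.
The codec names "mac_roman" and "latin-1" are Python's, and taking "a
couple of dashes" to mean the en dash and em dash is an assumption made
for the illustration:

```python
# The punctuation marks under discussion, written as Unicode escapes.
# Reading "a couple of dashes" as en dash and em dash is an assumption.
MARKS = {
    "left single quotation mark": "\u2018",
    "right single quotation mark": "\u2019",
    "left double quotation mark": "\u201c",
    "right double quotation mark": "\u201d",
    "en dash": "\u2013",
    "em dash": "\u2014",
}


def encodable(ch, codec):
    """Return True if the character has a code in the given 8-bit set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


in_macroman = {name: encodable(ch, "mac_roman") for name, ch in MARKS.items()}
in_latin1 = {name: encodable(ch, "latin-1") for name, ch in MARKS.items()}

for name in MARKS:
    print(f"{name:28} MacRoman={in_macroman[name]}  Latin-1={in_latin1[name]}")
```

Every mark is present in MacRoman and absent from ISO-8859-1, which is
exactly what a straight move to Latin-1 would give up.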

Some of these characters DO occur in the Squeak sources.

	They show up as garbage on more web pages and produce parsing
        errors in many programs including XML and C parsers.

They show up as garbage in web pages mainly because the original fails
to correctly identify the character encoding.
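
The failure is easy to reproduce.  The following is only a sketch in
Python (not anything from Squeak or from a real web page): the same
bytes are read once with the right label and once with the wrong one.

```python
# Text containing curly double quotes, stored as MacRoman bytes.
text = "\u201cquoted\u201d"
raw = text.encode("mac_roman")

# Reader that is told the truth about the encoding.
right = raw.decode("mac_roman")

# Reader that wrongly assumes ISO-8859-1: every byte still decodes,
# but the quotation marks come out as accented capital letters.
wrong = raw.decode("latin-1")

print(raw)
print(right == text)
print(wrong)
```

The bytes are not at fault in either case; only the label differs.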

As for XML, no XML parser has the slightest excuse for complaining about
these characters if there is an XML declaration which correctly identifies
the encoding used, AS THERE SHOULD BE.
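
As a hedged illustration, here is a sketch using Python's standard
xml.etree.ElementTree parser.  The element name "q" is made up, and
windows-1252 is chosen only because it is an 8-bit set containing these
characters; whether a given parser knows a given encoding name is up to
that parser.

```python
import xml.etree.ElementTree as ET

# Same payload twice: curly quotes as windows-1252 bytes 0x93 and 0x94.
declared = b'<?xml version="1.0" encoding="windows-1252"?><q>\x93hi\x94</q>'
undeclared = b'<q>\x93hi\x94</q>'

# With a correct XML declaration the parser decodes the bytes properly.
text = ET.fromstring(declared).text

# Without one, the bytes are treated as UTF-8, and 0x93 is not valid there.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(text == "\u201chi\u201d", undeclared_ok)
```

The parse error in the second case comes from the missing declaration,
not from the characters.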

As for C, a C compiler has not the slightest excuse for producing a
"parse error" if any of these characters occurs in a character literal,
string literal, or comment.  In a character literal or string literal,
it might report that the characters are not *portable*, but there is no
excuse for a parsing error.

	They are not pleasant at all.

The characters in and of themselves are perfectly pleasant; they are
important parts of the English writing system.

What is unpleasant is HTML and XML documents either failing to identify
the character encoding at all or identifying it incorrectly.  (Windows
programs claiming to emit "Latin-1" are particularly at fault here.)
That is not the fault of the characters.  (The default character
encoding for XML is UTF-8, if I recall correctly, so anyone wanting to
use non-ASCII 8-bit characters had d--n well BETTER identify the
character encoding correctly.)

Nor is it at all clear to me why making Squeak yet another of the
programs that cannot handle these now common characters would serve
anyone's ends.

	That said - the ISO-8859-X line of character encodings is rightfully 
	considered to be a family of legacy encodings.  What would be ideal 
	would be to support UTF-8 in the file system and whatever we like 
	internally.  UTF-8 has the nice property of being binary compatible 
	with most things that deal with ascii.

It is difficult to support UTF-8 _anywhere_ correctly without supporting
21-bit characters practically _everywhere_ except for display.
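
The arithmetic behind "21-bit" can be checked with a short Python
sketch.  The musical G clef is an arbitrary example character, not one
taken from this thread:

```python
# Unicode code points run from 0 to 0x10FFFF.
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()

# A character outside the first 65536 code points: U+1D11E.
clef = "\U0001D11E"
encoded = clef.encode("utf-8")        # four bytes on disk
decoded = encoded.decode("utf-8")     # one character once decoded
code_point = ord(decoded)

# Neither an 8-bit nor a 16-bit character slot can hold the result.
fits_in_8 = code_point < 2 ** 8
fits_in_16 = code_point < 2 ** 16

print(bits_needed, len(encoded), hex(code_point), fits_in_8, fits_in_16)
```

Anything that decodes UTF-8 has to put the result somewhere, so the
wide characters leak into strings, streams, and symbols even if the
display never draws them.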

The topic of moving away from MacRoman is a hardy perennial;
I've asked for it a couple of times myself.
The question is "What is the simplest thing that could possibly work?"
What MIGHT ACTUALLY HAPPEN?

If we ask for too much in one big gulp, it ain't gunna happen.
If we ask for what's doable, we might just get it.

A move to Latin-1 is at least a move to the bottom chunk of Unicode.
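
"Bottom chunk" can be taken literally: each Latin-1 byte value is the
Unicode code point of the character it stands for.  A small Python
sketch of that property (e-acute is just an example character):

```python
# Decode every possible byte as Latin-1 and compare code point to byte value.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
identity = all(ord(ch) == b for ch, b in zip(as_text, all_bytes))

# The byte-level encodings still differ above 0x7F.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")   # one byte, value 0xE9
utf8_bytes = e_acute.encode("utf-8")       # two bytes

print(identity, latin1_bytes, utf8_bytes)
```

So widening Latin-1 characters to Unicode later is a no-op on the
character codes, although Latin-1 files are not thereby UTF-8 files.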

My concern was simply to point out that something WILL be lost by the
move, and that we don't actually need to lose quite that much.

	FWIW, if we are looking for a library to take on internationalization - 
	may I recommend IBM's International Components for Unicode - open 
	source library.

At the moment, most of the writers in this thread are NOT looking for a
library to take on internationalisation, just better fonts.  Switching to
another 8-bit character set seems to be primarily a means of enlarging the
set of free fonts we might be able to use.

As for ICU, I have it.  OK if you are happy with C++, I guess, but
if I were, I wouldn't be Squeaking.