iso8859-1

Boris Gaertner Boris.Gaertner at gmx.net
Thu Nov 14 21:59:43 UTC 2002


From: Ned Konz <ned at bike-nomad.com>

>On Thursday 14 November 2002 12:31 am, jean-marie.zajac wrote:
>> > Subject: iso8859-1 Scamper
>> >
>> >
>> > I am sure that someone has already fixed this @@§&é$&^ù@#@ç@
>> > problem with Scamper: HTML charset conversion from iso8859-1.
>> > I can't find any description of this problem on the web. Maybe
>> > only European users are interested?

I am very interested in this problem, but I think it is a major project
(see below).

>Can you give an example of a URL where you see this problem?
A beautiful example is the page in the attached file example.zip.
Its encoding is windows-1252, and it contains the characters
LATIN SMALL LETTER ETH and LATIN SMALL LETTER THORN,
which are not encoded in the Mac encoding that we use. The character
LATIN SMALL LIGATURE AE, which is also used in that example,
is encoded in our fonts and should be displayed by Scamper.
I downloaded that example some time ago from
http://www.georgetown.edu/cball/oe/paternoster-oe.html
and during the download (on a Windows system) the encoding was changed
to windows-1252. The original page is plain ASCII: it has no
character-encoding header and uses no character codes
greater than 7F.
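
The coverage gap described above can be checked with a short script. This is
only a sketch in Python, not Squeak code, and it assumes that Python's
"mac_roman" codec is a fair stand-in for the Mac encoding used by Squeak's
fonts; the names CHARS, encodable and coverage are made up for the
illustration.

```python
# The three characters mentioned above, given as Unicode escapes.
CHARS = {
    "eth": "\u00f0",
    "thorn": "\u00fe",
    "ae": "\u00e6",
}


def encodable(ch, codec):
    """Return True if the character has a code in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: "mac_roman" approximates the Mac encoding used by Squeak.
coverage = {
    name: (encodable(ch, "cp1252"), encodable(ch, "mac_roman"))
    for name, ch in CHARS.items()
}

for name, (in_windows, in_mac) in coverage.items():
    print(f"{name}: windows-1252={in_windows}, mac_roman={in_mac}")
```

Under that assumption, eth and thorn are reported as present in windows-1252
but missing from the Mac encoding, while ae is present in both.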

>I believe that Scamper is doing some translation (browse callers of
>isoToSqueak) though it is (I think) ignoring the Character-Encoding
>headers.
Yes, Squeak ignores the Character-Encoding headers.
The method isoToSqueak translates the encoding, but for some reason the
untranslated string is displayed. The attached change set is an attempt to
improve this, but I am not convinced that it is a reliable solution. Can
you please tell me whether it meets your needs?
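
To make the kind of translation discussed here concrete, the following is a
rough sketch in Python. It is not the code of isoToSqueak and not the content
of the attached change set; the function to_mac_roman, its parameters and the
sample bytes are hypothetical, and "mac_roman" is again assumed to stand in
for Squeak's internal encoding. Characters without a code in the target
encoding are replaced by a fallback character.

```python
def to_mac_roman(data, charset="iso-8859-1", fallback="?"):
    """Re-encode bytes from `charset` into Mac Roman.

    Characters that Mac Roman cannot represent are replaced by `fallback`.
    """
    out = bytearray()
    for ch in data.decode(charset):
        try:
            out += ch.encode("mac_roman")
        except UnicodeEncodeError:
            out += fallback.encode("mac_roman")
    return bytes(out)


# Hypothetical sample: a few words containing ae and thorn in ISO-8859-1.
sample = b"F\xe6der ure \xfeu \xfee"
converted = to_mac_roman(sample)
print(converted.decode("mac_roman"))
```

The ae survives the translation, while each thorn becomes the fallback
character. A browser that honoured the declared encoding would pass it in as
`charset` instead of always assuming iso-8859-1.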

Programming an encoding-aware internet browser is a major project.
A good browser supports more than 20 encodings, including encodings
for scripts like Chinese, Hebrew and Arabic. Squeak is currently not
prepared to display Chinese ideograms. Squeak is also not prepared to
display text that runs from right to left.

To display at least text written from left to right with characters of
the Latin, Greek and Cyrillic alphabets, it is necessary to do more or
less this:

1. We need glyphs for all these alphabets. The
specification WGL4 (Windows Glyph List 4) is a good starting
point. It contains 652 glyphs that form a pan-European character
set. (see: http://www.microsoft.com/typography/otspec/WGL4.htm)

2. When Scamper reads the encoding to be used, we have to
create strike fonts for that encoding on the fly. This is not really
difficult; we would simply copy glyphs from the WGL4 glyph set.
The difficulty is to get rid of these fonts when they are no
longer needed. Weak arrays or weak dictionaries can be
used to accomplish this.

A year ago, I began to implement something like that, but it is
still not ready. Drawing 652 glyphs in four or five sizes and two
styles (serif and sans-serif) is an enormous amount of work.
And then, I would like to have more than WGL4. It would be
nice to have most of the glyphs of the first 24 Unicode pages.


Tell me, is there any interest in that kind of support for encodings?

Greetings, Boris




-------------- next part --------------
A non-text attachment was scrubbed...
Name: ISO8859.1.cs
Type: application/octet-stream
Size: 2913 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20021114/2a90e5a8/ISO8859.1.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.zip
Type: application/octet-stream
Size: 4934 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20021114/2a90e5a8/example.obj

