iso8859-1

danielv at netvision.net.il danielv at netvision.net.il
Thu Nov 14 22:14:07 UTC 2002


Yes, and bidi support (for Hebrew, for example) in Squeak would be great
too - it's a formidable project, though.

Daniel

Boris Gaertner <Boris.Gaertner at gmx.net> wrote:
> From: Ned Konz <ned at bike-nomad.com>
> 
> >On Thursday 14 November 2002 12:31 am, jean-marie.zajac wrote:
> >> > Objet: iso8859-1 Scamper
> >> >
> >> >
> >> > I am sure that someone has already fixed this @@§&é$&^ù@#@ç@
> >> > problem with Scamper: HTML charset conversion from iso8859-1.
> >> > I can't find any description of this problem on the web. Maybe
> >> > only European users are interested?
> 
> I am very interested in this problem, but I think it is a major project
> (see below).
> 
> >Can you give an example of a URL where you see this problem?
> A beautiful example is the page in the file example.zip.
> The encoding is windows-1252 and the page contains the characters
> LATIN SMALL LETTER ETH and LATIN SMALL LETTER THORN,
> which are not encoded in the Mac encoding that we use. The character
> LATIN SMALL LIGATURE AE, which is also used in that example,
> is encoded in our fonts and should be displayed by Scamper.
> I downloaded that example some time ago from
> http://www.georgetown.edu/cball/oe/paternoster-oe.html
> and during the download (on a Windows system) the encoding was changed
> to windows-1252. The page itself is encoded in ASCII - it does not use
> a character-encoding header and it does not use characters with encodings
> greater than 7F.
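
As a rough illustration of the gap described above: the sketch below, in Python rather than Squeak, checks which of the three characters have a single-byte code in ISO 8859-1, windows-1252 and Mac Roman. Treating Python's mac_roman codec as equivalent to the Mac encoding that Squeak uses is an assumption, and the names SAMPLES, encoded_byte and coverage are made up for the example.

```python
# Assumption: Python's "mac_roman" codec stands in for the Mac encoding
# used by Squeak's fonts. All names here are hypothetical helpers.
SAMPLES = {
    "eth": "\u00f0",    # LATIN SMALL LETTER ETH
    "thorn": "\u00fe",  # LATIN SMALL LETTER THORN
    "ae": "\u00e6",     # the ae character mentioned in the message
}


def encoded_byte(char, codec):
    """Return the single byte value of char in codec, or None if absent."""
    try:
        return char.encode(codec)[0]
    except UnicodeEncodeError:
        return None


def coverage(codecs=("latin-1", "cp1252", "mac_roman")):
    """Map each sample name to {codec: byte value or None}."""
    return {
        name: {codec: encoded_byte(char, codec) for codec in codecs}
        for name, char in SAMPLES.items()
    }


if __name__ == "__main__":
    for name, row in coverage().items():
        cells = ", ".join(
            f"{codec}={'--' if value is None else format(value, '02X')}"
            for codec, value in row.items()
        )
        print(f"{name}: {cells}")
```

A character that comes back as None for the target encoding has no glyph slot there, so a byte-for-byte translation cannot display it, whatever the source encoding was.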
> 
> >I believe that Scamper is doing some translation (browse callers of
> >isoToSqueak) though it is (I think) ignoring the Character-Encoding
> >headers.
> Yes, Squeak ignores the Character-Encoding headers.
> The method isoToSqueak translates the encoding, but for some reason the
> untranslated string is displayed. The attached change set is an attempt to
> improve this, but I am not convinced that it is a reliable solution. Can
> you please tell me whether it meets your needs?
> 
> Programming an encoding-aware internet browser is a major project.
> A good browser supports more than 20 encodings, including encodings
> for scripts like Chinese, Hebrew and Arabic. Squeak is currently not
> prepared to display Chinese ideograms. Squeak is also not prepared to
> display text that runs from right to left.
> 
> To display at least text written from left to right with characters of
> the Latin, Greek and Cyrillic alphabets, it is necessary to do more or
> less this:
> 
> 1. We need glyphs for all these alphabets. The
> specification WGL4 (Windows Glyph List 4) is a good point to
> start with. It contains 652 glyphs that form a pan-European character
> set. (see: http://www.microsoft.com/typography/otspec/WGL4.htm)
> 
> 2. When Scamper reads the encoding to be used, we have to
> create strike fonts for that encoding on the fly. This is not really
> difficult, we would simply copy glyphs from the WGL4 glyph set.
> The difficulty is to get rid of these fonts when they are not
> needed any longer. Weak arrays or weak dictionaries can be
> used to accomplish this.
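
Step 2 could be sketched roughly as follows, in Python rather than Squeak, with every class and method name invented for the illustration. The "font" here is only a table from byte value to glyph, copied out of a master glyph set keyed by Unicode character (standing in for a WGL4-like set), and a weak-value dictionary lets an unused font disappear once nothing else refers to it. Squeak's weak collections may differ in detail.

```python
import gc
import weakref


class EncodingFont:
    """Hypothetical stand-in for a strike font built for one encoding."""

    def __init__(self, encoding, glyphs):
        self.encoding = encoding
        self.glyphs = glyphs  # byte value -> glyph taken from the master set


class FontCache:
    """Builds per-encoding fonts on demand and holds them only weakly."""

    def __init__(self, master_glyphs):
        self.master_glyphs = master_glyphs  # Unicode character -> glyph
        self._fonts = weakref.WeakValueDictionary()

    def font_for(self, encoding):
        font = self._fonts.get(encoding)
        if font is None:
            glyphs = {}
            for byte in range(256):
                try:
                    char = bytes([byte]).decode(encoding)
                except UnicodeDecodeError:
                    continue  # this byte is unassigned in the encoding
                if char in self.master_glyphs:
                    glyphs[byte] = self.master_glyphs[char]
            font = EncodingFont(encoding, glyphs)
            self._fonts[encoding] = font
        return font

    def cached_encodings(self):
        return sorted(self._fonts.keys())


if __name__ == "__main__":
    master = {"a": "glyph-a", "\u03b1": "glyph-alpha"}
    cache = FontCache(master)
    greek = cache.font_for("iso8859-7")
    print(cache.cached_encodings())
    del greek      # drop the only strong reference to the font
    gc.collect()
    print(cache.cached_encodings())
```

The cache never has to decide when a font is obsolete: a page that still displays text in that encoding keeps the font alive, and the entry goes away with the last such reference.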
> 
> A year ago, I began to implement something like that, but it is
> still not ready. Drawing 652 glyphs in four or five sizes and two
> styles (serif and sans-serif) is an enormous amount of work.
> And then - I would like to have more than WGL4. It would be
> nice to have most of the glyphs of the first 24 Unicode pages.
> 
> 
> Tell me, is there any interest in that kind of support for encodings?
> 
> Greetings, Boris
> 
> 
> 
> 
> 
> [Attachment: ISO8859.1.cs]
>
> 'From Squeak3.4alpha of ''11 November 2002'' [latest update: #5109] on 14 November 2002 at 9:38:36 pm'!
> "Change Set:		ISO8859
> Date:			14 November 2002
> Author:			Boris Gaertner
>
> This is a proposal to improve the display of those characters of ISO8859-1 that are also encoded in the Mac font.
> The change is an improvement, but I am not certain that it is also a fix. "!
>
>
> !HtmlParser class methodsFor: 'parsing' stamp: 'BG 11/14/2002 21:20'!
> parseTokens: tokenStream
> 	|  entityStack document head token matchesAnything entity body |
>
> 	entityStack _ OrderedCollection new.
>
> 	"set up initial stack"
> 	document _ HtmlDocument new.
> 	entityStack add: document.
>
> 	head _ HtmlHead new.
> 	document addEntity: head.
> 	entityStack add: head.
>
>
> 	"go through the tokens, one by one"
> 	[ token _ tokenStream next.  token = nil ] whileFalse: [
> 		(token isTag and: [ token isNegated ]) ifTrue: [
> 			"a negated token"
> 			(token name ~= 'html' and: [ token name ~= 'body' ]) ifTrue: [
> 				"see if it matches anything in the stack"
> 				matchesAnything _ (entityStack detect: [ :e | e tagName = token name ] ifNone: [ nil ]) isNil not.
> 				matchesAnything ifTrue: [
> 					"pop the stack until we find the right one"
> 					[ entityStack last tagName ~= token name ] whileTrue: [ entityStack removeLast ].
> 					entityStack removeLast.
> 				]. ] ]
> 		ifFalse: [
> 			"not a negated token.  it makes its own entity"
> 			token isComment ifTrue: [
> 				entity _ HtmlCommentEntity new initializeWithText: token source.
> 			].
> 			token isText ifTrue: [
> 				entity _ HtmlTextEntity new text: token source.
> 				(((entityStack last shouldContain: entity) not) and:
> 					[ token source isAllSeparators ]) ifTrue: [
> 					"blank text may never cause the stack to back up"
> 					entity _ HtmlCommentEntity new initializeWithText: token source ].
> 			].
> 			token isTag ifTrue: [
> 				entity _ token entityFor.
> 				entity = nil ifTrue: [ entity _ HtmlCommentEntity new initializeWithText: token source ] ].
> 			(token name = 'body')
> 				ifTrue: [body ifNotNil: [document removeEntity: body].
> 					body _ HtmlBody new initialize: token.
> 					document addEntity: body.
> 					entityStack add: body].
>
> 			entity = nil ifTrue: [ self error: 'could not deal with this token' ].
>
> 			entity isComment ifTrue: [
> 				"just stick it anywhere"
> 				entityStack last addEntity: entity ]
> 			ifFalse: [
> 				"only put it in something that is valid"
> 				[ entityStack last mayContain: entity ]
> 					whileFalse: [ entityStack removeLast ].
>
> 				"if we have left the head, create a body"
> 				(entityStack size < 2 and: [body isNil]) ifTrue: [
> 					body _ HtmlBody new.
> 					document addEntity: body.
> 					entityStack add: body  ].
>
> 				"add the entity"
> 				entityStack last addEntity: entity.
> 				entityStack addLast: entity.
> 			].
> 		]].
>
> 	body == nil ifTrue: [
> 		"add an empty body"
> 		body _ HtmlBody new.
> 		document addEntity: body ].
>
> 	document parsingFinished.
>
> 	^document! !
> 
> [Attachment: example.zip (binary, not shown)]



More information about the Squeak-dev mailing list