[squeak-dev] Re: MC should really write snapshot/source.st in UTF8

Bert Freudenberg bert at freudenbergs.de
Thu May 23 16:57:11 UTC 2013


On 2013-05-23, at 08:47, Tobias Pape <Das.Linux at gmx.de> wrote:

> At least it does not switch encodings mid-stream; that would be the absolute nightmare...

It did, at one point, but we fixed that. Can't remember the details though.

- Bert -

> On 23.05.2013, at 01:00, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> 
>> Yes, this is the easy part; I was talking about how the hell we can write a 32-bit word-oriented collection into a byte-oriented stream and magically end up with UTF-32BE ;)
>> 
>> 
>> 2013/5/23 Tobias Pape <Das.Linux at gmx.de>
>> 
>> 
>> On 23.05.2013, at 00:11, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>> 
>>> Yes, it's UTF-32BE, see the SO post. And you get a bonus point if you can figure out by what magic this happens without tracing in a Debugger ;)
>> 
>> That one might be easy, as it bit me several times.
>> First, the stream that is used for source.st is backed by a simple ByteString, but
>> while the definitions, comments, etc. are being written, once you hit a method with a wide
>> Character, its source is a WideString, and putting that onto the stream makes
>> the underlying string #become a WideString, too. That string is then written out as UTF-32BE (as I just learned).
>> 
>>> 
>>> 
>>> 2013/5/23 Tobias Pape <Das.Linux at gmx.de>
>>> Fun fact:
>>> If you have one “wide” character somewhere in a method or comment and
>>> file out an mcz, the source.st becomes a persisted WideString (utf16?)
>>> And you won't know it...
>>> 
>>> On 22.05.2013, at 23:16, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>> 
>>>> First thing would be to simplify #setConverterForCode and #selectTextConverterForCode.
>>>> Do we still want to use a MacRomanTextConverter, seriously? I'm not even sure I've got that many files with that encoding on my Mac OS X...
>>>> Do we really need to put a ByteOrderMark for UTF-8, seriously? See http://en.wikipedia.org/wiki/Byte_order_mark: it's meaningless for UTF-8, and not recommended. It was the Squeak way of signaling that a source file used UTF-8 rather than MacRoman, but by now this should be obsolete.
>>>> 
>>>> 
>>>> 2013/5/22 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
>>>> http://stackoverflow.com/questions/16645848/squeak-monticello-character-encoding
>>>> Let's kill this one; it's totally insane.
>>>> 
>>> 
>> 
> 

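A minimal Python sketch (not Squeak code) of the mechanism Tobias describes in the thread: a write buffer that holds one byte per character until a character above U+00FF arrives. At that point the whole buffer, including what was already written, is widened to one 32-bit word per character, roughly what a ByteString doing #become: into a WideString amounts to. `GrowingSourceBuffer` is a hypothetical name, and dumping the words big-endian is an assumption taken from the UTF-32BE result reported in this thread, not from tracing Squeak's stream code. Under those assumptions the raw bytes are exactly UTF-32BE, so a reader expecting UTF-8 or Latin-1 sees three NUL bytes before every ASCII character.

```python
import struct


class GrowingSourceBuffer:
    """Hypothetical model of a source.st write stream.

    Starts out "byte" (one byte per character, like a ByteString) and switches
    to "wide" (one 32-bit word per character, like a WideString) as soon as any
    character above U+00FF is written. The switch applies to the whole buffer,
    including text written before the wide character arrived.
    """

    def __init__(self) -> None:
        self.wide = False
        self.chars: list[str] = []

    def write(self, text: str) -> None:
        if any(ord(c) > 0xFF for c in text):
            self.wide = True  # analogous to the backing string becoming wide
        self.chars.extend(text)

    def raw_bytes(self) -> bytes:
        if not self.wide:
            # Byte mode: each code point 0..255 is stored as a single byte.
            return bytes(ord(c) for c in self.chars)
        # Wide mode: assumed to be dumped as big-endian 32-bit words.
        return b"".join(struct.pack(">I", ord(c)) for c in self.chars)


buf = GrowingSourceBuffer()
buf.write("foo ^'bar'")     # plain ASCII method source
buf.write("baz ^'\u03bb'")  # method containing a Greek lambda (U+03BB)
data = buf.raw_bytes()
# The dumped words are byte-for-byte UTF-32BE, even for the earlier ASCII method.
print(data == "foo ^'bar'baz ^'\u03bb'".encode("utf-32-be"))  # prints True
```

Note that a character such as é (U+00E9) does not trigger the switch in this model: it still fits in one byte, so it is written as the Latin-1 byte E9 rather than as UTF-8.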

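On the ByteOrderMark question: in UTF-8 the mark is just U+FEFF encoded as the three bytes EF BB BF. Since UTF-8 is a plain byte sequence with no byte order, the mark can only act as a "this file is UTF-8" signature. Below is a minimal Python sketch, assuming snapshot/source.st is always written as plain UTF-8 without a mark (as the subject line proposes), while readers stay tolerant of older files that do carry one. `encode_source` and `decode_source` are hypothetical helpers, not Monticello methods.

```python
import codecs


def encode_source(text: str) -> bytes:
    """Hypothetical writer: plain UTF-8, no byte order mark."""
    return text.encode("utf-8")


def decode_source(data: bytes) -> str:
    """Hypothetical tolerant reader: "utf-8-sig" strips a leading UTF-8 BOM
    if one is present and otherwise decodes as ordinary UTF-8."""
    return data.decode("utf-8-sig")


# The UTF-8 "byte order mark" is U+FEFF encoded as the three bytes EF BB BF.
print(codecs.BOM_UTF8 == "\ufeff".encode("utf-8") == b"\xef\xbb\xbf")  # prints True

src = "baz ^'\u03bb'"
print(encode_source(src))  # prints b"baz ^'\xce\xbb'"
```

With this split, the writer has a single output format, and the only place that still knows about the legacy mark is the reader.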


