[squeak-dev] Re: MC should really write snapshot/source.st in UTF8

Thu May 23 01:59:26 UTC 2013

On Wed, May 22, 2013 at 3:57 PM, Nicolas Cellier
<nicolas.cellier.aka.nice at gmail.com> wrote:
> MC never wrote a BOM, so we don't have to be compatible with BOM.
>
> If we can simplify the process, let's simplify: maintaining useless
> compatibility has a cost, the code is really crooked by now, and this leads to
> misunderstanding, and soon to broken features and noise. Currently,
> snapshot/source.st IS broken.

For a long time, yes.

> If there are codes > 127, the UTF8TextConverter will most likely fail, and I
> like Norbert's idea of retrying with a legacy encoding. This way, we put the
> crooked compatibility layer in exception handling.
>
> This will also simplify the MC readers/writers in VW, gst, Gemstone, ...
>
> Even for the legacy code, I wonder if MacRoman would be the right choice. MC
> never encoded the strings and always wrote the codes as is.

Right. I now remember the pain.
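
To make the retry idea quoted above concrete, here is a minimal sketch. It is
written in Python rather than Squeak, it does not use the real
UTF8TextConverter or any MC code, and the function name decode_source and the
default legacy encoding are assumptions made only for illustration.

```python
from typing import Tuple


def decode_source(raw: bytes, legacy_encoding: str = "latin-1") -> Tuple[str, str]:
    """Decode file bytes as UTF-8, retrying with a legacy encoding on failure.

    Hypothetical helper: the name and the default legacy encoding are
    assumptions for this sketch, not part of Monticello.
    Returns (text, name_of_encoding_used).
    """
    try:
        # Common path: the file really is UTF-8 (plain ASCII included).
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Exceptional path: the compatibility layer lives only here.
        return raw.decode(legacy_encoding), legacy_encoding


utf8_bytes = "café".encode("utf-8")
legacy_bytes = "café".encode("latin-1")  # a lone 0xE9 byte is not valid UTF-8

print(decode_source(utf8_bytes))
print(decode_source(legacy_bytes))
# Same legacy bytes read with the wrong legacy table give a different string:
print(decode_source(legacy_bytes, "mac_roman"))
```

This is only a heuristic: some byte sequences written in a legacy encoding
happen to be valid UTF-8, so the fallback is "most likely", not guaranteed, to
trigger. The last call shows why the choice of legacy encoding matters when
the file was written as latin-1 but is read as MacRoman.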

> So, setEncoderForCode is here for maintaining compatibility with MC
> snapshot/source.st written from an old image where internal String encoding
> was MacRoman -  when was it, 3.7? Are there really many of these?
>
> I bet 99% of MC-files are encoded in latin-1 but decoded with MacRoman if we
> go through a MczInstaller...
>
> Of course, MC now uses snapshot.bin rather than snapshot/source.st.
> Did old versions of MC fail to write snapshot.bin?
>
> Eventually, we can set a Preference in Squeak for ultra old legacy encoding
> (not in Pharo, I guess Pharo should not care at all).

For Pharo, I'd guess so, too.

(I heard that Japanese support has pretty much been dropped in Pharo.)

--
-- Yoshiki