<div dir="ltr"><div>MC never wrote a BOM, so we don't have to stay compatible with one.<br></div><div><br></div><div>If we can simplify the process, let's simplify it: maintaining useless
compatibility has a cost. The code is really crooked by now, which leads
to misunderstanding, and soon to broken features and noise. Currently, snapshot/source.st IS broken.<br><br>If there are codes > 127, the UTF8TextConverter will most likely fail, and I like Norbert's idea of retrying with a legacy encoding. This way, we confine the crooked compatibility layer to exception handling.<br>
<br><div>This will also simplify the MC readers/writers in VW, gst, Gemstone, ...<br></div><br>Even for the legacy code, I wonder if MacRoman would be the right choice. MC never encoded the strings and always wrote the codes as is.<br>
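<br>Here is a minimal sketch of the retry idea above, written in Python purely for illustration rather than as actual MC or Squeak code. The name decode_source and the latin-1 default are my assumptions, not anything taken from Monticello. It tries UTF-8 first and reaches the legacy decoder only in the exception path.<br>
<pre>
```python
def decode_source(raw: bytes, legacy_encoding: str = "latin-1") -> str:
    """Decode source bytes as UTF-8, retrying with a legacy encoding on failure.

    Hypothetical helper: the name and the latin-1 default are assumptions.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Exceptional path: the only place where legacy compatibility lives.
        return raw.decode(legacy_encoding)


# A lone 0xE9 byte is not valid UTF-8, so the fallback is taken.
legacy_bytes = b"caf\xe9"
as_latin1 = decode_source(legacy_bytes)
# Same bytes, read with MacRoman as the fallback instead.
as_macroman = decode_source(legacy_bytes, "mac_roman")
# Valid UTF-8 never reaches the fallback.
utf8_text = decode_source("caf\u00e9".encode("utf-8"))
```
</pre>
The same legacy byte reads differently under latin-1 and MacRoman, which is why the choice of fallback encoding matters.<br>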
</div><div><br>So, setEncoderForCode is there to maintain compatibility with MC snapshot/source.st files written from an old image whose internal String encoding was MacRoman. When was that, 3.7? Are there really many of these?<br>
<br>I bet 99% of MC files are encoded in latin-1 but decoded as MacRoman if we go through a MczInstaller...<br><br></div><div>Of course, MC now uses snapshot.bin rather than snapshot/source.st.<br>
</div><div>Did old versions of MC fail to write snapshot.bin?<br><br></div><div>If need be, we could add a preference in Squeak for the ultra-old legacy encoding (not in Pharo; I guess Pharo should not care at all).<br></div>
<div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/5/23 Yoshiki Ohshima <span dir="ltr"><<a href="mailto:Yoshiki.Ohshima@acm.org" target="_blank">Yoshiki.Ohshima@acm.org</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Wed, May 22, 2013 at 2:16 PM, Nicolas Cellier<br>
<<a href="mailto:nicolas.cellier.aka.nice@gmail.com">nicolas.cellier.aka.nice@gmail.com</a>> wrote:<br>
> First thing would be to simplify #setConverterForCode and<br>
> #selectTextConverterForCode.<br>
> Do we still want to use a MacRomanTextConverter, seriously? I'm not even<br>
> sure I've got that many files with that encoding on my Mac-OSX...<br>
> Do we really need to put a ByteOrderMark for UTF-8, seriously? See<br>
> <a href="http://en.wikipedia.org/wiki/Byte_order_mark" target="_blank">http://en.wikipedia.org/wiki/Byte_order_mark</a>, it's valueless, and not<br>
> recommended. It was a Squeak way to specify that a Squeak source file would<br>
> use UTF-8 rather than MacRoman, but now this should be obsolescent.<br>
<br>
</div>Old code was certainly in MacRoman, and quite a few used middle dot,<br>
accented chars and other characters in the right half of the character<br>
chart.<br>
<br>
Monticello surely should use UTF-8. I'd think, though, it should keep<br>
BOM; did you encounter any problems? (it is not recommended, but it<br>
is permitted.)<br>
<br>
--<br>
-- Yoshiki<br>
<br>
</blockquote></div><br></div>