<div dir="ltr"><div>MC never wrote a BOM, so we don&#39;t have to stay compatible with one.<br></div><div><br></div><div>If we can simplify the process, let&#39;s simplify: maintaining useless compatibility has a cost, the code is really crooked by now, and this leads to misunderstanding, and soon to broken features and noise. Currently, snapshot/<a href="http://source.st">source.st</a> IS broken.<br><br>If a file in a legacy encoding contains codes &gt; 127, the UTF8TextConverter will most likely fail, and I like Norbert&#39;s idea of retrying with a legacy encoding. This way, we confine the crooked compatibility layer to exception handling.<br>
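<br>A minimal sketch of that retry idea, written in Python rather than Squeak so it stays independent of the actual TextConverter classes. The names decode_source and legacy_encoding are hypothetical, and Latin-1 as the default fallback is an assumption, not a decision made in this thread.<br>

```python
def decode_source(data, legacy_encoding="latin-1"):
    """Decode source bytes as strict UTF-8; retry with a legacy encoding on failure.

    Hypothetical helper: the fallback encoding is an assumption for illustration.
    """
    try:
        # Normal path: no compatibility logic at all.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Exceptional path: the bytes are not valid UTF-8, so treat them as legacy.
        return data.decode(legacy_encoding)


utf8_bytes = "caf\u00e9".encode("utf-8")   # valid UTF-8
legacy_bytes = b"caf\xe9"                  # a lone 0xE9 is not valid UTF-8

print(decode_source(utf8_bytes))
print(decode_source(legacy_bytes))
```

Pure ASCII files decode identically either way, so only files that really contain codes above 127 in a legacy encoding ever reach the fallback branch.<br>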

<br><div>This will also simplify the MC readers/writers in VW, gst, GemStone, ...<br></div><br>Even for legacy code, I wonder whether MacRoman is the right choice. MC never encoded the strings; it always wrote the character codes as is.<br>
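<br>To illustrate why the choice of legacy decoder matters, here is a small Python sketch (not Squeak code). It assumes the bytes were written as is from an image whose strings were Latin-1, and compares decoding them as Latin-1 with decoding them as MacRoman. The sample text is made up.<br>

```python
# Bytes as an image holding Latin-1 strings would write them "as is".
original = "r\u00e9sum\u00e9 \u00b7"          # accented chars and a middle dot
raw = original.encode("latin-1")

as_latin1 = raw.decode("latin-1")       # same codes back: matches the original
as_macroman = raw.decode("mac_roman")   # same bytes mapped to different characters

print(as_latin1)
print(as_macroman)
print(as_latin1 == as_macroman)
```

Both decodes succeed without raising an error, so a wrong guess here silently corrupts accented characters instead of failing loudly.<br>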

</div><div><br>So, #setConverterForCode is there to maintain compatibility with MC snapshot/<a href="http://source.st">source.st</a> files written from an old image whose internal String encoding was MacRoman. When was that, 3.7? Are there really many of these?<br>

<br>I bet 99% of MC files are encoded in Latin-1 but get decoded as MacRoman if we go through MczInstaller...<br><br></div><div>Of course, MC now uses snapshot.bin rather than snapshot/<a href="http://source.st">source.st</a>.<br>

</div><div>Did old versions of MC fail to write snapshot.bin?<br><br></div><div>If need be, we could add a preference in Squeak for the ultra-old legacy encoding (not in Pharo; I guess Pharo should not care at all).<br></div>

<div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/5/23 Yoshiki Ohshima <span dir="ltr">&lt;<a href="mailto:Yoshiki.Ohshima@acm.org" target="_blank">Yoshiki.Ohshima@acm.org</a>&gt;</span><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Wed, May 22, 2013 at 2:16 PM, Nicolas Cellier<br>

&lt;<a href="mailto:nicolas.cellier.aka.nice@gmail.com">nicolas.cellier.aka.nice@gmail.com</a>&gt; wrote:<br>

&gt; First thing would be to simplify #setConverterForCode and<br>

&gt; #selectTextConverterForCode.<br>

&gt; Do we still want to use a MacRomanTextConverter, seriously? I&#39;m not even<br>

&gt; sure I&#39;ve got that many files with that encoding on my Mac-OSX...<br>

&gt; Do we really need to put a ByteOrderMark for UTF-8, seriously? See<br>

&gt; <a href="http://en.wikipedia.org/wiki/Byte_order_mark" target="_blank">http://en.wikipedia.org/wiki/Byte_order_mark</a>, it&#39;s valueless, and not<br>

&gt; recommended. It was a Squeak way to specify that a Squeak source file would<br>

&gt; use UTF-8 rather than MacRoman, but now this should be obsolescent.<br>

<br>

</div>Old code was certainly in MacRoman, and quite a few used middle dot,<br>

accented chars and other characters in the right half of the character<br>

chart.<br>

<br>

Monticello surely should use UTF-8.  I&#39;d think, though, it should keep<br>

BOM; did you encounter any problems?  (it is not recommended, but it<br>

is permitted.)<br>

<br>

--<br>

-- Yoshiki<br>

<br>

</blockquote></div><br></div>