<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">2014-05-28 4:50 GMT+02:00 Yoshiki Ohshima <span dir="ltr"><<a href="mailto:Yoshiki.Ohshima@acm.org" target="_blank">Yoshiki.Ohshima@acm.org</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">At Tue, 27 May 2014 19:23:09 -0700,<br>
<div class="">Andres Valloud wrote:<br>
><br>
> String encoding is perpendicular to my point. I'm referring to<br>
> canonical equivalence as defined in section 1.1 of the document<br>
> referenced by the URL I sent. For instance, the Hangul example in the<br>
> first table shows that a combination of two characters (regardless of<br>
> encoding) is to be considered canonically equivalent to a single<br>
> character. From the document (which claims to be Unicode Standard Annex<br>
> #15),<br>
><br>
> "Canonical equivalence is a fundamental equivalency between characters<br>
> or sequences of characters that represent the same abstract character,<br>
> and when correctly displayed should always have the same visual<br>
> appearance and behavior."<br>
><br>
> How do you propose that a size check is appropriate in the presence of<br>
> canonical equivalence? What is string equivalence supposed to mean? I<br>
> think more attention should be given to those questions.<br>
<br>
</div>I think that the single equal message (=) in the Smalltalk language<br>
should not really worry about canonical equivalence. For those who<br>
need it, it'd be fine to define a new selector that does the real<br>
work, and such a method could track the Unicode standard revisions and<br>
do the right thing. But something as fundamental as String>>#= does<br>
not have to depend on an external standard.<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Yoshiki<br>
<br>
</font></span></blockquote></div><br>If the internal representation is not canonical, we are heading down a path of maximum complexity.<br></div><div class="gmail_extra">All the comparison methods (=, <, >, <=, >=) and hash will have to canonicalize first.<br>
</div><div class="gmail_extra">So I tend to agree with Yoshiki: let these kernel methods perform their dumb task, and push this complexity outside the kernel.<br><br></div><div class="gmail_extra">Even leaving the complexity of Unicode aside, the cr-lf mess already creates the same problem.<br>
There is no semantic difference between cr and cr-lf.<br>Yet I still had to insert a few withSqueakLineEndings sends in Monticello when playing with GitFileTree.<br></div></div>
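<div class="gmail_extra"><br>To make the split concrete, here is a rough sketch in Python rather than Smalltalk, using the Hangul pair from the first table of UAX #15. The names dumb_equal, canonical_equal and same_modulo_line_endings are made up for illustration; they are not selectors in any image, and the last one is only a stand-in for what withSqueakLineEndings is used for, not its actual implementation.<br></div>
<pre>
```python
import unicodedata


def dumb_equal(a, b):
    # Plain code-point-by-code-point comparison: the "dumb task"
    # a kernel String = is assumed to perform.
    return a == b


def canonical_equal(a, b):
    # Hypothetical separate selector: normalize both sides to NFC,
    # then compare. This is where the Unicode dependency would live.
    nfc_a = unicodedata.normalize("NFC", a)
    nfc_b = unicodedata.normalize("NFC", b)
    return nfc_a == nfc_b


def same_modulo_line_endings(a, b):
    # Same idea for the cr / cr-lf mess: pick one line ending
    # (here lf) before comparing.
    def squash(s):
        return s.replace("\r\n", "\n").replace("\r", "\n")
    return squash(a) == squash(b)


precomposed = "\uac00"       # one Hangul syllable
decomposed = "\u1100\u1161"  # the two conjoining jamo that compose it
```
</pre>
<div class="gmail_extra">With these definitions, dumb_equal(precomposed, decomposed) answers False and a size check rejects the pair immediately (sizes 1 and 2), while canonical_equal(precomposed, decomposed) answers True. Those who need the second behaviour call the second function; the first one stays cheap and free of any external standard.<br></div>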