<div dir="ltr"><div>A Character's codePoint contains both<br>- a charCode<br>- a language tag (the so-called #leadingChar)<br><br></div><div>The leadingChar can encode either a CharacterSet or a LanguageEnvironment (see EncodedCharSet initialize).<br>
</div><div>The CharacterSet tells how to interpret the charCode (whether 16r41 encodes a capital A or something else).<br></div><div><br>All this is complex and has strange side effects, because a letter A in one char set can be a different Character from a letter A in another char set (they don't have the same leadingChar, and possibly not the same charCode, though that is probably not the case for A, since most encodings are supersets of ASCII)...<br>
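<br>A rough sketch of that packing, in Python purely for illustration. The 22-bit width of the charCode field is an assumption about the layout (nothing above states it), and the function names and the leadingChar values 0 and 5 are made up:<br>
<pre>
```python
# Assumption: the low 22 bits hold the charCode and the bits above them
# hold the leadingChar. This width is illustrative, not taken from the text.
CHAR_CODE_SPAN = 2 ** 22


def make_code_point(leading_char, char_code):
    """Pack a language tag (leadingChar) and a charCode into one integer."""
    if char_code not in range(CHAR_CODE_SPAN):
        raise ValueError("charCode does not fit in the assumed 22-bit field")
    return leading_char * CHAR_CODE_SPAN + char_code


def split_code_point(code_point):
    """Return (leading_char, char_code) for a packed code point."""
    return divmod(code_point, CHAR_CODE_SPAN)


# 16r41 in Smalltalk notation is 0x41 in Python.
unicode_a = make_code_point(0, 0x41)  # leadingChar 0: hypothetical Unicode tag
other_a = make_code_point(5, 0x41)    # leadingChar 5: hypothetical other char set

print(unicode_a == other_a)           # False
print(split_code_point(other_a))      # (5, 65)
```
</pre>
The same charCode under two different leadingChars gives two different code points, which is where the strange side effects come from.<br>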
With Unicode (ISO 10646) we can have a canonical (hem, almost) encoding for all languages, so all this is getting a bit obsolete, except for East Asian languages, where it survives for historical reasons.<br><br>I've tried to generalize the use of Unicode in the image, except for East Asian environments.<br>
<br>The Latin1 character set is a subset of Unicode (it matches the first 256 code points), so with the move to Unicode it is effectively obsolescent.<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
2013/9/26 tim Rowledge <span dir="ltr">&lt;<a href="mailto:tim@rowledge.org" target="_blank">tim@rowledge.org</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
On 26-09-2013, at 10:19 AM, tim Rowledge &lt;<a href="mailto:tim@rowledge.org">tim@rowledge.org</a>&gt; wrote:<br>
<br>
><br>
> On 26-09-2013, at 7:14 AM, Bob Arning &lt;<a href="mailto:arning315@comcast.net">arning315@comcast.net</a>&gt; wrote:<br>
><br>
>> Well, something is a little wrong<br>
><br>
> I rather thought so. I'll use your StringHolder to work out something. Actually I reckon a quick hack to add #space and simply use #registerBreakableIndex should be a good start.<br>
<br>
<br>
</div>Well, that wasn't much fun.<br>
<br>
The current implementations of registerBreakableIndex and crossedX are nastily intertwined with assumptions about how they are used in such a way that I suspect laws of nature are being broken. Certainly I'm not going to spend any more time today trying to work out WTF is going on.<br>
<br>
So I've returned the use of isBreakableAt:in:in: & registerBreakableIndex to their previous status and it no longer makes nasty with widestrings and wrapping.<br>
<br>
It raises more questions (still lots from previous message unanswered folks!)-<br>
EncodedCharSets - there are several commented out in EncodedCharSet class&gt;&gt;initialize. Why?<br>
Why is Unicode also commented as 'Latin1Environment'?<br>
What is Latin2Environment?<br>
Why is there a separate Latin1 class?<br>
Why are there mixed up encodedcharset classes and language environment classes?<br>
<div class="im"><br>
<br>
tim<br>
--<br>
tim Rowledge; <a href="mailto:tim@rowledge.org">tim@rowledge.org</a>; <a href="http://www.rowledge.org/tim" target="_blank">http://www.rowledge.org/tim</a><br>
</div>Oxymorons: Clearly misunderstood<br>
<br>
<br>
<br>
</blockquote></div><br></div>