<div dir="ltr"><div>A Character codePoint contains both:<br>- a charCode<br>- a language tag (the so-called #leadingChar)<br><br></div><div>The leadingChar can encode either a CharacterSet or a LanguageEnvironment (see EncodedCharSet class&gt;&gt;initialize).<br>

</div><div>The CharacterSet tells how to interpret the charCode (for example, whether 16r41 encodes a capital A or something else).<br></div><div><br>All this is complex and has strange side effects, because a letter A in one character set can differ from a letter A in another: they don&#39;t have the same leadingChar, and possibly not the same charCode either (though probably the same for A, since most encodings are supersets of ASCII).<br>
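<br>To make the leadingChar/charCode split concrete, here is a minimal sketch in Python (not Squeak code). It assumes the bit layout used by Squeak&#39;s Character&gt;&gt;leadingChar and #charCode (leadingChar in bits 22..29, charCode in the low 22 bits), which you should check against your own image; the helper names and the leadingChar values 0 and 5 are hypothetical, chosen only to show that the same charCode under two leadingChars gives two different characters:<br>

```python
# Sketch of packing a leadingChar and a charCode into one integer value.
# ASSUMPTION: leadingChar occupies bits 22..29 and charCode the low 22 bits,
# mirroring Squeak's Character>>leadingChar / #charCode; verify in your image.

CHAR_CODE_BITS = 22
CHAR_CODE_MASK = (1 << CHAR_CODE_BITS) - 1    # 16r3FFFFF
LEADING_CHAR_MASK = 0xFF << CHAR_CODE_BITS    # 16r3FC00000


def make_value(leading_char: int, char_code: int) -> int:
    """Combine an 8-bit leadingChar and a 22-bit charCode (hypothetical helper)."""
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leadingChar must fit in 8 bits")
    if not 0 <= char_code <= CHAR_CODE_MASK:
        raise ValueError("charCode must fit in 22 bits")
    return (leading_char << CHAR_CODE_BITS) | char_code


def leading_char(value: int) -> int:
    """Extract the leadingChar (language tag / character set selector)."""
    return (value & LEADING_CHAR_MASK) >> CHAR_CODE_BITS


def char_code(value: int) -> int:
    """Extract the charCode, whose meaning depends on the leadingChar."""
    return value & CHAR_CODE_MASK


# Same charCode 16r41 under two illustrative leadingChars (0 and 5):
a_set0 = make_value(0, 0x41)
a_set5 = make_value(5, 0x41)
print(a_set0 == a_set5)                        # False: distinct characters
print(char_code(a_set0) == char_code(a_set5))  # True: identical charCode
print(hex(a_set5))                             # 0x1400041
```

This is why two &quot;A&quot;s can fail to compare equal even when their charCodes match.<br>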

With Unicode (ISO 10646) we can have a canonical (ahem, almost) encoding for all languages, so all this is becoming somewhat obsolete, except for East Asian languages, for historical reasons.<br><br>I&#39;ve tried to generalize the use of Unicode in the image, except for East Asian environments.<br>
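<br>A small Python illustration of the canonical-encoding point, using only standard-library codecs (it says nothing about how the Squeak image itself is implemented): the same Japanese character is stored as different bytes in Shift_JIS and EUC-JP but maps to one Unicode code point, and Latin-1 bytes decode to the first 256 code points unchanged:<br>

```python
# Legacy encodings give one character different byte values;
# decoding to Unicode yields a single canonical code point.

hiragana_a = "\u3042"  # HIRAGANA LETTER A
sjis = hiragana_a.encode("shift_jis")
eucjp = hiragana_a.encode("euc_jp")
print(sjis != eucjp)  # True: the byte sequences differ
print(ord(sjis.decode("shift_jis")) == ord(eucjp.decode("euc_jp")) == 0x3042)  # True

# Latin-1 coincides with the first 256 Unicode code points:
# byte n decodes to the character U+00nn for every n in 0..255.
all_bytes = bytes(range(256))
print(all(ord(c) == i for i, c in enumerate(all_bytes.decode("latin-1"))))  # True
```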

<br>The Latin-1 character set is a subset of Unicode (it matches the first 256 code points), so as Unicode spreads it is effectively obsolescent.<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">

2013/9/26 tim Rowledge <span dir="ltr">&lt;<a href="mailto:tim@rowledge.org" target="_blank">tim@rowledge.org</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><br>

On 26-09-2013, at 10:19 AM, tim Rowledge &lt;<a href="mailto:tim@rowledge.org">tim@rowledge.org</a>&gt; wrote:<br>

<br>

&gt;<br>

&gt; On 26-09-2013, at 7:14 AM, Bob Arning &lt;<a href="mailto:arning315@comcast.net">arning315@comcast.net</a>&gt; wrote:<br>

&gt;<br>

&gt;&gt; Well, something is a little wrong<br>

&gt;<br>

&gt; I rather thought so. I&#39;ll use your StringHolder to work something out. Actually, I reckon a quick hack to add #space and simply use #registerBreakableIndex should be a good start.<br>

<br>

<br>

</div>Well, that wasn&#39;t much fun.<br>

<br>

The current implementations of registerBreakableIndex and crossedX are so nastily intertwined with assumptions about how they are used that I suspect laws of nature are being broken. I&#39;m certainly not going to spend any more time today trying to work out WTF is going on.<br>


<br>

So I&#39;ve reverted the use of isBreakableAt:in:in: &amp; registerBreakableIndex to its previous state, and it no longer gets nasty with WideStrings and wrapping.<br>

<br>

It raises more questions (plenty from my previous message are still unanswered, folks!):<br>

EncodedCharSets: there are several commented out in EncodedCharSet class&gt;initialize. Why?<br>

Why is Unicode also commented as &#39;Latin1Environment&#39;?<br>

What is Latin2Environment?<br>

Why is there a separate Latin1 class?<br>

Why is there a mix of EncodedCharSet classes and LanguageEnvironment classes?<br>

<div class="im"><br>

<br>

tim<br>

--<br>

tim Rowledge; <a href="mailto:tim@rowledge.org">tim@rowledge.org</a>; <a href="http://www.rowledge.org/tim" target="_blank">http://www.rowledge.org/tim</a><br>

</div>Oxymorons: Clearly misunderstood<br>


</blockquote></div><br></div>