<div dir="ltr">Hi Todd,<div class="gmail_extra"><br><div class="gmail_quote">On Tue, Dec 15, 2015 at 3:46 PM, Todd Blanchard <span dir="ltr"><<a href="mailto:tblanchard@mac.com" target="_blank">tblanchard@mac.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Eliot,<span class=""><div><br><div><blockquote type="cite"><div>On Dec 15, 2015, at 13:46, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com" target="_blank">eliot.miranda@gmail.com</a>> wrote:</div><br><div><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Just so you know, I will dig my heels in as deeply as I am able to prevent the use of C++ libraries in the VM. It destroys the simulator, which is the most important thing we have for VM development productivity. As far as I'm concerned any use of external libraries to implement core functionality kills the VM-in-Smalltalk concept that Squeak (and Pharo) are built upon.</span></div></blockquote></div><br></div></span><div>OK, I defer to you because you certainly know more about the VM internals and what does and doesn't work well than anyone else. </div><div><br></div><div>So I guess I would like to know your recommendation for 1) how best to store strings - byte arrays (UTF8), - 2-byte word arrays (UTF16 - now we get to worry about endian). </div></div></blockquote><div><br></div><div>Raw Unicode, either as 8-bit, 16-bit or 32-bit. When creating a String it should start as an 8-bit-per-Unicode-character string. Attempts to store Character values that won't fit cause the String to become a String whose element size is large enough to accommodate the character. 
In Spur, become: is cheap, so this growth pays only for the reallocation and copying of the data, not for an expensive heap scan necessary to do the become:.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>Bearing in mind that both representations are variable length and so while accessing the n'th byte/word is O(1), accessing the n'th character is necessarily O(n) unless you know you have no surrogates in your string.</div></div></blockquote><div><br></div><div>Right, so UTF-8 and UTF-16 are not convenient representations and are to be provided only for interchange.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>Also...since NSString has been mentioned...it is worth noting that NSString is built atop CFString (source code here: <a href="https://www.opensource.apple.com/source/CF/CF-855.11/CFString.c" target="_blank">https://www.opensource.apple.com/source/CF/CF-855.11/CFString.c</a>) which does a fair job of optimizing memory by using bytes where it can and shorts where it cannot. It is also worth noting that characterAt: actually does the wrong thing, since it assumes characters are no bigger than FFFF rather than 10FFFF. </div></div></blockquote><div><br></div><div>Yes, and Squeak (and AFAIA, Pharo) has been doing this for ages. If one has become: it is very easy to manage. 
Now with Spur not only do we have become:, we have a fairly fast become:.</div><div><br></div><div>Does this make sense?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Also...I'll just toss in this very nice article on unicode and how NSString deals with it.<br></div><div><a href="https://www.objc.io/issues/9-strings/unicode/" target="_blank">https://www.objc.io/issues/9-strings/unicode/</a></div><span class="HOEnZb"><font color="#888888"><div><br></div><div>-Todd Blanchard</div></font></span></div></blockquote><div> </div></div><div class="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>
</div></div>