[squeak-dev] The Trunk: Collections-topa.806.mcz

Thu Sep 13 22:52:21 UTC 2018

>> The question should, IMO at least, be "what character set should Squeak use" and, again IMO, that should be Unicode and, in particular, the UTF-8 encoding. (http://utf8everywhere.org/)

We should probably have a proper UTF8String class so that at least we know that a string is encoded and needs conversion to a 'real' String. During the NuScratch work I toiled mightily with string stuff and really ought to have done it then. The current WideString/ByteString stuff works quite well for most internal cases, though the cost of converting an entire string any time a big char is inserted could get annoying.
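
To make the conversion cost concrete, here is a minimal sketch in Python rather than Squeak. It only models the idea of a byte-per-character string that must be copied wholesale into a wider representation when the first big char arrives. The class name ByteOrWideText, its methods, and the widenings counter are hypothetical and do not correspond to any actual Squeak class or selector.

```python
from array import array


class ByteOrWideText:
    """Hypothetical model: one byte per character until a wide char shows up."""

    def __init__(self, text=""):
        self.units = array("B")  # unsigned bytes, one per character
        self.widenings = 0       # number of whole-string conversions so far
        for ch in text:
            self.append(ch)

    def is_wide(self):
        return self.units.typecode != "B"

    def append(self, ch):
        code = ord(ch)
        if code > 0xFF and not self.is_wide():
            # Whole-string conversion: every existing character is copied
            # into an array whose elements are at least 4 bytes each.
            self.units = array("L", self.units.tolist())
            self.widenings += 1
        self.units.append(code)

    def __str__(self):
        return "".join(chr(code) for code in self.units)


text = ByteOrWideText("abc")   # still byte-sized
text.append("\u20ac")          # euro sign forces the full copy
```

The copy is proportional to the length of the existing string, which is why it is harmless for short strings and could get annoying for very large ones.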

If one were making a word processor for large amounts of text, rather than a text editor with some prettiness tweaks for code editing etc, it might pay to have a form of text that allows for mixed byte & wide sub-parts. Perhaps it is even possible to use text attributes in yet another twisted and sneaky way? As we discovered in the Sophie Project, handling formatted texts is decidedly non-trivial. Especially when the customer can't even define a paragraph for you....
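
One way to picture text with mixed byte & wide sub-parts is as a list of runs, where each run is either byte-sized or wide. The following is a rough Python sketch under that assumption, not a design for Squeak. MixedText, append and run_kinds are hypothetical names, and only appending is handled (no insertion in the middle, no attributes).

```python
class MixedText:
    """Hypothetical text built from runs: bytearray for chars <= 0xFF, str for wide."""

    def __init__(self, text=""):
        self.runs = []  # each element is a bytearray (byte run) or a str (wide run)
        for ch in text:
            self.append(ch)

    def append(self, ch):
        code = ord(ch)
        last = self.runs[-1] if self.runs else None
        if code > 0xFF:
            if isinstance(last, str):
                self.runs[-1] = last + ch   # extend the current wide run
            else:
                self.runs.append(ch)        # start a new wide run
        else:
            if isinstance(last, bytearray):
                last.append(code)           # extend the current byte run in place
            else:
                self.runs.append(bytearray([code]))  # start a new byte run

    def run_kinds(self):
        return ["wide" if isinstance(run, str) else "byte" for run in self.runs]

    def __len__(self):
        return sum(len(run) for run in self.runs)

    def __str__(self):
        return "".join(
            run if isinstance(run, str) else run.decode("latin-1")
            for run in self.runs
        )


mixed = MixedText("ab\u20ac\u20acc")
# mixed.run_kinds() is ["byte", "wide", "byte"]: the byte parts are never converted
```

A big char here only ever starts or extends a wide run, so the surrounding byte-sized text is left alone. The price is that indexing and editing have to find the right run first.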

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Strange OpCodes: PSM: Print and SMear