String refactoring

Yoshiki Ohshima yoshiki at squeakland.org
Tue Apr 12 06:43:31 UTC 2005


  Andreas,

> >   Oh, well.  Probably we could provide bug-compatible behavior for
> > older projects.  Also, when designing the future systems, avoiding the
> > dependency on any sort of upper/lower case distinction/manipulation.

> Well, no. This has very little to do with upper/lower case
> manipulation. 

  My use of "Also" may have been confusing.  (At the very least, the
two sentences should have been in separate paragraphs.)

> The problem is that there are currently seven methods in MultiString 
> which are flagged #toBeImplemented but the thing they absolutely need to 
> do (and which they don't) is to be consistent with their latin1 
> interpretations. Even though all of these methods might be broken for 
> some characters or others this is less problematic than having 
> inconsistent implementations where the results will differ if you just 
> happen to choose a different representation for intermediate results, e.g.,
> 
> tst := String with: $a with: (Character value: 512).
> (tst copyFrom: 1 to: 1) capitalized, (tst copyFrom: 2 to: 2)
> 	= tst capitalized
> => false
> 
> For a workaround: This actually seems a little harder to get right than 
> one would imagine (partly because of the above problem with intermediate 
> results). I tried quickly to use a hack like #eToysCapitalized which would 
> replicate the broken behavior but there are some really grave issues 
> with this - we're now basically enshrining the broken behavior forever, 
> since no one will ever be able to back out of that usage.

  I was trying to arrange things so that strings used for names,
etc., whenever they are latin1-encodable, are instances of "String"
(now known as ByteString).  I.e., it *should* not be too terrible
(each time I write this kind of thing, your answer is usually a simple
"no", so I shouldn't do this, but) not to do any conversion for
MultiStrings.
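
  To make the trade-off concrete, here is a rough model of that policy.
It is written in Python rather than Squeak and is only a sketch: the
names is_latin1 and capitalized are made up for illustration, the 256
cut-off stands in for "latin1 encodable", and reading "conversion" as
case conversion is my assumption.  Narrow strings get capitalized, wide
strings are returned untouched, and the piecewise-versus-whole mismatch
from the quoted example shows up as a direct consequence.

```python
def is_latin1(s: str) -> bool:
    """Assumed stand-in for 'fits in a ByteString': every code point < 256."""
    return all(ord(ch) < 256 for ch in s)


def capitalized(s: str) -> str:
    """Hypothetical model of the policy, not Squeak's actual code.

    Narrow (latin1-encodable) strings get their first character uppercased;
    wide strings are returned unchanged, i.e. no conversion at all.
    """
    if is_latin1(s):
        return s[:1].upper() + s[1:]
    return s


# Same shape as the quoted example: $a followed by Character value: 512.
tst = "a" + chr(512)

# Capitalize the narrow first piece, then append the wide second piece.
piecewise = capitalized(tst[0:1]) + tst[1:2]

# Capitalize the whole (wide) string in one go.
whole = capitalized(tst)

print(piecewise == whole)  # prints False
```

  Under this model an all-latin1 name such as "box" still capitalizes
as before; only strings that need the wide representation change
behavior, and that is exactly where the intermediate-result mismatch
appears.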

> Changing the code itself in the project also seems to be far from 
> trivial - for one thing we're not getting to the source code of the 
> scripts (this would have allowed us to use simple string replacement), 
> the tiles refer to the already broken name, and we need to get to the 
> slot info in the player to construct the "proper" spelling. But even if 
> we did that I am still not 100% sure we could fix all the places 
> where this is used.

  Yes, this is a no-no.

> This is a really hard problem... I dislike option 1 because it's so 
> fundamentally flawed, and I dislike option 2 because it's so hard to 
> implement. Sigh. I just wished you hadn't removed that code ...

  I might be lost, but which code do you mean?

  ... There is also a question at a different level.  My thought was
(and still is) to make sure that Squeak 3.8 is capable of loading
projects from older images.  But honestly, so many other
"improvements" are coming that this will probably not hold for
images after 3.9.  So the question is how hard we should try to keep
compatibility.

-- Yoshiki

  Finally, back to the "when designing the future systems, avoiding the
dependency on any sort of upper/lower case distinction/manipulation."
part of my message.  We discussed this about 18 months ago.  I would
just like to see the "future systems" be free from character case
distinctions in their languages.



