String refactoring

Yoshiki Ohshima yoshiki at squeakland.org
Tue Apr 12 07:47:06 UTC 2005


  Andreas,

> Put differently, the reason why I am opposed to *not* capitalizing 
> MultiStrings correctly is that besides cases such as we've seen here 
> already (e.g., mix of latin1 and non-latin1) you would also take this 
> ability away from many other languages not encoded in the latin1 range.
> 
> Which (for example) includes Cyrillic. Now, why exactly would we deny a 
> Russian the ability to capitalize a string? ;-)

  Ah, not denying, but we haven't had the privilege of having Russian
eToys projects floating around on the net yet.  For future
releases, #capitalize should capitalize a character whenever the
Unicode Consortium's data says the character has an uppercase form.
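
  To make the "capitalize whatever Unicode says is capitalizable" idea
concrete, here is a minimal sketch in Java rather than Squeak.  The
class and method names are hypothetical and this is not the String
code under discussion; it only assumes the JDK's Character.toUpperCase(int),
which follows the simple case mappings in the Unicode character
database.

```java
// Hypothetical illustration only; not Squeak's implementation.
final class CapitalizeSketch {

    // Upper-case the first code point if Unicode defines a simple
    // uppercase mapping for it; otherwise return the string unchanged.
    static String capitalized(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int upper = Character.toUpperCase(first);
        return new StringBuilder(s.length())
                .appendCodePoint(upper)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    public static void main(String[] args) {
        // Latin-1 range: the first letter changes.
        System.out.println(capitalized("squeak"));
        // Cyrillic "moskva" (U+043C ...): the first letter changes too.
        System.out.println(capitalized("\u043c\u043e\u0441\u043a\u0432\u0430"));
        // Japanese text has no case, so it comes back as it went in.
        System.out.println(capitalized("\u65e5\u672c\u8a9e"));
    }
}
```

  The point of the sketch is that nothing in it mentions a language or
an encoding range; whether a character changes is decided entirely by
the Unicode data.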

  A dirtier way to keep backward compatibility is to have
#capitalize change its behavior only when the primary language setting
is Japanese or Korean and the image is dealing with projects from
older images.

> Oh, I'm just referring to the original String code. If this would have said:
> 
> capitalized
> 	self flag: #toBeImplemented.
> 	^super capitalized. "for now..."
> 
> we wouldn't have the inconsistency problem in the projects out there.

  Yeah.

> >   ... There is a different level of question; at one time, my thought was
> > (and still is) to make sure that Squeak 3.8 is capable of loading
> > projects from older images.  But honestly, so many other
> > "improvements" are coming that this will probably not hold
> > in the images after 3.9.  So the question is how hard we should try
> > to keep the compatibility.
> 
> Yes we will support those. I have pondered this question for quite a 
> while and I think we can support old projects as long as necessary. Mind 
> you - I just took apart the whole String hierarchy, renamed half of it, 
> indeed changed the *meaning* of both String and Symbol and (besides the 
> inconsistency problems) everything just works. I am committed to 
> supporting projects for as long as we need.

  Ok.

> >   Finally, again, my "when designing the future systems, avoid the
> > dependency on any sort of upper/lower case distinction/manipulation"
> > part.  We discussed this about 18 months ago.  I'd just like to see
> > that the "future systems" are free from character case
> > distinctions in their languages.
> 
> I think I remember what you refer to and as far as it goes I will say 
> that I agree that using upper/lower case for semantic distinctions (such 
> as whether a character is allowed as the first character of a global 
> name) is not a good idea. That said, we (that is "we western guys") 
> understand fairly little about eventual cultural constraints that other 
> languages may make.

  It is not really a cultural constraint.  Take Java as an example.
Class names can begin with any "letter" (close to the Unicode
definition + some chars like "_"), instance variable names (for, err,
instance) can begin with any "letter" (same definition), etc., etc.
We just shouldn't assume that every "letter" has a counterpart in the
other case.  German has "eszett", so it isn't that alien...
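
  The eszett case can be checked directly against the JDK.  The sketch
below is only an illustration with hypothetical class and helper names;
it assumes the standard java.lang.Character and String methods.  It
shows that eszett is a letter, is accepted as the start of a Java
identifier, and yet has no single-character uppercase counterpart.

```java
// Hypothetical illustration only.
final class EszettSketch {

    static final char ESZETT = '\u00df'; // LATIN SMALL LETTER SHARP S

    // True if Unicode's simple case mapping gives this code point a
    // distinct uppercase form.
    static boolean hasUppercaseCounterpart(int codePoint) {
        return Character.toUpperCase(codePoint) != codePoint;
    }

    public static void main(String[] args) {
        // Java decides identifier starts by "letter"-ness, not by case.
        System.out.println(Character.isLetter(ESZETT));              // true
        System.out.println(Character.isJavaIdentifierStart(ESZETT)); // true

        // 'a' maps to 'A', but eszett maps to itself.
        System.out.println(hasUppercaseCounterpart('a'));            // true
        System.out.println(hasUppercaseCounterpart(ESZETT));         // false

        // Whole-string upper-casing uses the full mapping, which
        // expands eszett into two characters.
        System.out.println("\u00df".toUpperCase(java.util.Locale.ROOT)); // SS
    }
}
```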

  In Squeak, one could make an instance variable that begins with
eszett, but not a class name or other global.  This kind of
restriction is too strict in general for other languages.
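
  As a rough model of the difference (not Squeak's actual compiler
code), the two hypothetical predicates below compare a case-based rule,
where a global name must start with an uppercase character, with a
letter-based rule in the Java style.  The rule names and the exact form
of the case-based check are assumptions made for illustration.

```java
// Hypothetical illustration only; the rules are simplified models.
final class GlobalNameRules {

    // Case-based rule: the first character must be uppercase.
    static boolean caseBasedStart(String name) {
        return !name.isEmpty() && Character.isUpperCase(name.codePointAt(0));
    }

    // Letter-based rule: the first character must be a letter of any kind.
    static boolean letterBasedStart(String name) {
        return !name.isEmpty() && Character.isLetter(name.codePointAt(0));
    }

    public static void main(String[] args) {
        String[] names = {
            "Morph",         // ordinary Latin name
            "\u00dfeta",     // starts with eszett
            "\u3042\u304a"   // two hiragana characters, which have no case
        };
        for (String name : names) {
            System.out.println(caseBasedStart(name) + " " + letterBasedStart(name));
        }
    }
}
```

  Under the case-based rule only the first name is accepted; under the
letter-based rule all three are.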

-- Yoshiki


