String refactoring

Tue Apr 12 07:08:56 UTC 2005

Hi -

>   I was trying to make it so that the strings that are used for names,
> etc. in latin1-encodable strings use "String" (now known as
> ByteString).  I.e., it *should* not be too terrible (each time I
> write this kind of thing, your answer is usually a simple "no",
> so I shouldn't do this, but) not to do any conversion for
> MultiStrings.

I'm not sure I understand what you mean here. It's fine that
latin1-encodable strings are ByteStrings, but as your example shows, we
can have strings that mix latin1 and non-latin1 characters. If
MultiString is supposed to work for latin1 too (which I think we all
agree on), then it ought to be possible to capitalize a MultiString, no?

Put differently, the reason why I am opposed to *not* capitalizing
MultiStrings correctly is that, besides the cases we've already seen
here (e.g., a mix of latin1 and non-latin1), you would also take this
ability away from many other languages not encoded in the latin1 range.

Which (for example) includes Cyrillic. Now, why exactly would we deny a
Russian the ability to capitalize a string? ;-)

>>This is a really hard problem... I dislike option 1 because it's so 
>>fundamentally flawed, and I dislike option 2 because it's so hard to 
>>implement. Sigh. I just wished you hadn't removed that code ...
> 
>   I might be lost, but which that code?

Oh, I'm just referring to the original String code. If it had said:

capitalized
	self flag: #toBeImplemented.
	^super capitalized. "for now..."

we wouldn't have the inconsistency problem in the projects out there.
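
For illustration only, here is a minimal sketch of what an eventual
MultiString>>capitalized might look like. It is not the actual Squeak
source, and it assumes that Character>>asUppercase is case-aware for
characters outside the latin1 range (e.g., Cyrillic):

capitalized
	"Hypothetical sketch: answer a copy of the receiver with only its first character upcased.
	Assumes Character>>asUppercase knows the case mapping for non-latin1 characters."
	| result |
	self isEmpty ifTrue: [^self copy].
	result := self copy.
	result at: 1 put: (result at: 1) asUppercase.
	^result

Whether this capitalizes a mixed latin1/non-latin1 string correctly
depends entirely on that assumption about asUppercase.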

>   ... There is a different level of question; at one time, my thought
> was (and still is) to make sure that Squeak 3.8 is capable of loading
> projects from older images.  But honestly, there are so many other
> "improvements" coming that this will probably not stand
> in the images after 3.9.  So the question is how hard we should try
> to keep the compatibility.

Yes, we will support those. I have pondered this question for quite a
while and I think we can support old projects as long as necessary. Mind 
you - I just took apart the whole String hierarchy, renamed half of it, 
indeed changed the *meaning* of both String and Symbol and (besides the 
inconsistency problems) everything just works. I am committed to 
supporting projects for as long as we need.

>   Finally, again, my "when designing the future systems, avoiding the
> dependency on any sort of upper/lower case distinction/manipulation."
> part.  We discussed this about 18 months ago.  Just I'd like to see
> that the "future systems" are free from the character case
> distinctions in their languages.

I think I remember what you refer to, and as far as it goes I will say
that I agree that using upper/lower case for semantic distinctions (such
as whether a character is allowed as the first character of a global
name) is not a good idea. That said, we (that is "we western guys")
understand fairly little about the possible cultural constraints that
other languages may impose.

Cheers,
   - Andreas