String refactoring
Andreas Raab
andreas.raab at gmx.de
Tue Apr 12 07:08:56 UTC 2005
Hi -
> I was trying to make it so that the strings that are used for names,
> etc. in latin1 encodable strings use "String" (now known as
> ByteString). I.e., it *should* not be too terrible (each time I
> write this kind of thing, your answer is usually a simple "no",
> so I shouldn't do this, but) not to do any conversion for
> MultiStrings.
I'm not sure I understand what you mean here. It's fine that latin1
encodable strings are ByteStrings, but as your example shows, we can have
strings that mix latin1 and non-latin1 characters, and if
MultiString is supposed to work for latin1 too (which I think we all
agree on), then it ought to be possible to capitalize a MultiString, no?
Put differently, the reason why I am opposed to *not* capitalizing
MultiStrings correctly is that, besides cases such as we've seen here
already (e.g., a mix of latin1 and non-latin1), you would also take this
ability away from many other languages not encoded in the latin1 range,
which (for example) includes Cyrillic. Now, why exactly would we deny a
Russian the ability to capitalize a string? ;-)
>>This is a really hard problem... I dislike option 1 because it's so
>>fundamentally flawed, and I dislike option 2 because it's so hard to
>>implement. Sigh. I just wished you hadn't removed that code ...
>
> I might be lost, but which code is that?
Oh, I'm just referring to the original String code. If it had said:
    capitalized
        self flag: #toBeImplemented.
        ^super capitalized. "for now..."
we wouldn't have the inconsistency problem in the projects out there.
> ... There is a different level of question; at one time, my thought was
> (and still is) to make sure that Squeak 3.8 is capable of loading
> projects from older images. But honestly, there are so many other
> "improvements" coming that this will probably not stand in the
> images after 3.9. So the question is how hard we should try to keep
> compatibility.
Yes, we will support those. I have pondered this question for quite a
while and I think we can support old projects as long as necessary. Mind
you - I just took apart the whole String hierarchy, renamed half of it,
indeed changed the *meaning* of both String and Symbol and (besides the
inconsistency problems) everything just works. I am committed to
supporting projects for as long as we need.
> Finally, again, my "when designing the future systems, avoiding the
> dependency on any sort of upper/lower case distinction/manipulation."
> part. We discussed this about 18 months ago. I'd just like to see
> that the "future systems" are free from character case
> distinctions in their languages.
I think I remember what you refer to, and as far as it goes I will say
that I agree that using upper/lower case for semantic distinctions (such
as whether a character is allowed as the first character of a global
name) is not a good idea. That said, we (that is, "we western guys")
understand fairly little about the cultural constraints that other
languages may impose.
Cheers,
- Andreas