[squeak-dev] fonts, characterscanners and dead primitive 103

Fri Sep 6 00:56:20 UTC 2013

On Thu, Sep 5, 2013 at 5:21 PM, tim Rowledge <tim at rowledge.org> wrote:
>
> On 05-09-2013, at 4:59 PM, Yoshiki Ohshima <Yoshiki.Ohshima at acm.org> wrote:
>
>>> What is the intent of MultiXXXXX ? What is CombinedChar for? Are they, honestly, still needed? Or should the older versions be removed instead? Who wrote the new classes and is that person still maintaining them? Is he/she still around here?
>>
>> This kind of stuff touches the part of Squeak that *has to* work.
>> Once the "MultiCharacterScanner" worked and people were confident, it
>> was in theory possible to ditch the old implementation; but I did not
>> think back then that it (replacing fundamental code with a
>> "work-in-progress" version) was acceptable to the community.  If there
>> had been enough manpower, more variations of such scanners would have
>> been implemented for different writing systems; keeping the
>> original version that works for byte strings would have been useful
>> in that light.
>
> So if I understand you correctly, there *should* be no particular differences in what the two types of scanner do? You made a parallel set in order to insulate your work from the tools that you needed to keep working in order to keep making the i18n stuff?

Not quite.  The analogy for WideString and String was like
LargeInteger and SmallInteger, and CharacterScanner was like a
different implementation of #+.  MultiCharacterScanner handles
WideStrings, especially when characters with different leading
chars are involved.  So the functionality is different.
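
As a rough illustration of why a scanner has to care about leading chars, here is a small Python sketch (not Squeak code). It assumes a character value that packs a language tag (the "leading char") above a 22-bit code; that bit layout, the tag values, and the helper names are all assumptions made for the example. The point is only that a wide string may need to be split into runs that share a leading char before each run can be measured or broken into lines with the right rules.

```python
from itertools import groupby

# Assumed layout for this sketch: low 22 bits hold the code,
# the bits above them hold the leading char (a language tag).
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def pack(leading_char, code):
    """Build a hypothetical wide character value from a tag and a code."""
    return (leading_char << CODE_BITS) | (code & CODE_MASK)


def leading_char(value):
    """Extract the language tag from a packed character value."""
    return value >> CODE_BITS


def char_code(value):
    """Extract the plain code from a packed character value."""
    return value & CODE_MASK


def split_runs(values):
    """Group consecutive characters that share the same leading char."""
    return [
        (tag, [char_code(v) for v in run])
        for tag, run in groupby(values, key=leading_char)
    ]


# Tags 0 and 5 are arbitrary placeholders, not real Squeak environment ids.
text = [pack(0, 0x41), pack(0, 0x42), pack(5, 0x3042), pack(5, 0x3044), pack(0, 0x43)]
runs = split_runs(text)
```

A byte-string scanner never has to make this split, since every character effectively has the same tag; that is the sense in which the two scanners do different work.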

> I've worked through several of the scanners without finding any major differences, but not yet all of them. It certainly looks to me that there is nothing to stop us having only one set. I suspect there may be some bug fixes in the more recently created classes, though I did notice at least a couple of places where the method in the old scanner class was actually newer than its equivalent in the new scanner. Do you recall any serious changes made to support multi-byte strings?

The serious change was for handling leading char, and also the
different line breaking rules for different languages.

>> CombinedChar creates a precomposed character from a decomposed
>> Unicode sequence when possible.  For a certain keyboard, it
>> was needed.
>
> Ah, yes, now I see. Should CombinedChars ever exist outside that very narrow area of reading the keyboard and then copying out the results to the paragraphs? I didn't see any use beyond that but it can be hard to trace everything.

Whenever you want to find out whether a sequence is composable, it is
potentially useful.
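
To make "composable" concrete, here is a minimal Python sketch using the standard unicodedata module; it is not the CombinedChar implementation, just the same idea expressed with Unicode NFC normalization. The function names are made up for the example.

```python
import unicodedata


def compose_if_possible(sequence):
    """Return the NFC (precomposed where possible) form of the sequence."""
    return unicodedata.normalize("NFC", sequence)


def is_composable(sequence):
    """True when a multi-character sequence collapses to one precomposed character."""
    return len(sequence) > 1 and len(compose_if_possible(sequence)) == 1


# 'e' followed by COMBINING ACUTE ACCENT has a precomposed form (U+00E9).
e_acute = "e\u0301"
# 'q' followed by COMBINING ACUTE ACCENT has no precomposed form.
q_acute = "q\u0301"

composed = compose_if_possible(e_acute)
```

A keyboard that delivers a base letter and a combining mark as separate events is one place this check matters, but any code that receives decomposed text could ask the same question.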

-- 
-- Yoshiki