[squeak-dev] fonts, characterscanners and dead primitive 103

Fri Sep 6 00:21:14 UTC 2013

On 05-09-2013, at 4:59 PM, Yoshiki Ohshima <Yoshiki.Ohshima at acm.org> wrote:

> On Wed, Sep 4, 2013 at 8:24 PM, tim Rowledge <tim at rowledge.org> wrote:
>> After simplifying the scanning code a bit I'm looking into why we have the seemingly insane situation of two parallel hierarchies of CharacterScanner. So far it looks like there are no really substantive differences between CharacterScanner and MultiCharacterScanner and their subclasses. This seems like a mistake somewhere; certainly it could be mine, missing something important.
> 
> It's all my fault and incompetence.  I am sorry.

Well, it might be your 'fault' but I rather doubt it was incompetence…

> 
>> What is the intent of MultiXXXXX ? What is CombinedChar for? Are they, honestly, still needed? Or should the older versions be removed instead? Who wrote the new classes and is that person still maintaining them? Is he/she still around here?
> 
> This kind of stuff touches the part of Squeak that *has to* work.
> Once the "MultiCharacterScanner" worked and people were confident, it
> was in theory possible to ditch the old implementation; but I did not
> think back then that it (replacing fundamental code with a
> "work-in-progress" version) was acceptable to the community.  IF there
> was enough man-power, there would have been more variation of such
> scanners implemented for different writing systems; keeping the
> original version that works for byte strings would have been useful
> under that light.

So if I understand you correctly, there *should* be no particular differences in what the two types of scanner do? You made a parallel set in order to insulate your work from the tools that you needed to keep working in order to keep making the i18n stuff?

I've worked through several of the scanners, though not yet all of them, without finding any major differences. It certainly looks to me as though there is nothing to stop us having only one set. I suspect there may be some bug fixes in the more recently created classes, though I did notice at least a couple of places where the method in the old scanner class was actually newer than its equivalent in the new scanner. Do you recall any serious changes made to support multi-byte strings?

> 
> CombinedChar creates a precomposed character from a sequence of
> decomposed form of Unicode when possible.  For a certain keyboard, it
> was needed.

Ah, yes, now I see. Should CombinedChars ever exist outside that very narrow area of reading the keyboard and then copying out the results to the paragraphs? I didn't see any use beyond that but it can be hard to trace everything.
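
To make the "precomposed from decomposed" idea concrete, here is a minimal sketch in Python rather than Smalltalk. It uses the standard unicodedata module and is only an illustration of Unicode composition in general; it is not how CombinedChar is implemented, and the function name compose is made up for this example.

```python
import unicodedata


def compose(text):
    """Replace combining sequences with precomposed characters
    wherever Unicode defines one (NFC normalization)."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by COMBINING ACUTE ACCENT, as a keyboard might deliver it
decomposed = "e\u0301"
composed = compose(decomposed)

# 'q' plus the same accent has no precomposed form, so it stays two code points
uncomposable = compose("q\u0301")
```

The second case is the "when possible" part: where no precomposed character exists, the sequence passes through unchanged.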

If it's actually possible to simplify and get rid of the duplicated classes, it would be nice to clean up!

Right now I'm thinking about refactoring to allow the class of the string and the font to be used instead of explicit tests for widestring and font-does-kerning etc. It seems to me that modern font systems are much more 'active' than we used to think of StrikeFonts being, and maybe it is time fonts did their own scanning. That way the scanning could be done via simple methods, a prim or even a call out to a library. I'm aiming to make sure that the simple cases work really fast on slow machines (can we say Raspberry Pi?) and the complex cases at least work decently.
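
A rough sketch of that dispatch idea, in Python rather than Smalltalk so it can be run anywhere. Every class and method name here (Font, KerningFont, ByteString, WideString, measured_by, width_of_bytes, width_of_wide) is hypothetical and chosen for illustration; none of it is Squeak's actual API, and measuring width stands in for the real scanning work.

```python
# Hypothetical sketch: the string's class and the font's class select the
# code path, so no explicit "is it wide?" or "does it kern?" tests appear.

class Font:
    """Plain font: fixed advance per character, no kerning."""

    def __init__(self, advance=8):
        self.advance = advance

    def width_of_bytes(self, text):
        # Fast path for byte strings: one multiply, no per-pair work.
        return self.advance * len(text)

    def width_of_wide(self, text):
        # Default wide-string path; a real font would look up each glyph.
        return sum(self.advance for _ in text)


class KerningFont(Font):
    """Font that adjusts the gap between specific character pairs."""

    def __init__(self, advance=8, kerns=None):
        super().__init__(advance)
        self.kerns = kerns or {}

    def _kerned_width(self, text):
        width = self.advance * len(text)
        for left, right in zip(text, text[1:]):
            width += self.kerns.get((left, right), 0)
        return width

    def width_of_bytes(self, text):
        return self._kerned_width(text)

    def width_of_wide(self, text):
        return self._kerned_width(text)


class ByteString(str):
    def measured_by(self, font):
        # First dispatch on the string class, second on the font class.
        return font.width_of_bytes(self)


class WideString(str):
    def measured_by(self, font):
        return font.width_of_wide(self)


plain = ByteString("AV").measured_by(Font())
kerned = ByteString("AV").measured_by(KerningFont(kerns={("A", "V"): -2}))
wide = WideString("\u65e5\u672c").measured_by(Font(advance=10))
```

The point of the shape is that the plain-font, byte-string case stays a trivial method, while a more capable font can override the same methods with anything from a kerning loop to a call out to a library.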

tim
--
tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
Strange OpCodes: RDR: Rotate Disk Right