[squeak-dev] Proposal | Stop writing ]lang[ leadingChar-runs in chunks (i.e., file-outs)

Marcel Taeumel marcel.taeumel at hpi.de
Sat Jan 29 15:16:39 UTC 2022

Hi all --

We all know that the path to a clean multilingual Squeak based on Unicode is still bumpy. The concept of #leadingChar was an interesting trade-off to efficiently extend pre-rendered StrikeFont's with selected Unicode glyphs. See StrikeFontSet and #DefaultMultiStyle. For example, #installFonts for a JapaneseEnvironment is still functional: http://metatoys.org/pub/FontJapaneseEnvironment.sar

Anyway. #fallbackFont in StrikeFont (and TTCFont) can do (almost) the same trick. Without having to modify the Unicode code points with a #leadingChar. That is, the original intent to circumvent Unicode's Han unification might still be preserved by configuring different fallback fonts and/or text styles per use case (e.g., tool, text field, ...).

As a first step, I propose to stop file-ing out those leadingChar-runs as ]lang[ in the chunk format. Even with the current implementation of leadingChar, it makes no sense to preserve them outside the .image. For example, reading code from a .changes, .sources, .st, or .cs file might preserve the leadingChar's -- however -- working with the respective code fragments (e.g., cut/copy/paste) will immediately remove that "mask on characters" anyway if the system has a different leadingChar (or 0 for Latin1/Unicode). I cannot quite grasp the original intention of preserving those in chunk format. Maybe it was performance.

So ... any more thoughts on this? Maybe a plausible explanation on why ]lang[ exists in the first place given that all converters sending #leadingChar:code: seem to attach that on-the-fly anyway?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220129/823664e8/attachment-0001.html>

More information about the Squeak-dev mailing list