[squeak-dev] Unicode

Jakob Reschke jakres+squeak at gmail.com
Sun May 8 10:55:36 UTC 2022


I want to retract the "batteries not included" argument, because I conflated
the time zone situation too much with the Unicode situation. For Unicode,
there is already data in the image, although it is outdated.

On Sun, May 8, 2022 at 12:23, Jakob Reschke <
jakres+squeak at gmail.com> wrote:

>
> Also, it is kind of inconsistent: on the one hand, Squeak does not want to
> rely on foreign libraries (via FFI) by default and wants to have everything
> in Smalltalk or within the image; on the other hand, it loads basic
> databases from the outside... Batteries not included.
>
>



>
> On Thu, May 5, 2022, 19:24, Thiede, Christoph
> <Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>
>> Hi Jakob, Hi Marcel,
>>
>>
>> the advantage of your proposed solution is that we would have more
>> control over the process.
>>
>> The disadvantage is that it would increase the package size and tangle
>> logic + data together. After all, we're talking about ~300 kB for the
>> Unicode data if I used SpaceTally correctly. :-)
>>
>>
>> Personally, I would prefer to stay with the existing practice because
>> updating your Unicode data locally really seems to be optional at the
>> moment.
>>
>>
>> Best,
>>
>> Christoph
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of Taeumel, Marcel
>> *Sent:* Friday, April 8, 2022 09:56:36
>> *To:* squeak-dev
>> *Subject:* Re: [squeak-dev] Unicode
>>
>> Hi Jakob --
>>
>> > One could run the download&generate step as needed to update the data.
>> (CI, release build, manually)
>>
>> Integrating it into ReleaseBuilder class >> #prepareEnvironment
>> would be my preferred way. Then it would be part of the CI.
>>
>> Best,
>> Marcel
>>
>> On April 5, 2022, 14:41:14, Jakob Reschke <jakres+squeak at gmail.com> wrote:
>> Would it be possible/practical to separate this?
>> * Download and transform into code/objects
>> * Distribute the generated code/objects via the update stream directly
>> One could run the download&generate step as needed to update the data.
>> (CI, release build, manually)
>> Or are there any reasons not to do that?
>>
>> I was asking myself the same thing recently about the package that
>> provides time zone information. It needs a Unix timezone database from the
>> operating system to initialize, rather than providing Smalltalk
>> objects/code directly in Monticello, based on the official online database.
>>
>> Kind regards,
>> Jakob
>>
>> On Tue, Apr 5, 2022 at 08:59, Marcel Taeumel <
>> marcel.taeumel at hpi.de> wrote:
>>
>>> Hi Eliot, hi Christoph --
>>>
>>> > Unicode reinitializeData.
>>>
>>> I think that this method has an unfortunate name. Since it downloads
>>> data from the Internet, it should be called #downloadAndInitializeData.
>>>
>>> And that's the reason it is not in the post-load script. We might
>>> put the raw info there, but it would be very surprising if "Update Squeak"
>>> fetched data from anywhere other than source.squeak.org.
>>>
>>> Best,
>>> Marcel
>>>
>>> On April 4, 2022, 21:17:17, Thiede, Christoph <
>>> christoph.thiede at student.hpi.uni-potsdam.de> wrote:
>>>
>>> > If this is essential
>>>
>>>
>>> Well, how do you define essential? You can still use your image with the
>>> old font definitions. However, for some newer codepoints such as 😁😎😍, Unicode
>>> generalCategoryLabelOf: and friends will answer "not assigned" without the
>>> upgrade. You can see the difference by browsing any comprehensive font in
>>> the FontImporter. But I am not aware of any code path that relies on the
>>> presence of newer Unicode data.
>>>
>>>
>>> Apart from that, I have already discussed with Marcel what the
>>> consequences of downloading data from a third-party server during an image
>>> update would be. There might be some images, most likely server images, that
>>> do not have free internet access due to a strict firewall. Hypothetically,
>>> this might even introduce security issues. So in the end, we decided to
>>> leave this optional for now. It will only break if a future patch of
>>> some package relies on exact Unicode data.
>>>
>>>
>>> Best,
>>>
>>> Christoph
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>>> *Sent:* Monday, April 4, 2022 20:54:59
>>> *To:* The general-purpose Squeak developers list
>>> *Subject:* Re: [squeak-dev] Unicode
>>>
>>>
>>>
>>> On Apr 4, 2022, at 11:20 AM, Christoph.Thiede at student.hpi.uni-potsdam.de
>>> wrote:
>>>
>>> Merged via Multilingual-ct.271, Multilingual-ct.272,
>>> MultilingualTests-ct.41, and ReleaseBuilder-ct.231.
>>>
>>> Please run the following in your image to install the new Unicode data
>>> (and to uncover any regressions I may have missed :D):
>>>
>>>     Unicode reinitializeData.
>>>
>>>
>>> If this is essential, then it *must* be added as a post-load script to
>>> one (or more) of the relevant packages.  Asking “did you run Unicode reinitializeData?”
>>> when someone reports a strange bug isn’t acceptable.
>>>
>>>
>>> Best,
>>> Christoph
>>>
>>> ---
>>> *Sent from **Squeak Inbox Talk
>>> <https://github.com/hpi-swa-lab/squeak-inbox-talk>*
>>>
>>>
>>
>>