[Vm-dev] [Pharo-dev] Better management of encoding of environment variables

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Jan 18 13:01:02 UTC 2019


On Wed, 16 Jan 2019 at 23:23, Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
> Hi Sven,
>
> On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <sven at stfx.eu>
> wrote:
>
>> Still, one of the conclusions of previous discussions about the encoding
>> of environment variables was/is that there is no single correct solution.
>> OS's are not consistent in how the encoding is done in all (historical)
>> contexts (like sometimes, 1 env var defines the encoding to use for others,
>> different applications do different things, and other such nice stuff), and
>> certainly not across platforms.
>>
>> So this is really complex.
>>
>> Do we want to hide this in some obscure VM C code that very few people
>> can see, read, let alone help with ?
>>
>> The image side is perfectly capable of dealing with platform differences
>> in a clean/clear way, and at least we can then use the full power of our
>> language and our tools.
>>
>
> Agreed.  At the same time I think it is very important that we don't rely
> on the FFI for environment variable access.  This is a basic cross-platform
> facility.  So I would like to see the environment accessed through
> primitives, but have the image place interpretation on the result of the
> primitive(s), and have the primitive(s) answer a raw result, just a
> sequence of uninterpreted bytes.
>
> VisualWorks takes this approach and provides a class UninterpretedBytes
> that the VM is aware of.  That's always seemed like an ugly name and
> overkill to me.  I would just use ByteArray and provide image level
> conversion from ByteArray to String, which is what I believe we have anyway.
>
>
What matters is to build abstraction layers that confine unneeded
complexity to the lowest layers possible.
The VM excels at insulating, of course.
On the image side, we have to take responsibility for not leaking too much
ourselves.

As Eliot said, right now the VM (and FFI) just take sequences of
uninterpreted bytes (ByteArray) and pass them to the API.
The conversion ByteString/WideString <-> specifically-encoded ByteArray is
performed on the image side.
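
As a rough illustration of that split (sketched in Python rather than
Smalltalk; primitive_getenv and getenv_string are hypothetical names, not
actual VM or image API), the low layer answers uninterpreted bytes and the
upper layer is the only place where an encoding is chosen:

```python
def primitive_getenv(raw_environment, name):
    """Hypothetical stand-in for a VM primitive: answers raw bytes or None."""
    return raw_environment.get(name)


def getenv_string(raw_environment, name, encoding="utf-8"):
    """Image-side policy: interpret the raw bytes with an explicit encoding."""
    raw = primitive_getenv(raw_environment, name.encode(encoding))
    if raw is None:
        return None
    return raw.decode(encoding)


# A fake environment holding only bytes, as the low layer would see it.
raw_env = {b"HOME": "/home/zoé".encode("utf-8")}
home = getenv_string(raw_env, "HOME")
missing = getenv_string(raw_env, "NO_SUCH_VARIABLE")
```

Changing the encoding policy then only touches getenv_string; the
primitive stays the same.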

With FFI, we could eventually make this conversion platform-specific
instead of always using UTF-8.
The purpose would be, for example, to reduce back-and-forth conversions in
chained API calls.
For sanity, we had better follow these rules:
- the image does not attempt to interact directly with these opaque data
(other than through the OS API)
- nor does it preserve them across snapshots.
Beware: the conversion is not only platform-specific, it can also be
library-specific (some libraries on Windows take UTF-8).
So we may either reify the library and always double dispatch to it, or
create higher-level abstract messages that chain several low-level OS
API calls.
We would thus let complexity creep up one more level, but only if we have a
good reason to do so.
We don't want to trade uniformity for small gains.
BTW, note that the xxxW API is already a huge step toward uniformity
compared to the code-page-specific xxxA API!
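
A minimal sketch of the "reify the library" option, again in Python and
with made-up names (NativeLibrary, to_native, from_native and the two
library instances are assumptions for illustration only): the string
defers to the library object, which knows its own encoding and terminator.

```python
class NativeLibrary:
    """Hypothetical reified library carrying its own string convention."""

    def __init__(self, name, encoding, terminator):
        self.name = name
        self.encoding = encoding
        self.terminator = terminator

    def to_native(self, text):
        # Encode and append the terminator this library's C API expects.
        return text.encode(self.encoding) + self.terminator

    def from_native(self, raw):
        # Drop a trailing terminator, if present, then decode.
        if raw.endswith(self.terminator):
            raw = raw[: -len(self.terminator)]
        return raw.decode(self.encoding)


def native_bytes(text, library):
    """Second half of the double dispatch: the library decides the encoding."""
    return library.to_native(text)


# Assumed conventions: a wide (xxxW-style) API and a UTF-8 library.
wide_api = NativeLibrary("wide API", "utf-16-le", b"\x00\x00")
utf8_lib = NativeLibrary("UTF-8 library", "utf-8", b"\x00")

wide_bytes = native_bytes("é", wide_api)
utf8_bytes = native_bytes("é", utf8_lib)
```

The caller never names an encoding; it only names the library it is
talking to.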

Another strategy is to create more complex (i.e. parameterized)
abstractions that can deal with a zoo of different underlying conventions.
For example, this would be the EncodedString of VW.
This strategy could be tempting, because it lets the image deal with
lower-level, platform-specifically encoded objects and still interact with
them transparently.
But I strongly advise thinking twice (or more) before introducing such
complexity:
- it breaks former invariants (and thus potentially a lot of code)
- complexity tends to spread to many places
I don't recommend it.
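
To make the "broken invariants" point concrete, here is a toy Python
sketch. EncodedText is a hypothetical class invented for this example; it
is not the VW EncodedString and makes no claim about how that class works.
Once a string carries its encoding, size and equality can no longer be
read off the stored bytes:

```python
class EncodedText:
    """Hypothetical parameterized string: raw bytes plus their encoding."""

    def __init__(self, raw, encoding):
        self.raw = raw
        self.encoding = encoding

    def decoded(self):
        return self.raw.decode(self.encoding)

    def __len__(self):
        # Character count needs a decode; it is no longer the storage size.
        return len(self.decoded())

    def __eq__(self, other):
        # Equality must compare decoded text, not the stored bytes.
        return isinstance(other, EncodedText) and self.decoded() == other.decoded()

    def __hash__(self):
        return hash(self.decoded())


a = EncodedText("é".encode("utf-8"), "utf-8")
b = EncodedText("é".encode("latin-1"), "latin-1")
```

Here a and b are equal as text, yet their bytes and byte sizes differ, so
any code that assumed "same string means same bytes" has to be revisited.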

PS: oops, sorry for the out-of-band message. I wanted to send it, but it
seems that I did not press the button properly...

>
>> > On 16 Jan 2019, at 10:59, Guillermo Polito <guillermopolito at gmail.com>
>> wrote:
>> >
>> > Hi Nicolas,
>> >
>> > On Wed, Jan 16, 2019 at 10:25 AM Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>> > IMO, the Windows VM (and plugins) should do the UCS2 -> UTF8 conversion
>> because the purpose of a VM is to provide an OS-independent façade.
>> > I made progress recently in this area, but we should finish the
>> job/test/consolidate.
>> >
>> > I'm following your changes for windows from the shadows and I think
>> they are awesome :).
>> >
>> > If someone bypasses the VM and uses the Windows API directly through
>> FFI, then they take the responsibility, but uniformity doesn't hurt.
>> >
>> >  So far we are using FFI for this, as you say we create first
>> Win32WideStrings from utf8 strings and then we use ffi calls to the *W
>> functions.
>> > I don't think we can make it for Pharo7.0.0. The cycle to build, do
>> some acceptance tests, and then bless a new VM as stable is far too long
>> for our imminent release :).
>> >
>> > But this could be for a 7.1.0, and if you like I can surely give a hand
>> on this.
>> >
>> > Guille
>>
>>
>>
>
> --
> _,,,^..^,,,_
> best, Eliot
>