[Vm-dev] Windos UNICODE (?!?!?)

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat May 9 08:28:56 UTC 2020


On Fri, 8 May 2020 at 09:55, Tobias Pape <Das.Linux at gmx.de> wrote:

>
>
> > On 08.05.2020, at 09:52, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> >
> > +1
> >
> > I think we already have done most of the work to enable this...
> > minheadless is UNICODE already.
>
> I remember breaking my head when trying in my windows branches.
> at least the ssl stuff should work ;)
>
> -t
>
And all the hiccups around July 2016 show that the road to converting legacy
code is paved with traps:
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/6437fb3788a373d9ed733b9c2e23c500fed98505
This was good work, sorry for making you revert that, but it had to be
finished/polished...

If we go to UNICODE, we must check:
- whether the primitives or plugins are implicitly using the W variant
- how the string arguments are interpreted (converted from some encoding to
UTF16? passed directly as if UTF16?)
- what the image is passing to the primitive (this must be done in
Squeak/Cuis and maybe Pharo)
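
To illustrate the first point: the Windows headers map each generic name
(GetWindowText, ...) to its A or W variant depending on whether UNICODE is
defined, so the same call site can silently change which string type it
expects. A minimal sketch of that selection pattern, using made-up demo_*
names rather than real Win32 functions (DEMO_UNICODE stands in for UNICODE):

```c
#include <stddef.h>

/* Hypothetical stand-ins for a Win32 A/W function pair; none of these
 * names exist in the Windows headers or in the VM sources. */
typedef unsigned short demo_wchar;   /* 16-bit unit, like WCHAR on win32 */

/* "A" variant: takes a byte string in the current code page. */
size_t demo_title_length_a(const char *s)
{
    size_t n = 0;
    while (s[n]) n++;
    return n;
}

/* "W" variant: takes UTF-16 code units. */
size_t demo_title_length_w(const demo_wchar *s)
{
    size_t n = 0;
    while (s[n]) n++;
    return n;
}

/* The generic name resolves to one variant at compile time. */
#ifdef DEMO_UNICODE
#  define demo_title_length demo_title_length_w
#else
#  define demo_title_length demo_title_length_a
#endif
```

A primitive that calls the generic name with a char* only compiles (and only
means what it used to mean) while the flag is off, which is why each call
site has to be audited before flipping it.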

The strategy we have chosen so far is to
- encode most strings passed to the underlying OS as UTF8
- let the primitives convert to UTF16 on win32
- let the primitive explicitly invoke the W variant
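
Roughly, a primitive following this strategy looks like the sketch below.
It is an illustration, not code from the VM: utf8_to_utf16 and
file_exists_utf8 are hypothetical names, and the decoder is hand-written so
the example stays portable (on win32 one would more likely call
MultiByteToWideChar with CP_UTF8). It does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: convert NUL-terminated UTF-8 to UTF-16.
 * Returns the number of code units written (terminator excluded),
 * or -1 on malformed input or when dst is too small. */
int utf8_to_utf16(const char *src, uint16_t *dst, size_t dstLen)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;
    while (*s) {
        uint32_t cp;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* invalid lead byte */
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return -1;   /* truncated sequence */
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
        if (cp >= 0x10000) {          /* needs a surrogate pair */
            if (out + 2 >= dstLen) return -1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (out + 1 >= dstLen) return -1;
            dst[out++] = (uint16_t)cp;
        }
    }
    if (out >= dstLen) return -1;     /* no room for the terminator */
    dst[out] = 0;
    return (int)out;
}

#ifdef _WIN32
/* Hypothetical primitive body: UTF-8 in from the image, explicit W call,
 * so the result does not depend on whether UNICODE is defined. */
int file_exists_utf8(const char *utf8Path)
{
    uint16_t wide[MAX_PATH];
    if (utf8_to_utf16(utf8Path, wide, MAX_PATH) < 0) return 0;
    return GetFileAttributesW((LPCWSTR)wide) != INVALID_FILE_ATTRIBUTES;
}
#endif
```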

Similarly, strings returned are converted back to UTF8.
This makes life simpler on the image side, because it stays platform
independent (in the spirit of a VM).
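
The return path is the mirror image. Again only a sketch under the same
assumptions: utf16_to_utf8 is a hypothetical name, and on win32 the native
route would be WideCharToMultiByte with CP_UTF8 rather than a hand-written
encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert NUL-terminated UTF-16 to UTF-8.
 * Returns the number of bytes written (terminator excluded),
 * or -1 on unpaired surrogates or when dst is too small. */
int utf16_to_utf8(const uint16_t *src, char *dst, size_t dstLen)
{
    size_t out = 0;
    while (*src) {
        uint32_t cp = *src++;
        size_t need;
        if (cp >= 0xD800 && cp <= 0xDBFF) {        /* high surrogate */
            if (*src < 0xDC00 || *src > 0xDFFF) return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                             /* lone low surrogate */
        }
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dstLen) return -1;       /* keep room for NUL */
        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (out >= dstLen) return -1;
    dst[out] = '\0';
    return (int)out;
}
```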

We could as well have chosen a platform specific format for passing strings
(UTF16 would be the choice on win32), and let the image delegate the encoding
to OSPlatform current, or something like that.

From the VM perspective, we could offer a set of functions for interpreting
an input Smalltalk object as:
- ByteString = iso8859L1 to fit the image
- ByteArray = utf8 encoded bytes
- DoubleByteArray = utf16 encoded bytes
- WordArray (including WideString) = ucs-4
and converting to various platform encodings...
Most of these functions would use native capabilities of OS, so it's not a
lot of work.
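
To make the idea concrete, here is a sketch of such a function for one
target encoding (UTF-16). Everything in it is hypothetical: the SourceKind
tags, to_utf16 and put_utf16 are names made up for this mail, the caller is
assumed to have already mapped the object's class to a tag, and the
conversions are hand-written to keep the example portable where a real
implementation would lean on the OS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for the four input kinds listed above. */
typedef enum {
    SRC_BYTE_STRING,        /* ByteString: ISO 8859-1, one byte per char */
    SRC_BYTE_ARRAY,         /* ByteArray: UTF-8 encoded bytes */
    SRC_DOUBLE_BYTE_ARRAY,  /* DoubleByteArray: UTF-16 code units */
    SRC_WORD_ARRAY          /* WordArray / WideString: UCS-4 code points */
} SourceKind;

/* Append one code point as UTF-16; returns 0, or -1 if invalid or full. */
static int put_utf16(uint32_t cp, uint16_t *dst, size_t cap, size_t *len)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    if (cp < 0x10000) {
        if (*len + 1 >= cap) return -1;
        dst[(*len)++] = (uint16_t)cp;
    } else {
        if (*len + 2 >= cap) return -1;
        cp -= 0x10000;
        dst[(*len)++] = (uint16_t)(0xD800 | (cp >> 10));
        dst[(*len)++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    }
    return 0;
}

/* Convert `count` elements of `data` (bytes, 16-bit units or 32-bit words,
 * depending on `kind`) to NUL-terminated UTF-16. Returns the number of
 * code units written, or -1 on error. */
int to_utf16(SourceKind kind, const void *data, size_t count,
             uint16_t *dst, size_t cap)
{
    size_t len = 0, i = 0;
    while (i < count) {
        uint32_t cp;
        switch (kind) {
        case SRC_BYTE_STRING:            /* Latin-1 byte == code point */
            cp = ((const uint8_t *)data)[i++];
            break;
        case SRC_BYTE_ARRAY: {           /* decode one UTF-8 sequence */
            const uint8_t *b = data;
            uint8_t lead = b[i++];
            int extra;
            if (lead < 0x80)                { cp = lead;        extra = 0; }
            else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
            else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
            else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
            else return -1;
            while (extra-- > 0) {
                if (i >= count || (b[i] & 0xC0) != 0x80) return -1;
                cp = (cp << 6) | (b[i++] & 0x3F);
            }
            break;
        }
        case SRC_DOUBLE_BYTE_ARRAY:      /* already UTF-16: copy through */
            if (len + 1 >= cap) return -1;
            dst[len++] = ((const uint16_t *)data)[i++];
            continue;
        case SRC_WORD_ARRAY:             /* UCS-4 word == code point */
            cp = ((const uint32_t *)data)[i++];
            break;
        default:
            return -1;
        }
        if (put_utf16(cp, dst, cap, &len) < 0) return -1;
    }
    if (len >= cap) return -1;
    dst[len] = 0;
    return (int)len;
}
```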

This versatility would leave to the image side the decision to trade
uniformity (utf8) for efficiency.
Thoughts?


>
> > On Fri, 8 May 2020 at 09:23, Tobias Pape <Das.Linux at gmx.de> wrote:
> >
> > Hi
> >
> > I think we ought to define UNICODE and _UNICODE.
> > -t
> >
> > > On 08.05.2020, at 09:14, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
> > >
> > > ...on second thought: It might be better to first find all non-unicode
> > > calls, convert them, make sure everything works, then enable that UNICODE
> > > flag, and finally change all explicit unicode calls to "transparent" ones.
> > > :-)
> > >> On 08.05.2020 09:03:40, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
> > >>
> > >> Hi Eliot,
> > >>
> > >> I suppose we decided to make explicit use of the unicode functions.
> > >> See https://devblogs.microsoft.com/oldnewthing/?p=40643 -- Maybe this is
> > >> from a time when our source held both ANSI and UNICODE variants. These
> > >> days, we could simplify the code by defining -DUNICODE=1 and skip using,
> > >> e.g., "GetWindowTextW" because then "GetWindowText" will automatically
> > >> choose the unicode versions.
> > >>
> > >> I think that that -DUNICODE has never been part of some makefile.
> > >>
> > >> Best,
> > >> Marcel
> > >>> On 07.05.2020 23:09:23, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> > >>>
> > >>> Hi All, especially Windows persons,
> > >>>
> > >>> I'm debugging the 64-bit VM in the context of Perf on win64. I've
> > >>> just noticed that we don't appear to be building a UNICODE VM?!?! I
> > >>> find no define of UNICODE in the win32 platform files and no define
> > >>> of UNICODE on the compiler command line in the
> > >>> build.winXXxYY/common makefiles. This is surely a mistake, isn't it?
> > >>> If this is a regression, when and why did this occur? I'm going to
> > >>> add -DUNICODE=1 to the Makefile.msvc makefiles. AFAICT this should
> > >>> also happen with the Cygwin Makefiles. But before I make changes
> > >>> that could affect lots of people I thought I should ask here.
> > >>> Anyone know when and why we dropped -DUNICODE from the Cygwin build
> > >>> command line?
> > >>>
> > >>> _,,,^..^,,,_
> > >>> best, Eliot
> > >>>
> >
> >
>
>
>