<body><div id="__MailbirdStyleContent" style="font-size: 10pt;font-family: Arial;color: #000000">
I also saw several uses of TCHAR that might break when it suddenly resolves to wchar_t instead of char.<div><br></div><div>Best,</div><div>Marcel</div><div class="mb_sig"></div><blockquote class="history_container" type="cite" style="border-left-style:solid;border-width:1px; margin-top:20px; margin-left:0px;padding-left:10px;">
<p style="color: #AAAAAA; margin-top: 10px;">On 09.05.2020 10:29:18, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:</p><div style="font-family:Arial,Helvetica,sans-serif"> On Fri, 8 May 2020 at 09:55, Tobias Pape <das.linux@gmx.de> wrote:
<br>
<br>>
<br>>
<br>> > On 08.05.2020, at 09:52, Nicolas Cellier &lt;nicolas.cellier.aka.nice@gmail.com&gt; wrote:
<br>> >
<br>> > +1
<br>> >
<br>> > I think we already have done most of the work to enable this...
<br>> > minheadless is UNICODE already.
<br>>
<br>> I remember breaking my head when trying in my windows branches.
<br>> at least the ssl stuff should work ;)
<br>>
<br>> -t
<br>>
<br>And all the hiccups around July 2016 show that converting legacy code is a
<br>road paved with traps:
<br>https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/6437fb3788a373d9ed733b9c2e23c500fed98505
<br>This was good work, sorry for making you revert that, but it had to be
<br>finished/polished...
<br>
<br>If we go to UNICODE, we must check:
<br>- whether the primitives or plugins are implicitly using the W variant
<br>- how the string arguments are interpreted (converted from some encoding to
<br>UTF-16? passed directly as if UTF-16?)
<br>- what the image is passing to the primitive (this must be done in
<br>Squeak/Cuis and maybe Pharo)
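<br>
<br>To illustrate the first check, here is a minimal sketch of how one source line can silently switch variant and character type with the build flag. The names demoTitleLengthA/demoTitleLengthW, DEMO_TEXT and DEMO_TCHAR are hypothetical stand-ins (for an A/W pair such as GetWindowTextA/GetWindowTextW, for TEXT() and for TCHAR) so that the sketch also compiles off Windows; the real headers select the variant in a similar macro-based way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for an A/W function pair (not real Win32 calls). */
size_t demoTitleLengthA(const char *s)    { assert(s != NULL); return strlen(s); }
size_t demoTitleLengthW(const wchar_t *s) { assert(s != NULL); return wcslen(s); }

/* The generic name resolves to one variant depending on the UNICODE define. */
#ifdef UNICODE
# define demoTitleLength demoTitleLengthW
# define DEMO_TEXT(s) L##s
typedef wchar_t DEMO_TCHAR;
#else
# define demoTitleLength demoTitleLengthA
# define DEMO_TEXT(s) s
typedef char DEMO_TCHAR;
#endif

/* Same source line, different callee and character type per build. */
size_t demoCall(void)
{
    const DEMO_TCHAR *title = DEMO_TEXT("Squeak");
    return demoTitleLength(title);
}

/* Code that assumed one byte per character changes meaning with the flag:
 * the byte size of a buffer of nChars characters depends on DEMO_TCHAR. */
size_t demoBufferBytes(size_t nChars)
{
    return nChars * sizeof(DEMO_TCHAR);
}
```

<br>Any call site written against the generic name, and any buffer sized in bytes rather than in characters, is a candidate for the audit described above.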
<br>
<br>The strategy we have chosen so far is to
<br>- encode most strings passed to the underlying OS as UTF-8
<br>- let the primitives convert to UTF-16 on win32
<br>- let the primitives explicitly invoke the W variant
<br>
<br>Similarly, strings returned are converted back to UTF-8.
<br>This makes life simpler on the image side, because it is platform independent
<br>(in the spirit of a VM).
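<br>
<br>The conversion step of that strategy could be sketched as follows. This is a portable illustration only, not the VM's code: on win32 a primitive would more likely call MultiByteToWideChar with CP_UTF8 and then the W function. The name utf8ToUtf16 is hypothetical, and this sketch does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert nBytes of UTF-8 into UTF-16 code units (no terminator written).
 * Returns the number of units written, or -1 on malformed input or when
 * dst (capacity dstUnits) is too small. Overlong forms are not rejected. */
long utf8ToUtf16(const unsigned char *src, size_t nBytes,
                 uint16_t *dst, size_t dstUnits)
{
    size_t i = 0, out = 0;
    assert(src != NULL && dst != NULL);
    while (i < nBytes) {
        uint32_t cp;
        size_t extra;
        unsigned char b = src[i++];
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;                     /* stray continuation or bad lead byte */
        if (extra > nBytes - i) return -1;  /* truncated sequence */
        for (; extra > 0; extra--) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
        if (cp < 0x10000) {
            if (out + 1 > dstUnits) return -1;
            dst[out++] = (uint16_t)cp;
        } else {                            /* encode as a surrogate pair */
            if (out + 2 > dstUnits) return -1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (long)out;
}
```

<br>Strings coming back from the OS would go through the inverse conversion before being handed to the image.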
<br>
<br>We could just as well have chosen a platform-specific format for passing strings
<br>(UTF-16 would be the choice on win32), and let the image delegate the encoding
<br>to OSPlatform current, or something like that.
<br>
<br>From the VM perspective, we could offer a set of functions for interpreting
<br>an input Smalltalk object as:
<br>- ByteString = ISO 8859-1 (Latin-1) to fit the image
<br>- ByteArray = UTF-8 encoded bytes
<br>- DoubleByteArray = UTF-16 encoded bytes
<br>- WordArray (including WideString) = UCS-4
<br>and converting to various platform encodings...
<br>Most of these functions would use native capabilities of the OS, so it's not a
<br>lot of work.
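<br>
<br>As a rough sketch of two of those interpretations (ByteString as Latin-1 and WordArray as UCS-4, both converted to UTF-16), assuming hypothetical function names and plain C buffers instead of real object-memory accessors:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one code point to dst as UTF-16. Returns the number of units
 * written, or 0 if cp is not encodable or fewer than the needed units fit. */
size_t putUtf16(uint32_t cp, uint16_t *dst, size_t room)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x10000) {
        if (room < 1) return 0;
        dst[0] = (uint16_t)cp;
        return 1;
    }
    if (room < 2) return 0;
    cp -= 0x10000;
    dst[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    dst[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* ByteString case: every Latin-1 byte is its own code point. */
long latin1ToUtf16(const uint8_t *src, size_t n, uint16_t *dst, size_t cap)
{
    size_t i, out = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        size_t w = putUtf16(src[i], dst + out, cap - out);
        if (w == 0) return -1;
        out += w;
    }
    return (long)out;
}

/* WordArray / WideString case: every 32-bit word is one code point. */
long ucs4ToUtf16(const uint32_t *src, size_t n, uint16_t *dst, size_t cap)
{
    size_t i, out = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        size_t w = putUtf16(src[i], dst + out, cap - out);
        if (w == 0) return -1;
        out += w;
    }
    return (long)out;
}
```

<br>The DoubleByteArray case would need no conversion on win32, and the ByteArray case is the UTF-8 path already used today.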
<br>
<br>This versatility would leave to the image side the decision to trade
<br>uniformity (UTF-8) for efficiency.
<br>Thoughts?
<br>
<br>
<br>>
<br>> > On Fri, 8 May 2020 at 09:23, Tobias Pape <das.linux@gmx.de> wrote:
<br>> >
<br>> > Hi
<br>> >
<br>> > I think we ought to define UNICODE and _UNICODE.
<br>> > -t
<br>> >
<br>> > > On 08.05.2020, at 09:14, Marcel Taeumel <marcel.taeumel@hpi.de> wrote:
<br>> > >
<br>> > > ...on second thought: It might be better to first find all non-Unicode
<br>> > > calls, convert them, make sure everything works, then enable that UNICODE
<br>> > > flag, and finally change all explicit Unicode calls to "transparent" ones.
<br>> > > :-)
<br>> > >> On 08.05.2020 09:03:40, Marcel Taeumel &lt;marcel.taeumel@hpi.de&gt; wrote:
<br>> > >>
<br>> > >> Hi Eliot,
<br>> > >>
<br>> > >> I suppose we decided to make explicit use of the Unicode functions.
<br>> > >> See https://devblogs.microsoft.com/oldnewthing/?p=40643 -- Maybe this is
<br>> > >> from a time when our source held both ANSI and UNICODE variants. These
<br>> > >> days, we could simplify the code by defining -DUNICODE=1 and skip using,
<br>> > >> e.g., "GetWindowTextW" because then "GetWindowText" will automatically
<br>> > >> choose the Unicode versions.
<br>> > >>
<br>> > >> I think that -DUNICODE has never been part of any makefile.
<br>> > >>
<br>> > >> Best,
<br>> > >> Marcel
<br>> > >>> On 07.05.2020 23:09:23, Eliot Miranda &lt;eliot.miranda@gmail.com&gt; wrote:
<br>> > >>>
<br>> > >>> Hi All, especially Windows persons,
<br>> > >>>
<br>> > >>> I'm debugging the 64-bit VM in the context of Perf on win64. I've
<br>> > >>> just noticed that we don't appear to be building a UNICODE VM?!?! I find
<br>> > >>> no define of UNICODE in the win32 platform files and no define of UNICODE
<br>> > >>> on the compiler command line in the build.winXXxYY/common makefiles. This
<br>> > >>> is surely a mistake, isn't it? If this is a regression, when and why did
<br>> > >>> this occur? I'm going to add -DUNICODE=1 to the Makefile.msvc makefiles.
<br>> > >>> AFAICT this should also happen with the Cygwin Makefiles. But before I
<br>> > >>> make changes that could affect lots of people I thought I should ask here.
<br>> > >>> Anyone know when and why we dropped -DUNICODE from the Cygwin build command
<br>> > >>> line?
<br>> > >>>
<br>> > >>> _,,,^..^,,,_
<br>> > >>> best, Eliot
<br>> > >>>
<br>> >
<br>> >
<br>>
<br>>
<br>>
<br></marcel.taeumel@hpi.de></das.linux@gmx.de></das.linux@gmx.de></div></blockquote>
</div></body>