[squeak-dev] Windows mapping from CF_UNICODE to Unicode
Tobias Pape
Das.Linux at gmx.de
Fri Nov 18 22:05:23 UTC 2022
> On 18. Nov 2022, at 22:52, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
> Hi All,
>
> does anyone know how Windows maps Unicode text to the CF_UNICODE format used in the clipboard? It seems to me that CF_UNICODE might simply be two-byte characters, excluding any codes beyond 16rFFFF. Is it in fact UTF-16?
Windows being Windows, this ought to be UTF-16. When MS adopted Unicode in the 90s, it was still "small" enough for 16 bits,
and the encoding was, in fact, UCS-2. It got "upgraded" to UTF-16 around Windows 2000.
See: https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
NOTE: UTF-16 has a lot of fun with "surrogate pairs", which make it possible to represent the whole UCS-4 range of code points in 16-bit units.
This is rather messy, and surrogate code points are invalid in UTF-8, go figure.
Side note: this is the reason why https://simonsapin.github.io/wtf-8/ exists.
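
To make the surrogate-pair arithmetic concrete, here is a rough sketch in Python (not Squeak code). The function names are made up for illustration, and the clipboard helper assumes the data is little-endian UTF-16 terminated by a NUL character, which I believe is what Windows hands out but have not verified against the VM.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points 0x10000..0x10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_clipboard_utf16(raw):
    """Decode clipboard bytes, ASSUMING little-endian UTF-16 with a NUL terminator."""
    text = raw.decode("utf-16-le")
    return text.split("\x00", 1)[0]        # drop the terminator and anything after it


def lone_surrogate_is_valid_utf8(unit):
    """Check whether a single surrogate code point can be encoded as strict UTF-8."""
    try:
        chr(unit).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False
```

With this, to_surrogate_pair(0x1F600) gives (0xD83D, 0xDE00), and lone_surrogate_is_valid_utf8(0xD83D) is False, which is the gap WTF-8 is meant to fill.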
Best regards
-Tobias
>
> If it is UTF-16 has anyone fixed our UTF16TextConverter? I don't see any of the conveniences that exist for the UTF8TextConverter such as decodeString: etc.
> _,,,^..^,,,_
> best, Eliot
>