[squeak-dev] ASN1 encoding of UTF8

Jakob Reschke jakob.reschke at student.hpi.de
Wed Sep 20 20:33:59 UTC 2017


2017-09-20 21:42 GMT+02:00 Alan Pinch <alan.c.pinch at gmail.com>:
> Would anyone have some interesting utf8 bytes
> handy?

http://xahlee.info/comp/unicode_drawing_shapes.html
:-)

🐭
┃┣━━━┳━━━━┳━━┓
┃┗┓┏┛┃╻╺━━┛╺┓┃
┣┓┃┗┓┗┻━━━┳╸┃┃
┃┃┣╸┣━━┳━┓┗━┛┃
┃┃┃┏┛┏╸┃╻┣━┳╸┃
┃┗━┫╻┣━━┫┗╸┃┏┫
┃┏━┫┃┃╺┓┗━┓┃┃┃
┃┃┃┃┃┗┓┗━┓┗┻╸┃
┗━┫┏┻━━━━┻━━━┛

This can be copied and pasted into a workspace. You will only see
question marks there, but the character values are correct.

theAbove squeakToUtf8 asByteArray
=>  #[226 148 131 226 148 163 226 148 129 226 148 129 226 148 129 226
148 179 226 148 129 226 148 129 226 148 129 226 148 129 226 148 179
226 148 129 226 148 129 226 148 147 13 226 148 131 226 148 151 226 148
147 226 148 143 226 148 155 226 148 131 226 149 187 226 149 186 226
148 129 226 148 129 226 148 155 226 149 186 226 148 147 226 148 131 13
226 148 163 226 148 147 226 148 131 226 148 151 226 148 147 226 148
151 226 148 187 226 148 129 226 148 129 226 148 129 226 148 179 226
149 184 226 148 131 226 148 131 13 226 148 131 226 148 131 226 148 163
226 149 184 226 148 163 226 148 129 226 148 129 226 148 179 226 148
129 226 148 147 226 148 151 226 148 129 226 148 155 226 148 131 13 226
148 131 226 148 131 226 148 131 226 148 143 226 148 155 226 148 143
226 149 184 226 148 131 226 149 187 226 148 163 226 148 129 226 148
179 226 149 184 226 148 131 13 226 148 131 226 148 151 226 148 129 226
148 171 226 149 187 226 148 163 226 148 129 226 148 129 226 148 171
226 148 151 226 149 184 226 148 131 226 148 143 226 148 171 13 226 148
131 226 148 143 226 148 129 226 148 171 226 148 131 226 148 131 226
149 186 226 148 147 226 148 151 226 148 129 226 148 147 226 148 131
226 148 131 226 148 131 13 226 148 131 226 148 131 226 148 131 226 148
131 226 148 131 226 148 151 226 148 147 226 148 151 226 148 129 226
148 147 226 148 151 226 148 187 226 149 184 226 148 131 13 226 148 151
226 148 129 226 148 171 226 148 143 226 148 187 226 148 129 226 148
129 226 148 129 226 148 129 226 148 187 226 148 129 226 148 129 226
148 129 226 148 155]
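
As a cross-check outside the image, here is a minimal Python sketch (Python rather than Squeak, purely as an independent reference; the names first_row, encoded and expected are only for illustration). It encodes the first row of the maze and appends a CR, on the assumption that the 13 between rows in the dump is the CR line ending that pasted text normally gets in a Squeak workspace.

```python
# First row of the maze, as pasted in the mail.
first_row = "┃┣━━━┳━━━━┳━━┓"

# Assumption: the row separator in the dump (13) is CR, not LF (10).
encoded = list((first_row + "\r").encode("utf-8"))

# The same row, copied from the start of the ByteArray printed above.
H = [226, 148, 129]  # U+2501, heavy horizontal line
expected = (
    [226, 148, 131]    # U+2503, heavy vertical line
    + [226, 148, 163]  # U+2523, vertical and right
    + H * 3
    + [226, 148, 179]  # U+2533, down and horizontal
    + H * 4
    + [226, 148, 179]
    + H * 2
    + [226, 148, 147]  # U+2513, down and left
    + [13]             # CR
)

assert encoded == expected
print(len(encoded), "bytes for", len(first_row), "characters plus CR")
```

Every box-drawing character here sits in the U+2500 block, so each one takes exactly three bytes in UTF-8. That makes the data handy for checking content lengths, less so for exercising one-, two- and four-byte sequences.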

Alternatively, you could try some pseudo-German pseudo-names like:
'Björn-Thaddäus Düngerstraß' squeakToUtf8 asByteArray
=> #[66 106 195 182 114 110 45 84 104 97 100 100 195 164 117 115 32 68
195 188 110 103 101 114 115 116 114 97 195 159].
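
The same bytes can be reproduced outside the image. The following is a minimal Python sketch, not the Cryptography package's code: der_length and der_utf8string are hypothetical helper names, and the framing assumes a primitive UTF8String (universal tag 12) with a definite length.

```python
name = "Björn-Thaddäus Düngerstraß"
utf8 = name.encode("utf-8")

# Bytes as printed in the mail above.
from_mail = [66, 106, 195, 182, 114, 110, 45, 84, 104, 97, 100, 100, 195,
             164, 117, 115, 32, 68, 195, 188, 110, 103, 101, 114, 115, 116,
             114, 97, 195, 159]
assert list(utf8) == from_mail


def der_length(n):
    """Definite length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_utf8string(text):
    """Hypothetical helper: tag 12, length of the UTF-8 content, content."""
    content = text.encode("utf-8")
    return bytes([0x0C]) + der_length(len(content)) + content


encoded = der_utf8string(name)
print(list(encoded[:2]))  # tag and length octets
```

The point to watch is that the length octet counts bytes, not characters: the name has 26 characters but 30 content bytes, because ö, ä, ü and ß each take two bytes.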

>
> ASN1UTF8StringType
>
>  >>#encodeValue: anObject withDERStream: derStream
>
>      derStream nextPutAll: anObject squeakToUtf8 asByteArray
>
>  >>#decodeValueWithDERStream: derStream length: length
>
>      ^ (derStream next: length) asByteArray asString utf8ToSqueak.
>
> CryptoASN1Test>>#testConstructedUTF8String
>
>     | bytes obj testObj |
>      bytes := #(44 15 12 5 84 101 115 116 32 12 6 85 115 101 114 32 49).
>      testObj := 'Test User 1'.
>      obj := ASN1InputStream decodeBytes: bytes.
>      self assert: (obj = testObj).
>
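
The test bytes above can be walked through by hand: 44 is 16r2C, that is, the constructed bit (16r20) plus the UTF8String tag 12, followed by length 15 and two primitive segments. Below is a minimal Python sketch of that decoding, independent of ASN1InputStream. The function decode_utf8string is a hypothetical name, and it only handles short-form lengths and one level of nesting. Note that constructed string encodings are a BER feature; strict DER requires the primitive form.

```python
def decode_utf8string(data):
    """Decode a primitive (tag 12) or constructed (tag 44) UTF8String.

    Sketch only: short-form lengths, one level of nesting.
    """
    tag, length = data[0], data[1]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    content = data[2:2 + length]
    if tag == 0x0C:
        return content.decode("utf-8")
    if tag != 0x2C:
        raise ValueError("not a UTF8String tag: %d" % tag)
    raw = bytearray()
    pos = 0
    while pos < len(content):
        seg_tag, seg_len = content[pos], content[pos + 1]
        if seg_tag != 0x0C:
            raise ValueError("expected a primitive UTF8String segment")
        raw += content[pos + 2:pos + 2 + seg_len]
        pos += 2 + seg_len
    # Decode once, after joining: a multi-byte character may be split
    # across two segments.
    return raw.decode("utf-8")


test_bytes = bytes([44, 15, 12, 5, 84, 101, 115, 116, 32,
                    12, 6, 85, 115, 101, 114, 32, 49])
assert decode_utf8string(test_bytes) == "Test User 1"
```

Joining the segment bytes before converting matters for non-ASCII data: if each segment were converted on its own, a character whose bytes straddle a segment boundary would fail to decode.
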
> Thank you for your consideration,
> Alan
>
>
> On 09/18/2017 12:29 PM, tim Rowledge wrote:
>> We do have assorted string encoding stuff in the current image, but the actual UTF8 results of #squeakToUtf8 (for example) are just ByteStrings. That is rather confusing and annoying, because you then have no way to know which encoding applies other than by carefully keeping track manually. Normally, of course, within the image we have perfectly usable strings, because any time a Unicode character outside the 1-byte range is used the string becomes a WideString.
>>
>> We need to do better. Look at TextEncoder and its hierarchy for more info.
>>
>> tim
>> --
>> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
>> Strange OpCodes: RLBM: Ruin Logic Board Multiple
>>
>>
>>
>
>

