[squeak-dev] ASN1 encoding of UTF8

Alan Pinch alan.c.pinch at gmail.com
Wed Sep 20 19:42:12 UTC 2017

Here is the encode and decode code I am using and a test that does not 
test UTF8 extended encoding. I need ASN1 bytes with non-trivial 
characters and a baseline. Would anyone have some interesting UTF8 bytes?


 >>#encodeValue: anObject withDERStream: derStream

     derStream nextPutAll: anObject squeakToUtf8 asByteArray

 >>#decodeValueWithDERStream: derStream length: length

     ^ (derStream next: length) asByteArray asString utf8ToSqueak.


     | bytes obj testObj |
     bytes := #(44 15 12 5 84 101 115 116 32 12 6 85 115 101 114 32 49).
     testObj := 'Test User 1'.
     obj := ASN1InputStream decodeBytes: bytes.
     self assert: (obj = testObj).
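
For reference, the bytes above are a constructed UTF8String (tag 44 = 16r2C, length 15) holding two primitive segments (tag 12), 'Test ' and 'User 1'. That is a BER form, since DER requires string types to be primitive. Below is a minimal sketch of the kind of multi-byte baseline being asked for. The sample characters and the byte values are illustrative only, worked out by hand from the UTF-8 rules, and it assumes the decoder handles a primitive UTF8String (tag 12) like the constructed one:

```smalltalk
| expected derBytes |
"Hypothetical baseline: characters chosen to cover 2-, 3- and 4-byte UTF-8 sequences."
expected := WideString
    with: (Character value: 16r00E9)     "é  -> C3 A9"
    with: (Character value: 16r20AC)     "€  -> E2 82 AC"
    with: (Character value: 16r1D11E).   "𝄞 -> F0 9D 84 9E"

"Tag 12 (UTF8String, primitive), length 9, then the nine UTF-8 content bytes."
derBytes := #[12 9 195 169 226 130 172 240 157 132 158].

"Check the string converter on its own first, without the ASN1 layer."
self assert: expected squeakToUtf8 asByteArray = (derBytes allButFirst: 2).

"Assumption: ASN1InputStream maps a primitive tag 12 to the UTF8 decode method shown above."
self assert: (ASN1InputStream decodeBytes: derBytes) = expected.
```

If the first assertion passes and the second fails, the problem is more likely in the ASN1 type mapping than in the UTF-8 conversion.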

Thank you for your consideration,

On 09/18/2017 12:29 PM, tim Rowledge wrote:
> We do have assorted string encoding stuff in the current image but the actual UTF8 results of #squeakToUtf8 (for example) are just ByteStrings. Which is actually rather confusing and annoying because now you have no way to know what encoding is relevant other than by carefully keeping track manually. Normally of course, within the image we have perfectly usable strings because any time a Unicode character that is outside the 1-byte range is used the string becomes a WideString.
> We need to do better. Look at TextEncoder and its hierarchy for more info.
> tim
> --
> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> Strange OpCodes: RLBM: Ruin Logic Board Multiple
