I just did a quick search on the web, and it seems that ASN.1 has a UTF8String type (universal tag 12, i.e. 0x0C) whose content is simply the sequence of bytes of the UTF-8-encoded string. Can you use that? See also this question on Stack Overflow: https://stackoverflow.com/q/28929809
In Squeak, you can convert between UTF-8-encoded byte strings and decoded (Squeak-encoded) character strings with the help of UTF8TextConverter. Have a look at its class-side methods. Also, there are conversion methods in String, IIRC. Try to filter its instance-side methods by "utf8".
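To make the UTF8String layout concrete, here is a minimal sketch in Python (not Squeak) of the tag-length-value framing. It assumes BER/DER definite-length encoding, which this thread does not state explicitly. The names der_length, encode_utf8string, and decode_utf8string are hypothetical helpers, not part of any library.

```python
UTF8STRING_TAG = 0x0C  # universal tag 12, as mentioned above


def der_length(n: int) -> bytes:
    """Definite length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_utf8string(text: str) -> bytes:
    """Tag octet, length octets, then the raw UTF-8 bytes of the string."""
    payload = text.encode("utf-8")
    return bytes([UTF8STRING_TAG]) + der_length(len(payload)) + payload


def decode_utf8string(data: bytes) -> str:
    """Inverse of encode_utf8string for a single, complete element."""
    if len(data) < 2 or data[0] != UTF8STRING_TAG:
        raise ValueError("not a UTF8String element")
    first = data[1]
    if first < 0x80:
        length, offset = first, 2
    else:
        count = first & 0x7F  # number of following length octets
        length = int.from_bytes(data[2:2 + count], "big")
        offset = 2 + count
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError("truncated element")
    return payload.decode("utf-8")


if __name__ == "__main__":
    print(encode_utf8string("Grüße").hex())
```

Note that the length counts bytes, not characters, so a string containing multi-byte characters has a length field larger than its character count.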
Does this answer your question or are you in search of something else?
Kind regards, Jakob
On 18.09.2017 at 03:49, "Alan Pinch" alan.c.pinch@gmail.com wrote:
I am trying to map UTF-8 into an ASN.1 encoding, where a UTF-8 character may take more than one byte. I am also interested in retaining these UTF-8 characters in Squeak so that they interoperate well. What would be my best approach for mapping to and from these bytes on a stream?
Alan
I had found the same Stack Overflow question. It is the only place I found that mentions that 0x0C is the tag for UTF8String.
I am currently encoding thus:
aString squeakToUtf8 asByteArray.
and decoding:
bytes asByteArray asString utf8ToSqueak.
Do you think this lays out the bytes as specified on the page below? I gather from the Stack Overflow question that this would be the encoded form of UTF-8 for ASN.1.
https://en.wikipedia.org/wiki/UTF-8#Description
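One way to check the layout independently of Squeak is to build the bytes by hand from the bit patterns on that page and compare. The following is a rough sketch in Python, not Squeak. The helper name utf8_bytes is hypothetical, and the sketch does not reject surrogate code points or values above U+10FFFF.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one code point using the 1- to 4-byte patterns from the table."""
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # One sample per encoded length: 1, 2, 3 and 4 bytes.
    for ch in ["A", "é", "€", "😀"]:
        encoded = utf8_bytes(ord(ch))
        print(f"U+{ord(ch):04X}", " ".join(f"{b:02x}" for b in encoded))
```

If the Squeak expression above yields the same byte values for the same characters, the layout matches the table on that page.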
Alan
squeak-dev@lists.squeakfoundation.org