I had found the same Stack Overflow question. It is the only place I have found that mentions that 0x0C (decimal 12) is the tag for UTF8String.

I am currently encoding thus:

aString squeakToUtf8 asByteArray.

and decoding:

bytes asByteArray asString utf8ToSqueak.

Do you think this lays out the bytes as specified on the page linked below? I gather from the Stack Overflow question that these bytes would be the encoded content of a UTF8String in ASN.1.

https://en.wikipedia.org/wiki/UTF-8#Description
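
To make the question concrete, here is a minimal sketch of the layout as I understand it: the tag byte 0x0C, then a length, then the raw UTF-8 bytes. It is written in Python rather than Squeak only so that the byte layout is easy to check independently of the image. The function names encode_utf8string and decode_utf8string are made up for illustration, and the length handling assumes DER definite-length rules (short form below 128 bytes, long form otherwise).

```python
UTF8STRING_TAG = 0x0C  # universal tag 12, as mentioned in the thread


def encode_utf8string(text: str) -> bytes:
    """Return tag + length + UTF-8 content (hypothetical helper, DER assumed)."""
    content = text.encode("utf-8")
    n = len(content)
    if n < 0x80:
        # Short form: a single length byte.
        length = bytes([n])
    else:
        # Long form: 0x80 | number of length bytes, then the length big-endian.
        size = (n.bit_length() + 7) // 8
        length = bytes([0x80 | size]) + n.to_bytes(size, "big")
    return bytes([UTF8STRING_TAG]) + length + content


def decode_utf8string(data: bytes) -> str:
    """Inverse of encode_utf8string (hypothetical helper, DER assumed)."""
    if len(data) < 2 or data[0] != UTF8STRING_TAG:
        raise ValueError("not a UTF8String")
    first = data[1]
    if first < 0x80:
        n, start = first, 2
    elif first == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:
        size = first & 0x7F
        n = int.from_bytes(data[2:2 + size], "big")
        start = 2 + size
    content = data[start:start + n]
    if len(content) != n:
        raise ValueError("truncated content")
    return content.decode("utf-8")


encoded = encode_utf8string("Gr\u00fc\u00dfe")  # "Grüße": two 2-byte characters
print(encoded.hex(" "))  # 0c 07 47 72 c3 bc c3 9f 65
print(decode_utf8string(encoded))
```

If that is right, then the result of squeakToUtf8 asByteArray would be only the content part, and the tag and length bytes would still have to be written in front of it on the stream.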

Alan

On 09/18/2017 01:46 AM, Jakob Reschke wrote:
I just did a quick search on the web and it seems like ASN.1 has a UTF8String type (with tag 12) that just contains the sequence of bytes of the UTF-8-encoded string. Can you use that? See also this question on Stack Overflow: https://stackoverflow.com/q/28929809

In Squeak, you can convert between UTF-8-encoded byte strings and decoded (Squeak-encoded) character strings with the help of UTF8TextConverter. Have a look at its class-side methods. Also, there are conversion methods in String, IIRC. Try to filter its instance-side methods by "utf8".

Does this answer your question or are you in search of something else?

Kind regards,
Jakob

On 18.09.2017 03:49, "Alan Pinch" <alan.c.pinch@gmail.com> wrote:
I am trying to map UTF-8 into an ASN.1 encoding, where a UTF-8
character may take more than one byte. I am also interested in
retaining these UTF-8 characters in Squeak so that they interoperate
well. What would be my best approach to mapping to and from these
bytes on a stream?

Alan