ASN1 encoding of UTF8

Alan Pinch

18 Sep 2017

3:49 a.m.

I am trying to map UTF8 into an ASN1 encoding, where a UTF8 character may take more than one byte. I am also interested in retaining these UTF8 characters in Squeak so that they interoperate well. What would be my best approach to mapping to and from these bytes on a stream?

Alan

tim Rowledge

18 Sep 2017

6:29 p.m.

We do have assorted string encoding stuff in the current image, but the actual UTF8 results of #squeakToUtf8 (for example) are just ByteStrings. That is actually rather confusing and annoying, because now you have no way to know which encoding is relevant other than by carefully keeping track manually. Normally, of course, within the image we have perfectly usable strings, because any time a Unicode character outside the 1-byte range is used the string becomes a WideString.

We need to do better. Look at TextEncoder and its hierarchy for more info.

tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: RLBM: Ruin Logic Board Multiple
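The point about ByteString versus WideString can be tried in a workspace. This is a minimal sketch, assuming a recent Squeak image in which #squeakToUtf8, #utf8ToSqueak and #isWideString behave as described in the message above; the variable names are only illustrative.

```smalltalk
| wide utf8 |
"U+20AC (euro sign) does not fit in one byte, so it needs a WideString."
wide := (WideString new: 1) at: 1 put: (Character value: 8364); yourself.

"Encode to UTF-8. Per the message above, the result is a plain ByteString
 that carries no record of its encoding."
utf8 := wide squeakToUtf8.
Transcript show: utf8 class name; cr.

"The UTF-8 encoding of U+20AC is the three bytes E2 82 AC (226 130 172)."
Transcript show: utf8 asByteArray printString; cr.

"Decoding should give back a wide string again."
Transcript show: utf8 utf8ToSqueak isWideString printString; cr.
```

Nothing in the encoded string says that it holds UTF-8, which is why the caller has to keep track of that fact separately.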

Alan Pinch

6:43 p.m.

I will explore; thank you for your thoughts. I tell myself you have to do better, yet the task list is long and somewhat disorganized. And there are only so many seconds in each decade.

- Alan

Alan Pinch

7:26 p.m.

I think the pieces are there; I merely need to figure out the correct ordering.

- Alan

Alan Pinch

20 Sep 2017

9:42 p.m.

Here is the encode and decode code I am using, along with a test that does not exercise multi-byte UTF8 encoding. I need ASN1 bytes with non-trivial characters and a baseline. Would anyone have some interesting UTF8 bytes handy?

ASN1UTF8StringType>>#encodeValue: anObject withDERStream: derStream

    derStream nextPutAll: anObject squeakToUtf8 asByteArray

ASN1UTF8StringType>>#decodeValueWithDERStream: derStream length: length

    ^ (derStream next: length) asByteArray asString utf8ToSqueak.

CryptoASN1Test>>#testConstructedUTF8String

    | bytes obj testObj |
    bytes := #(44 15 12 5 84 101 115 116 32 12 6 85 115 101 114 32 49).
    testObj := 'Test User 1'.
    obj := ASN1InputStream decodeBytes: bytes.
    self assert: (obj = testObj).

Thank you for your consideration, Alan
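As one possible baseline with multi-byte characters, the sketch below builds a primitive UTF8String by hand: tag 12, length 5, then the UTF-8 bytes for U+00E9 (195 169) and U+20AC (226 130 172). The byte values follow from the UTF-8 and ASN.1 tag rules; the Squeak side assumes #squeakToUtf8 and #utf8ToSqueak work as used in the methods above, and the variable names are only illustrative. Code points are compared through #charCode, on the assumption that some images may attach a leading-char tag to decoded characters.

```smalltalk
| bytes content expected decoded |
"Primitive UTF8String: tag 12, length 5, content = UTF-8 of U+00E9 U+20AC."
bytes := #[12 5 195 169 226 130 172].
content := bytes copyFrom: 3 to: bytes size.

"The same two characters built directly from their code points."
expected := (WideString new: 2)
    at: 1 put: (Character value: 233);
    at: 2 put: (Character value: 8364);
    yourself.

"Decode the content octets the same way the decode method above does."
decoded := content asString utf8ToSqueak.
Transcript show: (decoded asArray collect: [:c | c charCode]) printString; cr.

"Encode the expected string and compare with the hand-built content."
Transcript show: (expected squeakToUtf8 asByteArray = content) printString; cr.
```

The full `bytes` array could then be passed to `ASN1InputStream decodeBytes:` in a test shaped like the one above, with `expected` as the baseline.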

Alan Pinch

26 Sep 2017

12:09 a.m.

I got BigIntegers working in the Java ASN1 port:

https://github.com/ZiroZimbarra/callistohouse

-- Thank you for your consideration, Alan

Alan Pinch

11:08 p.m.

I asked a question on Stack Overflow regarding ASN1 UTC Time conversions in Java.

https://stackoverflow.com/questions/46419082/java-conversion-from-to-asn1-da...

I thought you may like to know.

-- Thank you for your consideration, Alan

Alan Pinch

30 Sep 2017

1:04 a.m.

I have good news to share! I just got a port of Cryptography's ASN1 to Java passing its tests. Next is getting the PhaseHeaders encoding right, to bring bit-compatible encryption between Squeak and Java online.

There is almost 50% more code in Java than in Squeak; I am just saying we have a concrete example of the efficacy of Squeak over Java. They should have left it as the Oak Project and called it a day. Our day comes.

-- Thank you for your consideration, Alan

squeak-dev@lists.squeakfoundation.org
