I am trying to map UTF8 into an ASN1 encoding, where a UTF8 character may extend past one byte in value. I am also interested in retaining these UTF8 characters in Squeak so that it interoperates well. What would be my best approach to mapping to and from these bytes on a stream?
Alan
We do have assorted string encoding stuff in the current image, but the actual UTF8 results of #squeakToUtf8 (for example) are just ByteStrings. Which is actually rather confusing and annoying, because now you have no way to know what encoding is relevant other than by carefully keeping track manually. Normally of course, within the image we have perfectly usable strings, because any time a unicode character that is outside the 1-byte range is used the string becomes a WideString.
We need to do better. Look at TextEncoder and its hierarchy for more info.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: RLBM: Ruin Logic Board Multiple
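The ambiguity tim describes can be shown by analogy in Java (the language of the port discussed later in this thread). This is only a sketch: the class and method names are hypothetical, and decoding bytes as ISO-8859-1 is used here as a stand-in for a ByteString that silently holds UTF8 bytes, not as a description of what Squeak does internally.

```java
import java.nio.charset.StandardCharsets;

final class EncodingAmbiguitySketch {
    // Hypothetical helper: wrap raw bytes one-char-per-byte, the way a
    // byte-oriented string would hold them with no encoding attached.
    static String asByteString(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: recover the text, which only works if the caller
    // already knows the wrapped bytes were UTF-8.
    static String decodeUtf8(String byteString) {
        byte[] raw = byteString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u20AC".getBytes(StandardCharsets.UTF_8); // euro sign, 3 bytes
        String opaque = asByteString(utf8);
        // Nothing in 'opaque' says whether it is three Latin-1 characters
        // or one UTF-8 encoded character.
        System.out.println(opaque.length());
        System.out.println(decodeUtf8(opaque).length());
    }
}
```

The point is that both readings of the same bytes are valid strings, so the encoding has to be tracked outside the string itself.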
I will explore, thank you for your thoughts. I tell myself you have to do better, yet the task list is long and somewhat disorganized. And there are only so many seconds each decade.
- Alan
On Sep 18, 2017, at 12:29, tim Rowledge tim@rowledge.org wrote:
We do have assorted string encoding stuff in the current image, but the actual UTF8 results of #squeakToUtf8 (for example) are just ByteStrings. Which is actually rather confusing and annoying, because now you have no way to know what encoding is relevant other than by carefully keeping track manually. Normally of course, within the image we have perfectly usable strings, because any time a unicode character that is outside the 1-byte range is used the string becomes a WideString.
We need to do better. Look at TextEncoder and its hierarchy for more info.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: RLBM: Ruin Logic Board Multiple
I think the pieces are there I merely need to figure out the correct ordering.
- Alan
On Sep 18, 2017, at 12:43, Alan Pinch alan.c.pinch@gmail.com wrote:
I will explore, thank you for your thoughts. I tell myself you have to do better, yet the task list is long and somewhat disorganized. And there are only so many seconds each decade.
- Alan
Here is the encode and decode code I am using, and a test that does not test UTF8 extended encoding. I need ASN1 bytes with non-trivial characters and a baseline. Would anyone have some interesting UTF8 bytes handy?
ASN1UTF8StringType>>#encodeValue: anObject withDERStream: derStream
    derStream nextPutAll: anObject squeakToUtf8 asByteArray

ASN1UTF8StringType>>#decodeValueWithDERStream: derStream length: length
    ^ (derStream next: length) asByteArray asString utf8ToSqueak.

CryptoASN1Test>>#testConstructedUTF8String
    | bytes obj testObj |
    bytes := #(44 15 12 5 84 101 115 116 32 12 6 85 115 101 114 32 49).
    testObj := 'Test User 1'.
    obj := ASN1InputStream decodeBytes: bytes.
    self assert: (obj = testObj).
Thank you for your consideration, Alan
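One way to get baseline bytes with multi-byte characters is to generate them outside the image. The following is a minimal Java sketch, not the Cryptography package's code: the class and method names are hypothetical, and it assumes the primitive DER form of UTF8String (universal tag 12, definite length, UTF8 content octets) rather than the constructed form used in the test above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class DerUtf8Sketch {
    static final int UTF8_STRING_TAG = 0x0C; // universal tag 12, primitive

    // Encode a string as tag, definite length, then UTF-8 content octets.
    static byte[] encode(String s) {
        byte[] content = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(UTF8_STRING_TAG);
        writeLength(out, content.length);
        out.write(content, 0, content.length);
        return out.toByteArray();
    }

    // Short form below 128, otherwise 0x80|n followed by n big-endian length octets.
    static void writeLength(ByteArrayOutputStream out, int len) {
        if (len < 0x80) {
            out.write(len);
            return;
        }
        int n = 0;
        for (int v = len; v > 0; v >>>= 8) n++;
        out.write(0x80 | n);
        for (int i = n - 1; i >= 0; i--) out.write((len >>> (8 * i)) & 0xFF);
    }

    // Decode a primitive UTF8String; no validation beyond the tag check.
    static String decode(byte[] der) {
        if ((der[0] & 0xFF) != UTF8_STRING_TAG) {
            throw new IllegalArgumentException("not a primitive UTF8String");
        }
        int offset = 2;
        int len = der[1] & 0xFF;
        if (len >= 0x80) {
            int n = len & 0x7F;
            len = 0;
            for (int i = 0; i < n; i++) len = (len << 8) | (der[offset++] & 0xFF);
        }
        return new String(der, offset, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'Zo' + U+00EB (2 bytes) + ' ' + U+20AC (3 bytes) + '1 ' + U+1F600 (4 bytes)
        byte[] der = encode("Zo\u00EB \u20AC1 \uD83D\uDE00");
        StringBuilder sb = new StringBuilder("#(");
        for (byte b : der) sb.append(b & 0xFF).append(' ');
        System.out.println(sb.toString().trim() + ")");
    }
}
```

For that sample string the encoded bytes are #(12 14 90 111 195 171 32 226 130 172 49 32 240 159 152 128), which covers 2-, 3- and 4-byte UTF8 sequences and could serve as a baseline for a Squeak test.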
I got BigIntegers working in java ASN1
https://github.com/ZiroZimbarra/callistohouse
I asked a question on Stack Overflow regarding UTC Time conversions in Java.
https://stackoverflow.com/questions/46419082/java-conversion-from-to-asn1-da...
I thought you may like to know.
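For readers following along, here is a rough sketch of one way an ASN1 UTCTime value could be converted in Java using java.time. It is not taken from the linked question or from the port. The class and method names are hypothetical, it handles only the YYMMDDhhmmssZ form, and mapping two-digit years onto 1950..2049 is an assumption borrowed from the X.509 convention.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class Asn1UtcTimeSketch {
    // YYMMDDhhmmssZ, with two-digit years resolved into 1950..2049 (assumption).
    static final DateTimeFormatter UTC_TIME = new DateTimeFormatterBuilder()
            .appendValueReduced(ChronoField.YEAR, 2, 2, 1950)
            .appendPattern("MMddHHmmss")
            .appendLiteral('Z')
            .toFormatter();

    // Content octets only (no tag or length); sub-second precision is dropped.
    static byte[] toContentOctets(Instant instant) {
        String text = UTC_TIME.format(LocalDateTime.ofInstant(instant, ZoneOffset.UTC));
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Parse content octets back into an Instant, interpreting them as UTC.
    static Instant fromContentOctets(byte[] octets) {
        String text = new String(octets, StandardCharsets.US_ASCII);
        return LocalDateTime.parse(text, UTC_TIME).toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2017-09-26T17:08:00Z"); // arbitrary sample instant
        byte[] octets = toContentOctets(when);
        System.out.println(new String(octets, StandardCharsets.US_ASCII));
        System.out.println(fromContentOctets(octets));
    }
}
```

Working in Instant and converting explicitly through ZoneOffset.UTC keeps the default time zone of the JVM out of the conversion, which is a common source of off-by-hours results.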
On 09/25/2017 06:09 PM, Alan Pinch wrote:
I got BigIntegers working in java ASN1
https://github.com/ZiroZimbarra/callistohouse
To share my good news! I just got a port of Cryptography's ASN1 to Java passing tests. Now to get PhaseHeaders encoding right to bring bit-compatible encryption between Squeak and Java online.
The Java version has almost 50% more code than the Squeak one. Just saying, we have a concrete example of the efficacy of Squeak over Java. They should have left it as the Oak Project and called it a day. Our day comes.
On 09/26/2017 05:08 PM, Alan Pinch wrote:
I asked a question on stackoverflow, regarding UTC Time in java conversions.
https://stackoverflow.com/questions/46419082/java-conversion-from-to-asn1-da...
I thought you may like to know.
squeak-dev@lists.squeakfoundation.org