[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Igor Stasenko siguctua at gmail.com
Tue May 11 22:30:39 UTC 2010


On 12 May 2010 01:12, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
> Igor, your argument convinces me.
> Thank you for the quick feedback.
>
> see updates below
>
> --Hannes
>
> On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
>> On 12 May 2010 00:09, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>>> 1) UTF8 conversion
>>> 2) Change to JSON package of Tony Garnock-Jones
>>> 3) My updated Test case
>>> 4) Conclusion
>>>
>>>
>>> 1) UTF8 conversion
>>>
>>> My question was:
>>>    How do I convert a WideString to UTF8?
>>>
>>>
>>> Levente answered:
>>>
>>> There are various possibilities:
>>> 'äbc' squeakToUtf8.
>>> 'äbc' convertToEncoding: 'utf-8'.
>>> 'äbc' convertToWithConverter: UTF8TextConverter new.
>>> UTF8TextConverter new encodeString: 'äbc'.
>>>
>>>
>>>
>>> 2) Change to JSON package of Tony Garnock-Jones
>>>
>>> As CouchDB stores UTF8 values I did not want to escape them with
>>> \uNNNN as the forked JSON package in SCouchDB does.
>>
>> I know. But JSON could be used for something else, and \uNNNN escapes
>> are part of the JSON syntax,
>> so they should be supported there.
>>
>>> But instead I
>>> wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
>>> is not correct in the original JSON package.
>>>
>> Yeah. SCouchDB has no UTF-8 support for output. Yet.
>>
>>> So I did the following correction.
>>>
>>> In the class
>>>  String  - category *JSON-writing
>>>  (from package http://www.squeaksource.com/JSON)
>>> I replaced
>>>
>>>  jsonWriteOn: aStream
>>>        | replacement |
>>>        aStream nextPut: $".
>>>        self do: [ :ch |
>>>                (replacement := Json escapeForCharacter: ch)    "***"
>>>                        ifNil: [ aStream nextPut: ch ]
>>>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>>>        aStream nextPut: $".
>>>
>>>
>>> WITH
>>>
>>>  jsonWriteOn: aStream
>>>        aStream nextPut: $".
>>>        aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>>>        aStream nextPut: $".
>>>
>>
>> No, this is WRONG!
>>
>> JSON writer methods should output Unicode text and not deal with
>> any encoding!
>> Then the layer responsible for transferring the data is free
>> to decide how to encode the
>> JSON output, using either UTF-8 or any other appropriate UTF
>> encoding.
>>
>> By putting UTF-8 conversions in the JSON library routines you limit
>> the JSON library to being used only with UTF-8 encoding.
>>
>> I repeat: the JSON library is the wrong place for dealing with encodings.
>> It should take Unicode text/stream as input
>> and produce Unicode text/stream as output. Any encoding should be up to
>> the outer layers, which are responsible for data transmission!
>
> So String>> jsonWriteOn:aStream
>
> is now just
>
> jsonWriteOn: aStream
>       aStream nextPut: $".
>       aStream nextPutAll:  self.
>       aStream nextPut: $".
>
>
>
>>>
>>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>>> http://www.squeaksource.com/JSON
>>> and is corrected by Rado in the SCouchDB fork of the package JSON
>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>>
>>
>>
>>>
>>>
>>> 3) My updated Test case
>>>
>
> myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
> d := Dictionary new. d at: 'title' put:   'aTitle'. d at: 'body' put:
> myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test25' content: (UTF8TextConverter
> new encodeString: r contents) type: 'text/plain'.
>
>
> RESULT: OK.
>
>
>>> 4) Conclusion
>>>
>>> With the change to the JSON package I am now fine using WebClient
>>> for storing objects in CouchDB.
>>>
>
> However I did not commit my change to
> http://www.squeaksource.com/JSON
>
> though
>   Json escapeForCharacter: ch
> is wrong.
>
Yeah, thanks for noting that. This should probably simply be wiped out.
Or maybe we could be more clever and add an option for whether
to escape non-ASCII characters or not.
This can be done by adding a single method to the stream, which tells
whether it can deal with Unicode or
only with ASCII characters.
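
A rough, untested sketch of that option. #acceptsUnicode is a hypothetical
selector: it exists neither in Squeak nor in the JSON package, and each
stream class would have to define it. Quote, backslash and control
characters are escaped in every case, because the JSON syntax requires it.

```smalltalk
jsonWriteOn: aStream
	"Sketch of String>>jsonWriteOn: with optional non-ASCII escaping.
	 #acceptsUnicode is HYPOTHETICAL: a stream would answer true if it can
	 store any Unicode character, false if it can only carry ASCII."
	| code writeEscape |
	"Write \uNNNN for a 16-bit value, zero-padded to four hex digits."
	writeEscape := [:n | | hex |
		hex := n printStringBase: 16.
		aStream nextPutAll: '\u'.
		(4 - hex size) timesRepeat: [aStream nextPut: $0].
		aStream nextPutAll: hex].
	aStream nextPut: $".
	self do: [:ch |
		code := ch charCode.
		(ch = $" or: [ch = $\])
			ifTrue: [aStream nextPut: $\; nextPut: ch]
			ifFalse: [
				(code < 32 or: [code > 127 and: [aStream acceptsUnicode not]])
					ifTrue: [
						code > 16rFFFF
							ifTrue: [
								"Outside the 16-bit range: write a UTF-16 surrogate pair."
								writeEscape value: 16rD800 + (code - 16r10000 bitShift: -10).
								writeEscape value: 16rDC00 + (code - 16r10000 bitAnd: 16r3FF)]
							ifFalse: [writeEscape value: code]]
					ifFalse: [aStream nextPut: ch]]].
	aStream nextPut: $".
```

With this shape the writer itself still never encodes anything: it only
decides between a literal character and a \uNNNN escape, and both are
plain Unicode text.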

> And probably it should not do it. At least the current CouchDB deals
> properly with UTF8 encoded strings.
>
>
In SCouchDB I will put an encoding layer right before sending the JSON (in
a similar way to what you used in the example above).
It's easy to do: given the assumption that JSON output is _always_
Unicode text,
I can simply use an appropriate UTF-8 encoder, which will encode
it while sending to the server.
Thus no extra effort is required in JSON itself.
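
A minimal workspace sketch of that layering, reusing only the calls that
already appear in this thread. The host URL and the document contents are
assumptions for illustration (5984 is CouchDB's default port); this is not
the actual SCouchDB code.

```smalltalk
| host doc json payload |
host := 'http://localhost:5984'.	"assumed local CouchDB instance"
doc := Dictionary new.
doc at: 'title' put: 'aTitle'.
doc at: 'body' put: 'ä', 8220 asCharacter asString, 'b'.

"Layer 1: the JSON writer produces plain Unicode text, no encoding."
json := WriteStream on: String new.
(JsonObject newFrom: doc) jsonWriteOn: json.

"Layer 2: the transport picks the wire encoding just before sending."
payload := UTF8TextConverter new encodeString: json contents.
WebClient httpPut: host, '/notes/test25' content: payload type: 'text/plain'.
```

Swapping in another encoding would then only touch the line that builds
payload; the JSON writer stays as it is.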


-- 
Best regards,
Igor Stasenko AKA sig.
