[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Hannes Hirzel hannes.hirzel at gmail.com
Tue May 11 22:12:59 UTC 2010


Igor, your argument convinces me.
Thank you for the quick feedback.

See the updates below.

--Hannes

On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
> On 12 May 2010 00:09, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>> 1) UTF8 conversion
>> 2) Change to JSON package of Tony Garnock-Jones
>> 3) My updated Test case
>> 4) Conclusion
>>
>>
>> 1) UTF8 conversion
>>
>> My question was:
>>    How do I convert a WideString to UTF8?
>>
>>
>> Levente answered:
>>
>> There are various possibilities:
>> 'äbc' squeakToUtf8.
>> 'äbc' convertToEncoding: 'utf-8'.
>> 'äbc' convertToWithConverter: UTF8TextConverter new.
>> UTF8TextConverter new encodeString: 'äbc'.
>>
>>
>>
>> 2) Change to JSON package of Tony Garnock-Jones
>>
>> As CouchDB stores UTF8 values I did not want to escape them with
>> \uNNNN as the forked JSON package in SCouchDB does.
>
> I know. But JSON could be used for something else, and \uNNNN escaping
> is part of the syntax, so it should be supported there.
>
>> But instead I
>> wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
>> is not correct in the original JSON package.
>>
> Yeah. SCouchDB has no UTF-8 support for output. Yet.
>
>> So I made the following correction.
>>
>> In the class
>>  String  - category *JSON-writing
>>  (from package http://www.squeaksource.com/JSON)
>> I replaced
>>
>>  jsonWriteOn: aStream
>>        | replacement |
>>        aStream nextPut: $".
>>        self do: [ :ch |
>>                (replacement := Json escapeForCharacter: ch)    "***"
>>                        ifNil: [ aStream nextPut: ch ]
>>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>>        aStream nextPut: $".
>>
>>
>> WITH
>>
>>  jsonWriteOn: aStream
>>        aStream nextPut: $".
>>        aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>>        aStream nextPut: $".
>>
>
> No, this is WRONG!
>
> JSON writer methods should output Unicode text and should not deal with
> any encoding!
> Then the layer which is responsible for transferring the data is free
> to decide how to encode the JSON output, using either UTF-8 or any
> other appropriate UTF encoding.
>
> By putting UTF-8 conversions into the JSON library routines you limit
> the JSON library to being used only with UTF-8 encoding.
>
> I repeat: the JSON library is the wrong place for dealing with encodings.
> It should take Unicode text/stream as input and produce Unicode
> text/stream as output. Any encoding should be up to the outer layers,
> which are responsible for data transmission!

So String>>jsonWriteOn: is now just

jsonWriteOn: aStream
       aStream nextPut: $".
       aStream nextPutAll:  self.
       aStream nextPut: $".
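
With the encoding taken out of the writer, the split Igor describes could be sketched roughly as follows. This is only a sketch: putJson:to: is a hypothetical helper name (it is not part of WebClient or the JSON package), and it assumes the same UTF8TextConverter and WebClient httpPut:content:type: calls that the test case in 3) uses.

```smalltalk
putJson: anObject to: aUrl
    "Hypothetical helper, not part of WebClient or the JSON package.
     The JSON writer produces Unicode text only; the UTF-8 encoding
     happens here, at the transport boundary."
    | jsonStream bytes |
    jsonStream := WriteStream on: String new.
    anObject jsonWriteOn: jsonStream.
    bytes := UTF8TextConverter new encodeString: jsonStream contents.
    ^ WebClient httpPut: aUrl content: bytes type: 'text/plain'
```

Switching to another encoding would then only touch this helper, not the JSON package.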



>>
>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>> http://www.squeaksource.com/JSON
>> and is corrected by Rado in the SCouchDB fork of the package JSON
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>
>
>
>>
>>
>> 3) My updated Test case
>>

myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient
    httpPut: host, '/notes/test25'
    content: (UTF8TextConverter new encodeString: r contents)
    type: 'text/plain'.


RESULT: OK.


>> 4) Conclusion
>>
>> With the change to the JSON package I am now fine using WebClient
>> for storing objects in CouchDB.
>>

However, I did not commit my change to
http://www.squeaksource.com/JSON

even though
   Json escapeForCharacter: ch
is wrong.

And probably the writer should not do this \uNNNN escaping at all. At least
the current CouchDB deals properly with UTF8-encoded strings.
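
One caveat for a writer that copies the string through unchanged: JSON itself still requires the double quote, the backslash and control characters below U+0020 (a carriage return, for example) to be escaped inside a string. Everything else, including non-ASCII characters, may pass through as plain Unicode. A minimal sketch of such a writer, assuming it replaces String>>jsonWriteOn: and does not call Json escapeForCharacter: at all (this is not the code of either package):

```smalltalk
jsonWriteOn: aStream
    "Sketch only: escape what JSON requires, leave the rest as Unicode.
     No byte encoding happens here."
    aStream nextPut: $".
    self do: [:ch | | code hex |
        code := ch asInteger.
        (ch = $" or: [ch = $\])
            ifTrue: [aStream nextPut: $\; nextPut: ch]
            ifFalse: [
                code < 32
                    ifTrue: [
                        "control character: write it as \u00XX"
                        hex := code printStringBase: 16.
                        aStream nextPutAll: '\u'.
                        4 - hex size timesRepeat: [aStream nextPut: $0].
                        aStream nextPutAll: hex]
                    ifFalse: [aStream nextPut: ch]]].
    aStream nextPut: $".
```

The output is still Unicode text, so the UTF-8 conversion stays with the code that sends the data.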


