[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Igor Stasenko siguctua at gmail.com
Tue May 11 21:30:50 UTC 2010


On 12 May 2010 00:09, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
> 1) UTF8 conversion
> 2) Change to JSON package of Tony Garnock-Jones
> 3) My updated Test case
> 4) Conclusion
>
>
> 1) UTF8 conversion
>
> My question was:
>    How do I convert a WideString to UTF8?
>
>
> Levente answered:
>
> There are various possibilities:
> 'äbc' squeakToUtf8.
> 'äbc' convertToEncoding: 'utf-8'.
> 'äbc' convertToWithConverter: UTF8TextConverter new.
> UTF8TextConverter new encodeString: 'äbc'.
>
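For reference, here is what a correct UTF-8 conversion of 'äbc' should produce. It is sketched in Python only because that is easy to run outside an image; it is not the Squeak API. 'ä' (U+00E4) becomes the two bytes C3 A4, while the ASCII letters stay one byte each. As far as I know, the Squeak converters above hand back a String whose characters are those raw byte values.

```python
# Illustration only: the UTF-8 bytes that a conversion such as
# 'äbc' squeakToUtf8 is expected to produce, shown in Python.
text = "\u00e4bc"               # 'äbc'; U+00E4 is LATIN SMALL LETTER A WITH DIAERESIS
utf8 = text.encode("utf-8")     # Unicode code points -> UTF-8 bytes

print(len(text))                # 3 characters
print(len(utf8))                # 4 bytes: 'ä' needs two bytes, C3 A4
print(utf8.hex(" ").upper())    # C3 A4 62 63

# Decoding reverses the conversion exactly.
assert utf8.decode("utf-8") == text
```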
>
>
> 2) Change to JSON package of Tony Garnock-Jones
>
> As CouchDB stores UTF8 values I did not want to escape them with
> \uNNNN as the forked JSON package in SCouchDB does.

I know. But JSON could be used for something else, and \uNNNN escaping is
also part of the JSON syntax, so it should be supported there.

> But instead I
> wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
> is not correct in the original JSON package.
>
Yeah... SCouchDB has no UTF-8 support for output. Yet.

> So I did the following correction.
>
> In the class
>  String  - category *JSON-writing
>  (from package http://www.squeaksource.com/JSON)
> I replaced
>
>  jsonWriteOn: aStream
>        | replacement |
>        aStream nextPut: $".
>        self do: [ :ch |
>                (replacement := Json escapeForCharacter: ch)    "***"
>                        ifNil: [ aStream nextPut: ch ]
>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>        aStream nextPut: $".
>
>
> WITH
>
>  jsonWriteOn: aStream
>        aStream nextPut: $".
>        aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>        aStream nextPut: $".
>

No, this is WRONG!

JSON writer methods should output Unicode text and not deal with any encoding!
Then the layer responsible for transferring the data is free to decide how to
encode the JSON output, using UTF-8 or any other appropriate UTF encoding.

By putting UTF-8 conversions into the JSON library routines, you limit the
JSON library to being used only with UTF-8.

I repeat: the JSON library is the wrong place to deal with encodings. It
should take Unicode text/streams as input and produce Unicode text/streams
as output. Any encoding should be left to the outer layers, which are
responsible for data transmission!
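A minimal sketch of this layering, assuming Python's standard json module as a stand-in for the Squeak JSON package; to_json_text and encode_for_transport are hypothetical names. The serializer returns Unicode text, and only the transport step picks a byte encoding:

```python
import json

def to_json_text(obj) -> str:
    # JSON layer: produces Unicode text (a str); no byte encoding happens here.
    # ensure_ascii=False writes non-ASCII characters as-is instead of \uNNNN.
    return json.dumps(obj, ensure_ascii=False)

def encode_for_transport(text: str, charset: str = "utf-8") -> bytes:
    # Hypothetical transport layer: the only place that picks a byte encoding.
    return text.encode(charset)

# Same shape as Hannes's test document: 'ä', U+201C (8220), CR, 'b'.
doc = {"title": "aTitle", "body": "\u00e4\u201c\rb"}

text = to_json_text(doc)             # Unicode text containing 'ä' and U+201C
body = encode_for_transport(text)    # UTF-8 bytes, e.g. for an HTTP PUT body

# The transport could just as well choose UTF-16; the JSON layer is unchanged.
body16 = encode_for_transport(text, "utf-16")

# Escaped output is still valid JSON and parses to the same value.
ascii_text = json.dumps(doc)         # default ensure_ascii=True: 'ä' -> \u00e4

assert json.loads(body.decode("utf-8")) == doc
assert json.loads(body16.decode("utf-16")) == doc
assert json.loads(ascii_text) == doc
```

Keeping the two steps apart means the same writer can serve CouchDB (UTF-8), another UTF encoding, or \uNNNN-escaped ASCII output.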


>
> "*** NOTE: escapeForCharacter is incorrectly implemented in
> http://www.squeaksource.com/JSON
> and is corrected by Rado in the SCouchDB fork of the package JSON
> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>
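JSON (RFC 4627) defines the escape as \u followed by exactly four hex digits of a UTF-16 code unit; characters beyond U+FFFF are written as a surrogate pair. That is also why '{"content":"\uC3\uA4s"}' in the older test case is rejected: those escapes have only two digits, and C3 A4 are the UTF-8 bytes of 'ä', whereas the correct escape is \u00E4. A sketch of a per-character escape in Python; json_escape_char and json_write_string are hypothetical names, not the package's escapeForCharacter:

```python
import json
from typing import Optional

# Characters that have a short two-character escape in JSON.
_SHORT = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
          '\n': '\\n', '\r': '\\r', '\t': '\\t'}

def json_escape_char(ch: str, ascii_only: bool = False) -> Optional[str]:
    """Return the JSON escape for ch, or None if ch can be written as-is.
    Hypothetical helper for illustration only."""
    if ch in _SHORT:
        return _SHORT[ch]
    code = ord(ch)
    if code < 0x20 or (ascii_only and code >= 0x7F):
        if code > 0xFFFF:
            # Beyond the BMP: split into a UTF-16 surrogate pair.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            return "\\u%04X\\u%04X" % (high, low)
        return "\\u%04X" % code          # always exactly four hex digits
    return None

def json_write_string(s: str, ascii_only: bool = False) -> str:
    """Write s as a JSON string literal (Unicode text, not bytes)."""
    out = ['"']
    for ch in s:
        esc = json_escape_char(ch, ascii_only)
        out.append(ch if esc is None else esc)
    out.append('"')
    return "".join(out)

print(json_write_string("\u00e4s", ascii_only=True))      # "\u00E4s"
print(json_write_string("\U0001F600", ascii_only=True))   # "\uD83D\uDE00"

# Sanity check against Python's own JSON parser.
assert json.loads(json_write_string("\u00e4s", ascii_only=True)) == "\u00e4s"
```

With ascii_only=False this writes Unicode text that an outer layer can then encode as UTF-8, which fits Igor's point above; with ascii_only=True it produces the pure-ASCII \uNNNN form.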


>
>
> 3) My updated Test case
>
> myWideString := 'ä', 8220 asCharacter asString, (String with: Character cr), 'b'.
> d := Dictionary new.
> d at: 'title' put: 'aTitle'.
> d at: 'body' put: myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.
>
> RESULT: OK.
>
>
>
> 4) Conclusion
>
> With the change to the JSON package I can now use WebClient
> to store objects in CouchDB.
>
> However I did not commit my change to
>  http://www.squeaksource.com/JSON
> as I do not (yet) understand the full impact of it.
>
>
> Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help
>
> --Hannes
>
> On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
>> On 11 May 2010 17:44, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>>> On 5/10/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>>>
>>>> Which JSON package/version are you using? I fixed a bug in the one
>>>> distributed with SCouchDB a few weeks ago, where it didn't encode Unicode
>>>> characters properly - the correct escaped form is \uNNNN, always padded
>>>> to 4 hex digits. That's why you get that warning; yours has only 2-3.
>>>>
>>>> rado
>>>
>>> I have been using
>>> http://www.squeaksource.com/JSON (over 7000 downloads)
>>> in combination with WebClient.
>>>
>>> Thank you Rado, I found
>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>>> and will have a look at it.
>>> (Your comment: added handling of utf8 encoded input data - this is
>>> necessary for couchdb-lucene which sends results directly in utf8 and
>>> not \uNNNN encoded)
>>>
>> SCouchDB uses a forked version of the JSON package, which you can find in
>> the SCouchDB repository:
>> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>>
>> If you are looking for that method, it can be found in Json>>unescapeUnicode
>>
>>
>>> --Hannes
>>>
>>>
>>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>>
>>>>> The test case made simpler
>>>>>
>>>>> WebClient httpPut: host, '/notes/test7' content:
>>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>>
>>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8 JSON"}
>>>>> '
>>>>>
>>>>> whereas
>>>>>
>>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>>> type: 'text/plain'.
>>>>>
>>>>> gives back
>>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>>> '
>>>>>
>>>>> --Hannes
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>>
>
>



-- 
Best regards,
Igor Stasenko AKA sig.


