[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Levente Uzonyi leves at elte.hu
Wed May 12 00:12:16 UTC 2010


On Wed, 12 May 2010, Hannes Hirzel wrote:

> Levente, your answer covers an earlier state of the exchange. See here
> for the latest account
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Sorry, I didn't read all the mails before I replied.


Levente

>
> Basically, in my case the need for UTF8 conversion stems from the fact
> that I use WebClient to post the JSON object, and it accepts only
> bytes. I also want to post to CouchDB, which handles UTF8 nicely.
>
> The JSON package as such needs no UTF8 conversion. Only escaping of
> backslash \, double quote " and control characters.
>
>
> The method
> String >> jsonWriteOn: aStream
>
> should stay as
>
> String >> jsonWriteOn: aStream
>
>        | replacement |
>        aStream nextPut: $".
>        self do: [ :ch |
>                (replacement := Json escapeForCharacter: ch)
>                        ifNil: [ aStream nextPut: ch ]
>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>        aStream nextPut: $".
>
> but the method
>  Json escapeForCharacter: ch
> does not need to produce \uNNNN escapes for non-ASCII characters.
>
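The escaping rule described here (escape only backslash, double quote and control characters; leave every other character, including non-ASCII ones such as 'ä', as it is) can be sketched in Python as follows. The names `escape_for_character` and `json_write_string` are hypothetical stand-ins for `Json escapeForCharacter:` and `String >> jsonWriteOn:`; this is an illustration of the rule, not the JSON package's code. The short escapes (\b, \f, \n, \r, \t) are optional alternatives in the JSON grammar; they are used here because Python's `json` module emits the same ones, which makes the sketch easy to check.

```python
# Short escapes allowed by the JSON grammar for a few common characters.
SHORT_ESCAPES = {
    '"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
    '\n': '\\n', '\r': '\\r', '\t': '\\t',
}


def escape_for_character(ch):
    """Return the JSON escape for ch, or None if ch may appear literally."""
    if ch in SHORT_ESCAPES:
        return SHORT_ESCAPES[ch]
    if ord(ch) < 0x20:
        # Remaining control characters: \u plus exactly four hex digits.
        return '\\u%04x' % ord(ch)
    # Everything else, including non-ASCII such as 'ä', stays as it is.
    return None


def json_write_string(s):
    """Write s as a JSON string literal, mirroring the jsonWriteOn: loop."""
    out = ['"']
    for ch in s:
        replacement = escape_for_character(ch)
        out.append(ch if replacement is None else replacement)
    out.append('"')
    return ''.join(out)
```

For Hannes's test string ('ä', U+201C, CR, 'b') only the CR is escaped; 'ä' and U+201C pass through unchanged, and converting them to bytes is left to whoever sends the text.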
> So I do the UTF8 conversion just before the HTTP post.
>
> I hope this clarifies the situation and that we can soon move to an
> update of the JSON package.
>
> --Hannes
>
> On 5/11/10, Levente Uzonyi <leves at elte.hu> wrote:
>> On Tue, 11 May 2010, Hannes Hirzel wrote:
>>
>>> 1) UTF8 conversion
>>> 2) Change to JSON package of Tony Garnock-Jones
>>> 3) My updated Test case
>>> 4) Conclusion
>>>
>>>
>>> 1) UTF8 conversion
>>>
>>> My question was:
>>>    How do I convert a WideString to UTF8?
>>>
>>>
>>> Levente answered:
>>>
>>> There are various possibilities:
>>> 'äbc' squeakToUtf8.
>>> 'äbc' convertToEncoding: 'utf-8'.
>>> 'äbc' convertToWithConverter: UTF8TextConverter new.
>>> UTF8TextConverter new encodeString: 'äbc'.
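Each of Levente's expressions is offered as a way to turn the string into its UTF-8 byte sequence. What such a converter does per character can be sketched by hand in Python; `utf8_encode` is a hypothetical illustration (it does not reject lone surrogates, which a real encoder must), checked against Python's built-in `str.encode`.

```python
def utf8_encode(text):
    """Encode text to UTF-8 by hand (illustration only; no surrogate checks)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                      # 1 byte:  0xxxxxxx
            out.append(cp)
        elif cp < 0x800:                   # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:                 # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                              # 4 bytes, up to U+10FFFF
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


print(utf8_encode('äbc').hex(' '))  # c3 a4 62 63
```

'ä' is U+00E4, which needs two bytes (C3 A4) in UTF-8, while 'b' and 'c' stay single bytes.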
>>>
>>>
>>>
>>> 2) Change to JSON package of Tony Garnock-Jones
>>>
>>> Because CouchDB stores UTF8 values, I did not want to escape them as
>>> \uNNNN the way the forked JSON package in SCouchDB does; instead I
>>> wanted to keep UTF8 in the db. As Rado pointed out, the escaping of
>>> non-ASCII characters is not correct in the original JSON package.
>>>
>>> So I did the following correction.
>>>
>>> In the class
>>>  String  - category *JSON-writing
>>>  (from package http://www.squeaksource.com/JSON)
>>> I replaced
>>>
>>>  jsonWriteOn: aStream
>>> 	| replacement |
>>> 	aStream nextPut: $".
>>> 	self do: [ :ch |
>>> 		(replacement := Json escapeForCharacter: ch)    "***"
>>> 			ifNil: [ aStream nextPut: ch ]
>>> 			ifNotNil: [ aStream nextPutAll: replacement ] ].
>>> 	aStream nextPut: $".
>>>
>>>
>>> WITH
>>>
>>>  jsonWriteOn: aStream
>>> 	aStream nextPut: $".
>>> 	aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>>> 	aStream nextPut: $".
>>
>> This is just wrong. According to http://json.org a string can contain any
>> Unicode character except for \, " and control characters, so there should
>> be no UTF-8 conversion here.
>>
>> You only need to convert the characters to UTF-8 because you're sending
>> them over the network to a server, and Unicode characters have to be
>> converted to bytes somehow. So the JSON printer shouldn't do any
>> conversion by default except for escaping. The only problem is that
>> escaping is not done as the spec requires, but that's easy to fix.
>>
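Levente's point is to keep escaping and byte conversion in separate layers and to convert to bytes exactly once. The Python sketch below shows what happens when the writer has already encoded the string and a later layer encodes the text again. `encode_inside_writer` is a hypothetical simulation that keeps each UTF-8 byte as one character (Latin-1 maps bytes 0-255 to U+0000-U+00FF); whether WebClient itself would re-encode such text is an assumption, not something settled in this thread.

```python
def write_then_encode(json_text):
    # Recommended split: the JSON writer leaves 'ä' alone and the
    # transport layer converts the finished text to bytes exactly once.
    return json_text.encode("utf-8")


def encode_inside_writer(text):
    # Hypothetical simulation of a writer that UTF-8-encodes a string itself
    # and keeps the result as one character per byte (Latin-1 maps bytes
    # 0..255 to U+0000..U+00FF).
    return text.encode("utf-8").decode("latin-1")


correct = write_then_encode('"ä"')                        # b'"\xc3\xa4"'
double = write_then_encode(encode_inside_writer('"ä"'))   # b'"\xc3\x83\xc2\xa4"'
# A UTF-8 reader of `double` sees the JSON string "Ã¤" instead of "ä".
```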
>>
>> Levente
>>
>>>
>>>
>>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>>> http://www.squeaksource.com/JSON
>>> and is corrected by Rado in the SCouchDB fork of the package JSON
>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>>
>>>
>>>
>>> 3) My updated Test case
>>>
>>> myWideString := 'ä', 8220 asCharacter asString, Character cr asString, 'b'.
>>> d := Dictionary new.
>>> d at: 'title' put: 'aTitle'.
>>> d at: 'body' put: myWideString.
>>> r := WriteStream on: String new.
>>> (JsonObject newFrom: d) jsonWriteOn: r.
>>> WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.
>>>
>>> RESULT: OK.
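The same round trip can be sketched in Python with the standard library: serialize with `json.dumps(..., ensure_ascii=False)` so non-ASCII characters stay unescaped, convert the finished text to UTF-8 once, and hand the bytes to the HTTP client. The host `http://localhost:5984` (CouchDB's default port) is an assumption; the database `notes` and document id `test24` come from the test case. The sketch uses `application/json` rather than the test's `text/plain`, and it only builds the request without sending it.

```python
import json
import urllib.request


def build_put(host, doc_id, document):
    """Build (but do not send) a PUT of document as UTF-8 encoded JSON."""
    # 1) Serialize: only \, " and control characters get escaped.
    text = json.dumps(document, ensure_ascii=False)
    # 2) Convert the finished JSON text to bytes once, just before sending.
    body = text.encode("utf-8")
    return urllib.request.Request(
        host + "/notes/" + doc_id,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


doc = {"title": "aTitle", "body": "ä" + chr(8220) + "\r" + "b"}
req = build_put("http://localhost:5984", "test24", doc)  # assumed host
# urllib.request.urlopen(req) would send it to a running CouchDB.
```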
>>>
>>>
>>>
>>> 4) Conclusion
>>>
>>> With this change to the JSON package, I can now use WebClient to
>>> store objects in CouchDB.
>>>
>>> However I did not commit my change to
>>>  http://www.squeaksource.com/JSON
>>> as I do not (yet) understand the full impact of it.
>>>
>>>
>>> Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help
>>>
>>> --Hannes
>>>
>>> On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
>>>> On 11 May 2010 17:44, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>>>>> On 5/10/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>>>>>
>>>>>> Which JSON package/version are you using? A few weeks ago I fixed a bug
>>>>>> in the one distributed with SCouchDB, where it didn't escape Unicode
>>>>>> characters properly: the correct escaped form is \uNNNN, always padded
>>>>>> to four hex digits. That's why you get that warning; yours has only 2-3.
>>>>>>
>>>>>> rado
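The key detail is that `\uNNNN` names a Unicode code point (strictly, a UTF-16 code unit) in exactly four hex digits; it does not carry UTF-8 bytes. 'ä' is U+00E4, so its escape is `\u00e4`, not `\uC3\uA4` (C3 A4 are its UTF-8 bytes). Characters above U+FFFF are written as a surrogate pair. A Python sketch of an ASCII-only escaper in that style, using the hypothetical name `escape_non_ascii` and assuming its input is not a quote, backslash or control character, checked against Python's `json` module:

```python
def escape_non_ascii(ch):
    """Escape one non-ASCII character as \\uNNNN (four hex digits each)."""
    cp = ord(ch)
    if cp < 0x80:
        return ch                      # ASCII is left to the caller's rules
    if cp <= 0xFFFF:
        return "\\u%04x" % cp          # e.g. 'ä' (U+00E4) -> \u00e4
    # Above U+FFFF: split into a UTF-16 surrogate pair.
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)
    low = 0xDC00 + (cp & 0x3FF)
    return "\\u%04x\\u%04x" % (high, low)
```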
>>>>>
>>>>> I have been using
>>>>> http://www.squeaksource.com/JSON (over 7000 downloads)
>>>>> in combination with WebClient.
>>>>>
>>>>> Thank you Rado, I found
>>>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>>>>> and will have a look at it.
>>>>> (Your comment: added handling of utf8 encoded input data - this is
>>>>> necessary for couchdb-lucene which sends results directly in utf8 and
>>>>> not \uNNNN encoded)
>>>>>
>>>> SCouchDB uses a forked version of the JSON package, which you can find in
>>>> the SCouchDB repository:
>>>> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>>>>
>>>> If you're looking for that method, it can be found in Json>>unescapeUnicode.
>>>>
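Decoding is the reverse: each `\uNNNN` must have exactly four hex digits, and a high surrogate followed by an escaped low surrogate combines into one character. The Python sketch below (the hypothetical `unescape_json_string`, not the code of `Json>>unescapeUnicode`) decodes the body of a JSON string and handles all escapes, so that an escaped backslash followed by `u0041` is not mistaken for the escape `\u0041`.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
SIMPLE = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
          'n': '\n', 'r': '\r', 't': '\t'}


def _hex4(s, i):
    """Return the value of the four hex digits at s[i:i+4], or None."""
    digits = s[i:i + 4]
    if len(digits) == 4 and all(c in HEX_DIGITS for c in digits):
        return int(digits, 16)
    return None


def unescape_json_string(body):
    """Decode the text between the quotes of a JSON string."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        kind = body[i + 1:i + 2]
        if kind in SIMPLE:
            out.append(SIMPLE[kind])
            i += 2
            continue
        if kind != "u":
            raise ValueError("invalid escape at index %d" % i)
        cp = _hex4(body, i + 2)
        if cp is None:
            raise ValueError("\\u needs exactly four hex digits (index %d)" % i)
        i += 6
        # A high surrogate followed by an escaped low surrogate is one character.
        if 0xD800 <= cp <= 0xDBFF and body.startswith("\\u", i):
            low = _hex4(body, i + 2)
            if low is not None and 0xDC00 <= low <= 0xDFFF:
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                i += 6
        out.append(chr(cp))
    return "".join(out)
```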
>>>>
>>>>> --Hannes
>>>>>
>>>>>
>>>>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>>>>
>>>>>>> A simpler test case:
>>>>>>>
>>>>>>> WebClient httpPut: host, '/notes/test7' content: '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>>>>
>>>>>>> returns: '{"error":"bad_request","reason":"invalid UTF-8 JSON"}
>>>>>>> '
>>>>>>>
>>>>>>> whereas
>>>>>>>
>>>>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}' type: 'text/plain'.
>>>>>>>
>>>>>>> returns:
>>>>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>>>>> '
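The rejected payload fails because the `\u` in `\uC3\uA4` is followed by only two hex digits. A Python sketch using the standard `json` module (the helper `is_valid_json` is hypothetical) rejects the same payload, while the correctly escaped form and the unescaped UTF-8 form decode to the same value:

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


sent = r'{"content":"\uC3\uA4s"}'   # payload from the failing test case
escaped = r'{"content":"\u00e4s"}'  # 'ä' is U+00E4: four hex digits
raw = '{"content":"äs"}'            # unescaped; becomes UTF-8 on the wire
```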
>>>>>>>
>>>>>>> --Hannes
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Igor Stasenko AKA sig.
>>>>
>>>>
>>>
>>>
>
>

