[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Hannes Hirzel hannes.hirzel at gmail.com
Wed May 12 00:09:12 UTC 2010


Levente, your answer covers an earlier state of the exchange. See here
for the latest account
http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html

Basically, the need for UTF8 conversion in my case stems from the fact
that I use WebClient to post the JSON object and it accepts only
bytes. And I want to post to a CouchDB, which deals nicely with UTF8.

The JSON package as such needs no UTF8 conversion, only escaping of
backslash \, double quote " and control characters.


The method
String >>jsonWriteOn: aStream

should stay as it is:

String
>  jsonWriteOn: aStream
>
>        | replacement |
>        aStream nextPut: $".
>        self do: [ :ch |
>                (replacement := Json escapeForCharacter: ch)
>                        ifNil: [ aStream nextPut: ch ]
>                        ifNotNil: [ aStream nextPutAll: replacement ] ].
>        aStream nextPut: $".

but the method
  Json escapeForCharacter: ch
does not need to emit \uNNNN escapes for non-ASCII characters.
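
A minimal sketch of what such a method could look like. This is not the
package's actual implementation: it assumes escapeForCharacter: lives on
the class side of Json, answers nil when the character can be written
unchanged, and writes control characters as \uNNNN padded to four hex
digits. Short forms such as \n or \t would also be valid JSON but are
left out to keep the sketch small.

```smalltalk
Json class >> escapeForCharacter: ch
	"Sketch only (assumed shape, not the shipped method).
	Answer the JSON escape sequence for ch, or nil if ch needs no escaping.
	Non-ASCII characters are deliberately left alone."
	| code hex |
	ch = $" ifTrue: [ ^ '\"' ].
	ch = $\ ifTrue: [ ^ '\\' ].
	code := ch asInteger.
	code < 32 ifTrue: [
		"Control character: pad the hex code point to exactly four digits."
		hex := code printStringBase: 16.
		^ '\u', (String new: 4 - hex size withAll: $0), hex ].
	^ nil
```

With this shape, String>>jsonWriteOn: can stay exactly as quoted above.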

So I do the UTF8 conversion just before HTTP posting.
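
Roughly, the sequence then looks like the following workspace sketch. It
reuses the calls already shown in this thread; the host URL and the
document id 'test25' are placeholders, and it assumes that
escapeForCharacter: leaves non-ASCII characters alone, so that the JSON
text may still be a WideString until the last step.

```smalltalk
| host d r utf8Body |
host := 'http://localhost:5984'.	"placeholder; 5984 is CouchDB's default port"
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: ('ä', 8220 asCharacter asString, 'b').
r := WriteStream on: String new.
"Write JSON with escaping only; no byte encoding happens here."
(JsonObject newFrom: d) jsonWriteOn: r.
"Encode to UTF8 bytes only at the network boundary."
utf8Body := r contents squeakToUtf8.
WebClient httpPut: host, '/notes/test25' content: utf8Body type: 'text/plain'.
```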

I hope this clarifies the situation and that we can soon move on to an
update of the JSON package.

--Hannes

On 5/11/10, Levente Uzonyi <leves at elte.hu> wrote:
> On Tue, 11 May 2010, Hannes Hirzel wrote:
>
>> 1) UTF8 conversion
>> 2) Change to JSON package of Tony Garnock-Jones
>> 3) My updated Test case
>> 4) Conclusion
>>
>>
>> 1) UTF8 conversion
>>
>> My question was:
>>    How do I convert a WideString to UTF8?
>>
>>
>> Levente answered:
>>
>> There are various possibilities:
>> 'äbc' squeakToUtf8.
>> 'äbc' convertToEncoding: 'utf-8'.
>> 'äbc' convertToWithConverter: UTF8TextConverter new.
>> UTF8TextConverter new encodeString: 'äbc'.
>>
>>
>>
>> 2) Change to JSON package of Tony Garnock-Jones
>>
>> As CouchDB stores UTF8 values, I did not want to escape them with
>> \uNNNN as the forked JSON package in SCouchDB does. Instead I
>> wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
>> is not correct in the original JSON package.
>>
>> So I did the following correction.
>>
>> In the class
>>  String  - category *JSON-writing
>>  (from package http://www.squeaksource.com/JSON)
>> I replaced
>>
>>  jsonWriteOn: aStream
>> 	| replacement |
>> 	aStream nextPut: $".
>> 	self do: [ :ch |
>> 		(replacement := Json escapeForCharacter: ch)    "***"
>> 			ifNil: [ aStream nextPut: ch ]
>> 			ifNotNil: [ aStream nextPutAll: replacement ] ].
>> 	aStream nextPut: $".
>>
>>
>> WITH
>>
>>  jsonWriteOn: aStream
>> 	aStream nextPut: $".
>> 	aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
>> 	aStream nextPut: $".
>
> This is just wrong. According to http://json.org a string can contain any
> Unicode character except for \ " and control characters. So there should be
> no UTF-8 conversion here.
>
> You only need to convert the characters to UTF-8 because you're sending
> them over the network to a server, and Unicode characters have to be
> converted to bytes somehow. So the JSON printer shouldn't do any
> conversion by default except for escaping. The only problem is that
> escaping is not done as the spec requires, but that's easy to fix.
>
>
> Levente
>
>>
>>
>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>> http://www.squeaksource.com/JSON
>> and is corrected by Rado in the SCouchDB fork of the package JSON
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>
>>
>>
>> 3) My updated Test case
>>
>> myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
>> d := Dictionary new. d at: 'title' put:   'aTitle'. d at: 'body' put:
>> myWideString.
>> r := WriteStream on: String new.
>> (JsonObject newFrom: d) jsonWriteOn: r.
>> WebClient httpPut: host, '/notes/test24' content: r contents type:
>> 'text/plain'.
>>
>> RESULT: OK.
>>
>>
>>
>> 4) Conclusion
>>
>> With the change to the JSON package I am now fine using WebClient
>> for storing objects in a CouchDB.
>>
>> However I did not commit my change to
>>  http://www.squeaksource.com/JSON
>> as I do not (yet) understand the full impact of it.
>>
>>
>> Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help
>>
>> --Hannes
>>
>> On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
>>> On 11 May 2010 17:44, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>>>> On 5/10/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>>>>
>>>>> Which JSON package/version are you using? I fixed a bug in the one
>>>>> distributed with SCouchDB a few weeks ago, where it didn't encode utf8
>>>>> characters properly - the correct escaped form is \uNNNN - always
>>>>> padded to 4 Ns. That's why you get that warning, yours is only 2-3
>>>>>
>>>>> rado
>>>>
>>>> I have been using
>>>> http://www.squeaksource.com/JSON (over 7000 downloads)
>>>> in combination with WebClient.
>>>>
>>>> Thank you Rado, I found
>>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>>>> and will have a look at it.
>>>> (Your comment: added handling of utf8 encoded input data - this is
>>>> necessary for couchdb-lucene which sends results directly in utf8 and
>>>> not \uNNNN encoded)
>>>>
>>> SCouchDB is using a forked version of the JSON package, which you can
>>> find in the SCouchDB repository
>>> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>>>
>>> If you are looking for that method, it can be found in Json>>unescapeUnicode
>>>
>>>
>>>> --Hannes
>>>>
>>>>
>>>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>>>
>>>>>> The test case made simpler
>>>>>>
>>>>>> WebClient httpPut: host, '/notes/test7' content:
>>>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>>>
>>>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8
>>>>>> JSON"}
>>>>>> '
>>>>>>
>>>>>> whereas
>>>>>>
>>>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>>>> type: 'text/plain'.
>>>>>>
>>>>>> gives back
>>>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>>>> '
>>>>>>
>>>>>> --Hannes
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko AKA sig.
>>>
>>>
>>
>>


