[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)
Igor Stasenko
siguctua at gmail.com
Tue May 11 21:30:50 UTC 2010
On 12 May 2010 00:09, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
> 1) UTF8 conversion
> 2) Change to JSON package of Tony Garnock-Jones
> 3) My updated Test case
> 4) Conclusion
>
>
> 1) UTF8 conversion
>
> My question was:
> How do I convert a WideString to UTF8?
>
>
> Levente answered:
>
> There are various possibilities:
> 'äbc' squeakToUtf8.
> 'äbc' convertToEncoding: 'utf-8'.
> 'äbc' convertToWithConverter: UTF8TextConverter new.
> UTF8TextConverter new encodeString: 'äbc'.
>
>
>
> 2) Change to JSON package of Tony Garnock-Jones
>
> As CouchDB stores UTF8 values, I did not want to escape them with
> \uNNNN as the forked JSON package in SCouchDB does.
I know. But JSON could be used for something else, and \uNNNN escaping
is part of the JSON syntax,
so it should be supported there.
> Instead, I
> wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
> is not correct in the original JSON package.
>
Yeah... SCouchDB has no UTF-8 support for output. Yet.
> So I did the following correction.
>
> In the class
> String - category *JSON-writing
> (from package http://www.squeaksource.com/JSON)
> I replaced
>
> jsonWriteOn: aStream
> 	| replacement |
> 	aStream nextPut: $".
> 	self do: [ :ch |
> 		(replacement := Json escapeForCharacter: ch) "***"
> 			ifNil: [ aStream nextPut: ch ]
> 			ifNotNil: [ aStream nextPutAll: replacement ] ].
> 	aStream nextPut: $".
>
>
> WITH
>
> jsonWriteOn: aStream
> 	aStream nextPut: $".
> 	aStream nextPutAll: (UTF8TextConverter new encodeString: self).
> 	aStream nextPut: $".
>
No, this is WRONG!
Json writer methods should output Unicode text and must not deal with
any encoding!
Then the layer which is responsible for transferring the data is free
to decide how to encode the JSON output, using either UTF-8 or any other
appropriate UTF encoding.
By putting UTF-8 conversions into the JSON library routines, you limit the
JSON library to being used only with UTF-8 encoding.
I repeat: the JSON library is the wrong place for dealing with encodings. It
should take Unicode text/stream as input
and produce Unicode text/stream as output. Any encoding should be up to the
outer layers, which are responsible for data transmission!
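To illustrate the layering: a minimal workspace sketch, using the unmodified JSON writer and the same JsonObject/jsonWriteOn: calls as the test case in this thread. It assumes `host` is already bound to the CouchDB URL as in that test; 'test25' is only a placeholder document name. The JSON step writes a (possibly wide) String and encodes nothing; only the final step, standing in for the transport layer, picks UTF-8 via squeakToUtf8. Whether the writer leaves non-ASCII characters as they are or escapes them depends on Json escapeForCharacter: (see the note about it below); the encoding step does not care.

```smalltalk
| d json payload |
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: 'ä', 8220 asCharacter asString, 's'.
"JSON layer: produce Unicode text only; no encoding happens here."
json := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: json.
"Transport layer: choose the wire encoding (here UTF-8) just before sending.
 host is assumed to be bound already, as in the test case in this thread."
payload := json contents squeakToUtf8.
WebClient httpPut: host, '/notes/test25' content: payload type: 'text/plain'.
```

Sending the data in some other encoding would mean changing only the payload line; the JSON writer stays untouched.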
>
> "*** NOTE: escapeForCharacter is incorrectly implemented in
> http://www.squeaksource.com/JSON
> and has been corrected by Rado in the SCouchDB fork of the package JSON
> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>
>
>
> 3) My updated Test case
>
> myWideString := ('ä', 8220 asCharacter asString, Character cr asString, 'b').
> d := Dictionary new.
> d at: 'title' put: 'aTitle'.
> d at: 'body' put: myWideString.
> r := WriteStream on: String new.
> (JsonObject newFrom: d) jsonWriteOn: r.
> WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.
>
> RESULT: OK.
>
>
>
> 4) Conclusion
>
> With this change to the JSON package, I can now use WebClient
> to store objects in CouchDB.
>
> However I did not commit my change to
> http://www.squeaksource.com/JSON
> as I do not (yet) understand the full impact of it.
>
>
> Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help
>
> --Hannes
>
> On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
>> On 11 May 2010 17:44, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>>> On 5/10/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>>>
>>>> Which JSON package/version are you using? I fixed a bug in the one
>>>> distributed with SCouchDB a few weeks ago, where it didn't encode UTF-8
>>>> characters properly: the correct escaped form is \uNNNN, always padded
>>>> to four hex digits. That's why you get that warning; yours has only 2-3.
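A sketch of the escaping rule Rado describes, written as two workspace blocks; hex4 and escape are hypothetical names for illustration, not the JSON package's API (the package's own entry point is Json escapeForCharacter:). The escape takes the character's code point, not its UTF-8 bytes, prints it in hex and left-pads it to four digits. Code points above U+FFFF do not fit in four digits; JSON (RFC 4627) writes those as a UTF-16 surrogate pair, which the sketch also handles. It assumes asInteger answers the Unicode code point, which holds for characters created with Character value: or asCharacter.

```smalltalk
| hex4 escape |
"Hypothetical helper: an integer as exactly four uppercase hex digits, zero-padded."
hex4 := [:n | | hex |
	hex := n printStringBase: 16.
	(String new: (4 - hex size max: 0) withAll: $0), hex].
"Hypothetical helper: escape one character as \uNNNN.
 Code points above U+FFFF become a UTF-16 surrogate pair."
escape := [:ch | | code v |
	code := ch asInteger.
	code > 16rFFFF
		ifTrue: [
			v := code - 16r10000.
			'\u', (hex4 value: 16rD800 + (v bitShift: -10)),
				'\u', (hex4 value: 16rDC00 + (v bitAnd: 16r3FF))]
		ifFalse: ['\u', (hex4 value: code)]].
escape value: (Character value: 228).    "a-umlaut, gives '\u00E4'"
escape value: (Character value: 8220).   "left double quotation mark, gives '\u201C'"
```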
>>>>
>>>> rado
>>>
>>> I have been using
>>> http://www.squeaksource.com/JSON (over 7000 downloads)
>>> in combination with WebClient.
>>>
>>> Thank you Rado, I found
>>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>>> and will have a look at it.
>>> (Your comment: added handling of utf8 encoded input data - this is
>>> necessary for couchdb-lucene which sends results directly in utf8 and
>>> not \uNNNN encoded)
>>>
>> SCouchDB uses a forked version of the JSON package, which you can find in
>> the SCouchDB repository
>> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>>
>> If you are looking for that method, it can be found in Json>>unescapeUnicode
>>
>>
>>> --Hannes
>>>
>>>
>>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>>
>>>>> A simpler test case:
>>>>>
>>>>> WebClient httpPut: host, '/notes/test7' content:
>>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>>
>>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8
>>>>> JSON"}
>>>>> '
>>>>>
>>>>> whereas
>>>>>
>>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>>> type: 'text/plain'.
>>>>>
>>>>> gives back
>>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>>> '
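By that rule, \uC3\uA4 in the first test is not "ä": C3 A4 are the two UTF-8 bytes of ä, whereas a \u escape expects the code point, U+00E4. A sketch of two request bodies that are well-formed JSON for the same text, assuming `host` is bound as in the tests above; 'test9' and 'test10' are placeholder document names, and CouchDB's replies are not shown here. Either escape the code point, or send the text unescaped and encode it to UTF-8 at the transport step.

```smalltalk
| umlaut |
umlaut := String with: (Character value: 228).   "a-umlaut, code point U+00E4"
"Escaped form: the code point in four hex digits."
WebClient httpPut: host, '/notes/test9' content: '{"content":"\u00E4s"}' type: 'text/plain'.
"Unescaped form: plain text, encoded to UTF-8 (bytes C3 A4 for the umlaut) just before sending."
WebClient httpPut: host, '/notes/test10'
	content: ('{"content":"', umlaut, 's"}') squeakToUtf8
	type: 'text/plain'.
```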
>>>>>
>>>>> --Hannes
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>>
>
>
--
Best regards,
Igor Stasenko AKA sig.