[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and
WebServer 1.0 for Squeak)
Hannes Hirzel
hannes.hirzel at gmail.com
Tue May 11 21:09:50 UTC 2010
1) UTF8 conversion
2) Change to JSON package of Tony Garnock-Jones
3) My updated Test case
4) Conclusion
1) UTF8 conversion
My question was:
How do I convert a WideString to UTF8?
Levente answered:
There are various possibilities:
'äbc' squeakToUtf8.
'äbc' convertToEncoding: 'utf-8'.
'äbc' convertToWithConverter: UTF8TextConverter new.
UTF8TextConverter new encodeString: 'äbc'.
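As a cross-check outside Squeak, here is a minimal Python sketch of what
any of the conversions above should produce for 'äbc'. It assumes 'ä' is
the single code point U+00E4; the variable names are only illustrative.

```python
# 'äbc' with ä written as an explicit code point (U+00E4) to avoid
# any ambiguity about how the source file itself is encoded.
text = "\u00e4bc"

# UTF-8 encodes U+00E4 as the two bytes 0xC3 0xA4; 'b' and 'c' stay one byte each.
encoded = text.encode("utf-8")

print(list(encoded))             # [195, 164, 98, 99]
print(len(text), len(encoded))   # 3 4
```

The encoded form is one byte longer than the character count, which is
the quickest sign that the conversion actually took place.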
2) Change to JSON package of Tony Garnock-Jones
CouchDB stores values as UTF8, so I did not want to escape non-ASCII
characters as \uNNNN the way the forked JSON package in SCouchDB does.
I wanted to keep UTF8 in the db instead. As Rado pointed out, the UTF8
conversion in the original JSON package is not correct.
So I made the following correction.
In the class
String - category *JSON-writing
(from package http://www.squeaksource.com/JSON)
I replaced
jsonWriteOn: aStream
    | replacement |
    aStream nextPut: $".
    self do: [ :ch |
        (replacement := Json escapeForCharacter: ch) "***"
            ifNil: [ aStream nextPut: ch ]
            ifNotNil: [ aStream nextPutAll: replacement ] ].
    aStream nextPut: $".
WITH
jsonWriteOn: aStream
    aStream nextPut: $".
    aStream nextPutAll: (UTF8TextConverter new encodeString: self).
    aStream nextPut: $".
"*** NOTE: escapeForCharacter is incorrectly implemented in
http://www.squeaksource.com/JSON
and is corrected by Rado in the SCouchDB fork of the package JSON
http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
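To illustrate the escaping rule Rado describes further down in the
thread (\uNNNN, always padded to four hex digits), here is a small
Python sketch. The helpers escape_non_ascii and is_valid_json are
hypothetical names made up for this example. They are not part of the
JSON package or of SCouchDB.

```python
import json


def escape_non_ascii(ch):
    """Return the JSON \\uNNNN escape for one character (hypothetical helper)."""
    code = ord(ch)
    if code > 0xFFFF:
        # Characters outside the BMP need a surrogate pair; not handled here.
        raise ValueError("characters above U+FFFF need a surrogate pair")
    # %04x pads with leading zeros so the escape always has four hex digits.
    return "\\u%04x" % code


def is_valid_json(text):
    """Report whether Python's json module accepts the given text."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


padded = '"' + escape_non_ascii("\u00e4") + '"'
print(padded)                                     # "\u00e4"
print(json.loads(padded) == "\u00e4")             # True

# Escapes with fewer than four hex digits are rejected.
print(is_valid_json('"\\ue4"'))                   # False
print(is_valid_json('{"content":"\\uC3\\uA4s"}')) # False
```

The last line uses the payload from the earlier test case in this
thread: \uC3 and \uA4 are two-digit escapes of the individual UTF8
bytes, so a strict parser rejects them.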
3) My updated Test case
myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.
RESULT: OK.
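One thing to keep in mind: the replacement jsonWriteOn: shown in 2) no
longer calls escapeForCharacter, so double quotes, backslashes and
control characters inside a string would be written unescaped. JSON
requires those to be escaped even when everything else is sent as raw
UTF8. As a rough sketch of the intended wire format (not of the Squeak
code), Python's json module can produce it with ensure_ascii=False. The
string below is meant to mirror the test case above and assumes the
carriage return is actually part of the string.

```python
import json

# 'ä', U+201C (decimal 8220), a carriage return, then 'b'.
body = "\u00e4" + chr(8220) + "\r" + "b"
doc = {"title": "aTitle", "body": body}

# ensure_ascii=False keeps non-ASCII characters as they are instead of
# writing \uNNNN escapes; the control character is still escaped as \r.
payload = json.dumps(doc, ensure_ascii=False).encode("utf-8")

print(payload)
# The payload parses back to the original dictionary.
print(json.loads(payload.decode("utf-8")) == doc)   # True
```

In the payload, 'ä' appears as the bytes C3 A4 and the opening quotation
mark as E2 80 9C, while the carriage return is written as the two
characters \r.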
4) Conclusion
With the change to the JSON package I am now fine using WebClient
for storing objects in CouchDB.
However I did not commit my change to
http://www.squeaksource.com/JSON
as I do not (yet) understand the full impact of it.
Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help.
--Hannes
On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
> On 11 May 2010 17:44, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>> On 5/10/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>>
>>> Which JSON package/version are you using? I fixed a bug in the one
>>> distributed with SCouchDB few weeks ago, where it didn't encode utf8
>>> characters properly - the correct escaped form is \uNNNN - always padded
>>> to 4 Ns. that's why you get that warning, yours is only 2-3
>>>
>>> rado
>>
>> I have been using
>> http://www.squeaksource.com/JSON (over 7000 downloads)
>> in combination with WebClient.
>>
>> Thank you Rado, I found
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>> and will have a look at it.
>> (Your comment: added handling of utf8 encoded input data - this is
>> necessary for couchdb-lucene which sends results directly in utf8 and
>> not \uNNNN encoded)
>>
> SCouchDB using a forked version of JSON package, which you can find in
> SCouchDB repository
> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>
> If you looking for that method, it can be found in Json>>unescapeUnicode
>
>
>> --Hannes
>>
>>
>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>
>>>> The test case made simpler
>>>>
>>>> WebClient httpPut: host, '/notes/test7' content:
>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>
>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8
>>>> JSON"}
>>>> '
>>>>
>>>> whereas
>>>>
>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>> type: 'text/plain'.
>>>>
>>>> gives back
>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>> '
>>>>
>>>> --Hannes
>>>
>>>
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>