[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and WebServer 1.0 for Squeak)

Hannes Hirzel hannes.hirzel at gmail.com
Tue May 11 21:09:50 UTC 2010


1) UTF8 conversion
2) Change to JSON package of Tony Garnock-Jones
3) My updated Test case
4) Conclusion


1) UTF8 conversion

My question was:
    How do I convert a WideString to UTF8?


Levente answered:

There are various possibilities:
'äbc' squeakToUtf8.
'äbc' convertToEncoding: 'utf-8'.
'äbc' convertToWithConverter: UTF8TextConverter new.
UTF8TextConverter new encodeString: 'äbc'.
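
A quick workspace check of what such a conversion produces can help here. This is only a sketch, not something from the thread. It assumes utf8ToSqueak is available as the inverse of squeakToUtf8, and it reuses the 8220 asCharacter trick from the test case in section 3 to force a WideString:

```smalltalk
| wide utf8 |
"8220 is U+201C, which does not fit into a byte, so this is a WideString"
wide := 'ä', 8220 asCharacter asString.

"The result holds one byte-valued Character per UTF8 byte"
utf8 := wide squeakToUtf8.

"ä takes 2 UTF8 bytes and U+201C takes 3, so the encoded string is
 longer than the 2 characters of the original"
utf8 size.

"Decoding should give the original string back"
utf8 utf8ToSqueak = wide.
```

Evaluate the last two lines with "print it" to see the sizes and the round trip.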



2) Change to JSON package of Tony Garnock-Jones

As CouchDB stores UTF8 values, I did not want to escape them with
\uNNNN as the forked JSON package in SCouchDB does. Instead I
wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
is not correct in the original JSON package.

So I made the following correction.

In the class
  String  - category *JSON-writing
  (from package http://www.squeaksource.com/JSON)
I replaced

  jsonWriteOn: aStream
	| replacement |
	aStream nextPut: $".
	self do: [ :ch |
		(replacement := Json escapeForCharacter: ch)    "***"
			ifNil: [ aStream nextPut: ch ]
			ifNotNil: [ aStream nextPutAll: replacement ] ].
	aStream nextPut: $".


WITH

  jsonWriteOn: aStream
	aStream nextPut: $".
	aStream nextPutAll:  (UTF8TextConverter new encodeString: self).
	aStream nextPut: $".


"*** NOTE: escapeForCharacter is incorrectly implemented in
http://www.squeaksource.com/JSON
and is corrected by Rado in the SCouchDB fork of the package JSON
http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
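
One thing to keep in mind with the replacement above: it no longer escapes anything, so a double quote, a backslash or a control character inside the string goes into the JSON text unescaped. A possible middle way is sketched below. It is untested and is not the code of either package. It escapes only those characters and writes everything else as UTF8. The control character branch also shows the padding to 4 hex digits that Rado mentions. It relies only on UTF8TextConverter>>encodeString: from Levente's list and on printStringBase:.

```smalltalk
jsonWriteOn: aStream
	"Untested sketch: escape only what JSON requires,
	 write all other characters as UTF8."
	| converter code hex |
	converter := UTF8TextConverter new.
	aStream nextPut: $".
	self do: [ :ch |
		code := ch asInteger.
		(ch = $" or: [ ch = $\ ])
			ifTrue: [ aStream nextPut: $\; nextPut: ch ]
			ifFalse: [
				code < 32
					ifTrue: [
						"control character: \uNNNN, always padded to 4 hex digits"
						hex := code printStringBase: 16.
						aStream
							nextPutAll: '\u';
							nextPutAll: (String new: 4 - hex size withAll: $0);
							nextPutAll: hex ]
					ifFalse: [
						aStream nextPutAll: (converter encodeString: ch asString) ] ] ].
	aStream nextPut: $".
```

With this variant a carriage return would be written as \u000D, while 'ä' would still go out as its two UTF8 bytes.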



3) My updated Test case

"8220 is U+201C (left double quotation mark), which forces a WideString"
myWideString := 'ä', 8220 asCharacter asString, String cr, 'b'.
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient httpPut: host, '/notes/test24' content: r contents type: 'text/plain'.

RESULT: OK.
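
To check what actually ended up in the db, the document can be read back and decoded. This is a sketch under assumptions: that WebClient httpGet: answers a response whose content is the raw, undecoded body, and that utf8ToSqueak is available for decoding. host is the same variable as in the test case.

```smalltalk
| response body |
"Fetch the document that the test case stored"
response := WebClient httpGet: host, '/notes/test24'.

"Assumption: content answers the UTF8 bytes as sent by CouchDB,
 so they still have to be decoded into a Squeak String"
body := response content utf8ToSqueak.

"Inspect body to see whether 'ä' and the U+201C quote came back intact"
body.
```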



4) Conclusion

With the change to the JSON package, WebClient now works fine for me
for storing objects in CouchDB.

However, I did not commit my change to
  http://www.squeaksource.com/JSON
as I do not (yet) understand its full impact.


Thank you Andreas Raab, Levente Uzonyi and Rado Hodnicak for your help.

--Hannes

On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
> On 11 May 2010 17:44, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>> On 5/10/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>>
>>> Which JSON package/version are you using? I fixed a bug in the one
>>> distributed with SCouchDB a few weeks ago, where it didn't encode utf8
>>> characters properly - the correct escaped form is \uNNNN - always padded
>>> to 4 Ns. That's why you get that warning: yours is only 2-3.
>>>
>>> rado
>>
>> I have been using
>> http://www.squeaksource.com/JSON (over 7000 downloads)
>> in combination with WebClient.
>>
>> Thank you Rado, I found
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz
>> and will have a look at it.
>> (Your comment: added handling of utf8 encoded input data - this is
>> necessary for couchdb-lucene which sends results directly in utf8 and
>> not \uNNNN encoded)
>>
> SCouchDB uses a forked version of the JSON package, which you can find in
> the SCouchDB repository:
> http://www.squeaksource.com/SCouchDB/JSON-Igor.Stasenko.34.mcz
>
> If you are looking for that method, it can be found in Json>>unescapeUnicode
>
>
>> --Hannes
>>
>>
>>> On Mon, 10 May 2010, Hannes Hirzel wrote:
>>>
>>>> The test case made simpler
>>>>
>>>> WebClient httpPut: host, '/notes/test7' content:
>>>> '{"content":"\uC3\uA4s"}' type: 'text/plain'.
>>>>
>>>> gives back as answer: '{"error":"bad_request","reason":"invalid UTF-8
>>>> JSON"}
>>>> '
>>>>
>>>> whereas
>>>>
>>>> WebClient httpPut: host, '/notes/test8' content: '{"content":"abc"}'
>>>> type: 'text/plain'.
>>>>
>>>> gives back
>>>> '{"ok":true,"id":"test8","rev":"1-f40e52919735ae6775af3d388361b3da"}
>>>> '
>>>>
>>>> --Hannes
>>>
>>>
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>


