[squeak-dev] Re: UTF8 in JSON (was: Re: [ANN] WebClient and
WebServer 1.0 for Squeak)
Hannes Hirzel
hannes.hirzel at gmail.com
Tue May 11 22:12:59 UTC 2010
Igor, your argument convinces me.
Thank you for the quick feedback.
See the updates below.
--Hannes
On 5/11/10, Igor Stasenko <siguctua at gmail.com> wrote:
> On 12 May 2010 00:09, Hannes Hirzel <hannes.hirzel at gmail.com> wrote:
>> 1) UTF8 conversion
>> 2) Change to JSON package of Tony Garnock-Jones
>> 3) My updated Test case
>> 4) Conclusion
>>
>>
>> 1) UTF8 conversion
>>
>> My question was:
>> How do I convert a WideString to UTF8?
>>
>>
>> Levente answered:
>>
>> There are various possibilities:
>> 'äbc' squeakToUtf8.
>> 'äbc' convertToEncoding: 'utf-8'.
>> 'äbc' convertToWithConverter: UTF8TextConverter new.
>> UTF8TextConverter new encodeString: 'äbc'.
>>
>>
>>
>> 2) Change to JSON package of Tony Garnock-Jones
>>
>> As CouchDB stores UTF8 values I did not want to escape them with
>> \uNNNN as the forked JSON package in SCouchDB does.
>
> I know. But JSON could be used for something else, and \uNNNN escaping
> is also part of the syntax,
> so it should be supported there.
>
>> But instead I
>> wanted to keep UTF8 in the db. As Rado pointed out, the UTF8 conversion
>> is not correct in the original JSON package.
>>
> Yeah, SCouchDB has no UTF-8 support for output. Yet.
>
>> So I did the following correction.
>>
>> In the class
>> String - category *JSON-writing
>> (from package http://www.squeaksource.com/JSON)
>> I replaced
>>
>> jsonWriteOn: aStream
>>     | replacement |
>>     aStream nextPut: $".
>>     self do: [ :ch |
>>         (replacement := Json escapeForCharacter: ch) "***"
>>             ifNil: [ aStream nextPut: ch ]
>>             ifNotNil: [ aStream nextPutAll: replacement ] ].
>>     aStream nextPut: $".
>>
>>
>> WITH
>>
>> jsonWriteOn: aStream
>>     aStream nextPut: $".
>>     aStream nextPutAll: (UTF8TextConverter new encodeString: self).
>>     aStream nextPut: $".
>>
>
> No, this is WRONG!
>
> JSON writer methods should output Unicode text and not deal with
> any encoding!
> Then the layer responsible for transferring the data is free to
> decide how to encode the
> JSON output, using either UTF-8 or any other appropriate UTF
> encoding.
>
> By putting UTF-8 conversions in JSON library routines you limit
> the JSON library to being used only with UTF-8 encoding.
>
> I repeat: the JSON library is the wrong place for dealing with encodings. It
> should take Unicode text/stream as input
> and produce Unicode text/stream as output. Any encoding should be up to the
> outer layers, which are responsible for data transmission!
So String>>jsonWriteOn: is now just:

jsonWriteOn: aStream
    aStream nextPut: $".
    aStream nextPutAll: self.
    aStream nextPut: $".
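
One caveat with writing the string through unchanged: independent of any encoding question, the JSON syntax requires the double quote, the backslash, and control characters below U+0020 to be escaped inside a string. Below is a minimal workspace sketch of a writer that does only that and leaves all other characters (including non-ASCII ones) as plain Unicode for the transport layer to encode. The block name writeJsonString and the hexDigits helper are hypothetical and not part of the JSON package; this is an illustration under those assumptions, not a proposed replacement.

```smalltalk
"Workspace sketch. writeJsonString is a hypothetical helper, not part of the JSON package."
| hexDigits writeJsonString out |
hexDigits := '0123456789ABCDEF'.
writeJsonString := [:aString :aStream |
    aStream nextPut: $".
    aString do: [:ch | | code |
        code := ch asInteger.
        (ch = $" or: [ch = $\ ])
            ifTrue: [
                "Quote and backslash get a backslash prefix."
                aStream nextPut: $\ .
                aStream nextPut: ch]
            ifFalse: [
                code < 32
                    ifTrue: [
                        "Control characters are written as \u00XX."
                        aStream
                            nextPutAll: '\u00';
                            nextPut: (hexDigits at: code // 16 + 1);
                            nextPut: (hexDigits at: code \\ 16 + 1)]
                    ifFalse: [
                        "Everything else passes through as Unicode, with no encoding applied."
                        aStream nextPut: ch]]].
    aStream nextPut: $" .
    aStream].

out := WriteStream on: String new.
writeJsonString value: 'say "hi"' value: out.
"Print it to inspect the escaped result."
out contents
```

The UTF-8 step stays outside, exactly as Igor suggests: the caller encodes the stream contents just before sending them.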
>>
>> "*** NOTE: escapeForCharacter is incorrectly implemented in
>> http://www.squeaksource.com/JSON
>> and is corrected by Rado in the SCouchDB fork of the package JSON
>> http://www.squeaksource.com/SCouchDB/SCouchDB-Core-rh.8.mcz"
>>
>
>
>>
>>
>> 3) My updated Test case
>>
myWideString := ('ä', 8220 asCharacter asString, Character cr, 'b').
d := Dictionary new.
d at: 'title' put: 'aTitle'.
d at: 'body' put: myWideString.
r := WriteStream on: String new.
(JsonObject newFrom: d) jsonWriteOn: r.
WebClient
    httpPut: host, '/notes/test25'
    content: (UTF8TextConverter new encodeString: r contents)
    type: 'text/plain'.

RESULT: OK.
>> 4) Conclusion
>>
>> With the change to the JSON package I can now use WebClient
>> to store objects in CouchDB.
>>
However, I did not commit my change to
http://www.squeaksource.com/JSON
even though
Json escapeForCharacter: ch
is wrong there.
And it probably should not do that escaping at all. At least the current CouchDB deals
properly with UTF8-encoded strings.