[squeak-dev] WebClient, Json and CouchDB

Levente Uzonyi leves at elte.hu
Thu May 13 01:02:18 UTC 2010


On Thu, 13 May 2010, Hannes Hirzel wrote:

> On 5/12/10, radoslav hodnicak <rh at 4096.sk> wrote:
>>
>>
>> On Wed, 12 May 2010, Hannes Hirzel wrote:
>>
>>> My question:
>>>
>>> Is there a nicer way of doing
>>>
>>> nnnn := ((c asciiValue bitAnd: 16rFFFF) printStringBase: 16).
>>> [nnnn size < 4] whileTrue: [nnnn := '0', nnnn].
>>> ^ '\u', nnnn
>>>
>>
>> Yes there is. As I said before, check the JSON package in the SCouchDB
>> repository (Igor's link from a few days ago), where I fixed this bug. I'm
>> kinda surprised at your insistence on using buggy/unmaintained JSON code
>> when you have been told several times there's one that's tested to work
>> with CouchDB (I use it in production).
>>
>> rado
>>
>
> Hello Rado
>
> Yes, your version of the method is nicer
> escapeForCharacter: c
>
> 	| index |
> 	^ (index := c asciiValue + 1) <= escapeArray size
> 		ifTrue: [ ^ escapeArray at: index ]
> 		"THIS IS WROOONG!!! unicode is not 16bit wide!"
> 		ifFalse: [ ^ '\u', (((c asciiValue bitAnd: 16rFFFF) printStringBase: 16) padded: #left to: 4 with: $0) ]
>
> However, your comment leads me to a non-urgent question: how would we
> deal with a code point > 65535?

No one has to deal with those, since all characters that must be
escaped fit into 16 bits (you can find the escaping rule in RFC 4627 if
you're interested). So this implementation is wrong: it tries to escape
every character whose asciiValue is greater than 127, and it will fail
for values greater than 65535. This escaping is entirely unnecessary; it
just causes a (not so) nice slowdown.
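
To make the failure concrete, here is a small workspace check (an illustration only, not code from any package) using U+1F600, a code point outside the Basic Multilingual Plane. The `bitAnd: 16rFFFF` mask silently drops the high bits, so the old method would emit \uF600, which names a different character. RFC 4627 says that a character outside the BMP, if it is escaped at all, is written as a UTF-16 surrogate pair; the last lines compute that pair. The temporary variable names are made up for the example.

```smalltalk
"Illustration only: what the old masking does to a code point above 65535"
| cp masked oldEscape offset high low |
cp := 16r1F600.
masked := cp bitAnd: 16rFFFF.
oldEscape := '\u', ((masked printStringBase: 16) padded: #left to: 4 with: $0).
"oldEscape is '\uF600': the plane bits are lost, so it names a different character"

"RFC 4627 escapes characters outside the BMP as a UTF-16 surrogate pair"
offset := cp - 16r10000.
high := 16rD800 + (offset bitShift: -10).
low := 16rDC00 + (offset bitAnd: 16r3FF).
"high = 16rD83D and low = 16rDE00, i.e. the escape \uD83D\uDE00"
```

Since none of these characters has to be escaped, the simpler option is to leave them alone.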

From RFC 4627:
"
    ... All Unicode characters may be placed within the
    quotation marks except for the characters that must be escaped:
    quotation mark, reverse solidus, and the control characters (U+0000
    through U+001F).

    Any character may be escaped. ...
"

So the best thing to do is to escape only $\, $" and the characters 0 to 31.
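
A minimal workspace sketch of that rule (hypothetical, not the JSON package's actual printer; the block name `escape` is made up): every character is copied unchanged except backslash and double quote, which get a leading backslash, and the control characters 0 to 31, which become \uNNNN. Characters above 127, such as 'ä', pass through untouched; turning them into bytes (e.g. UTF-8) is left to whoever sends the string over the network.

```smalltalk
"Hypothetical sketch: escape only backslash, double quote and code points 0-31"
| escape |
escape := [:aString |
	String streamContents: [:out |
		aString do: [:c |
			(c = $\ or: [c = $"])
				ifTrue: [out nextPut: $\; nextPut: c]
				ifFalse: [c asciiValue < 32
					ifTrue: [out nextPutAll: '\u';
							nextPutAll: ((c asciiValue printStringBase: 16) padded: #left to: 4 with: $0)]
					ifFalse: [out nextPut: c]]]]].

escape value: 'a\b', (String with: Character tab).
"returns 'a\\b\u0009'"
```

Using the short forms (\t, \n, \r) for some control characters would also be valid JSON; the \uNNNN form just keeps the sketch uniform.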


Levente

>
> Thank you for insisting that I check out your copy of the JSON package,
> which you maintain in the SCouchDB project. The surprise on my side is
> that you created a copy instead of putting your changes into the JSON
> project, which is open for everybody to write to. Your copy is
> actually pretty hidden, whereas the general JSON package is easy to
> find.
>
> I went through all the changes you and Igor made in the SCouchDB
> project and decided to fold part of them back into the JSON package
> http://www.squeaksource.org/JSON .
>
> I documented this on the wiki page that goes along with the JSON
> project. I copy it below.***
>
>
> So the updated test case for working with WebClient, JSON and the
> couchDB is the following.
>
> |json couchDBurl |
>  json := JsonObject new.
>  json title: 'The title of my note card'.
>  json body: 'The body test text of my note card with some Unicode test characters ',
>                   (8450 asCharacter asString, 'ä.', (String with: Character cr)).
>
> "Note: JsonObject behaves like a JavaScript object insofar as you
> can add properties to instances without declaring them as instance
> variables. But you can just as well use JsonObject like a Dictionary,
> since it is a subclass of Dictionary."
>
> "create the CouchDB database 'notes'"
> couchDBurl := 'http://localhost:5984/notes'.
>
> WebClient httpPut: couchDBurl
>                 content: ''
>                 type: 'text/plain'.
>
> "Store first document"
> WebClient httpPut: couchDBurl, '/myNote1'
>                 content: json asJsonString
>                 type: 'text/plain'.
>
> "You get the document back with"
>
> WebClient httpGet: couchDBurl, '/myNote1' .
>
> So far so good. This solution, however, still escapes code points > 127.
> See the note on this below, and more in an upcoming post.
>
> Regards
>
> Hannes
>
>
> ----------------------------------------------------------------------------------------------------------
>
> ***
> JSON-hjh.32
>
> Author: Hannes Hirzel
>
> Ancestors: JSON-rh.31
>
> In the project SCouchDB a copy of JSON is maintained by Igor Stasenko
> and Radoslav Hodnicak.
>
> This merges part of the changes back, in particular these versions
> from the SCouchDB project:
>
>    * JSON-Igor.Stasenko.28
>    * JSON-Igor.Stasenko.29
>    * JSON-rh.30
>    * JSON-rh.31
>
> Main changes
>
>   1. JsonObject is now a subclass of Dictionary instead of Object, so
> the Dictionary interface no longer has to be implemented by hand.
>   2. Fix for converting Unicode characters to \uNNNN format (adds the
> missing padding to 4 hex digits)
>
> No further changes
>
> The SCouchDB project contains more changes in the copy of the JSON package.
>
> I did not merge any further because in SCouchDB / JSON-rh.32
> Radoslav Hodnicak introduces an instance variable 'converter',
>
> which is initialized to
>
> converter := UTF8TextConverter new
>
> Igor Stasenko, Levente Uzonyi and Hannes Hirzel agreed that the UTF-8
> conversion does not belong in the JSON package:
>
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-May/150497.html
>
> Levente Uzonyi:
>
> You only need to convert the characters to UTF-8 because you're
> sending them over the network to a server, and Unicode characters have
> to be converted to bytes somehow. So the JSON printer shouldn't do any
> conversion by default except for escaping. The only problem is that
> escaping is not done as the spec requires, but that's easy to fix.
>
> http://www.json.org/
>
> A string is a collection of zero or more Unicode characters, wrapped
> in double quotes, using backslash escapes. A character is represented
> as a single character string. A string is very much like a C or Java
> string.
>
> About escaping Unicode characters
>
> Escaping Unicode characters as
>
> \uNNNN
>
> is actually not necessary for characters with codes > 127 when uploading
> to CouchDB. But this version still does it.
>
> If you want to patch this, change the method
>
> Json class escapeForCharacter: c
>
>

