Why would one prefer CouchDB or TokyoT/C over plain old SQL? (was Re: [squeak-dev] Re: Squeak packages for accessing SQL databases and for report generation)

Göran Krampe goran at krampe.se
Sat Mar 28 10:06:24 UTC 2009


Hi!

(I almost missed this one, sorry)

Andreas Raab wrote:
> Göran Krampe wrote:
> >> Secondly - the original question was about SQL and how to access 
> >> legacy stuff I think, but there are several new interesting database 
> >> alternatives around that are worth mentioning. I have toyed with two of 
>> them lately:
> 
> Interesting. When would you prefer either of them over a plain old SQL 
> database? I'm not familiar with either CouchDB or 
> TokyoCabinet/TokyoTyrant but I'd be interested to find out more about 
> their application areas.

Let me give you a quick take on this rather large subject :). First, a 
summary of my little efforts on both these products:

- I toyed with CouchDB, there was already a Curl-based API at SS for it. 
I also implemented a "view server" in Squeak for it, haven't released it 
yet, should do that. I track it, it moves. It's hip.

- I recently started playing with TT/TC and have built a Squeak API for 
it, I just threw it up on SM and yesterday I posted a lengthy blog 
article about it in fact:

	http://goran.krampe.se/blog/Squeak/TokyoTyrant.rdoc


History:

- Most of these new dbs have been built as responses to pragmatic needs 
to scale a LOT. TT/TC comes from Mixi.jp ("Facebook of Japan"). Then you 
have a whole list of these things from Amazon (Dynamo - closed), Google 
(BigTable), Facebook (Cassandra) etc etc.

- CouchDB is also built to scale like crazy, but started as a one-man 
hobby project. It is one of the few projects with a really strong 
developer community since it was NOT built inside a company. Built in 
Erlang as are MANY of these new dbs.

Back to the reasons why one would prefer them (any of these), my take on it:

1. Performance/hardware ratio. Most of these are variants of "key-value 
stores" or "document centric dbs". They focus a lot on speed. As you can 
see in my blog entry TT/TC is awfully fast. Well, I haven't compared it 
yet to, say, PGSQL, but I can't imagine doing 2000-3000 inserts/sec 
stuffing 18Mb/sec into an SQL db on this little mini laptop of mine. I 
really hope I am not lying through my teeth. :)

2. Avoid the ORM/"impedance mismatch" swamp. CouchDB is a key-value 
store which stores JSON objects (a "document" in their lingo). Thus it 
can store/load object graphs/hierarchies in one "clump" quite easily - 
so in some respect these databases are similar to OODBs IMHO. TT/TC 
stores "more or less" binary blobs (unless you use the table extension).

3. Dynamics. Both CouchDB and TT/TC (using the table extension) talk 
about a "schema less" model. This translates to the fact that CouchDB can 
store *any* JSON object, there is no schema. And using map/reduce you can 
still work with "views" on them etc. TT/TC using the table extension more 
or less stores a Dictionary as value: "<key> $00 <value> $00 <key2> $00 
<value2>". And then it has support for adding indexes on these keys, a 
query engine, a Lua scripting extension inside TT to do "stored 
procedures"-stuff etc. But the key here is the fact that these databases 
are made to deal with a changing world and do not rely on strict schemas 
or advanced types, and when you have it running on say 100 servers 
these aspects seem to become very important (not talking from 
experience, just from what I hear in these forums).

4. Scale horizontally. A lot. CouchDB aims at mega-scaling using 
replication and multi-version logic - "eventual consistency". It also 
implements the map/reduce pattern where you can define map and reduce 
functions in JS running on the server in a so-called "view server". 
TT/TC also has replication, dual-master failover, 
single-master-multi-readers etc. It does not have mega-scaling goals as 
CouchDB has, but there are already "layers on top" like LightCloud that 
form a hash-ring of TT/TC servers for scaling. And since it is so darn 
fast on a single box it covers a lot of use cases without large scaling.
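
To make the table extension layout from point 3 a bit more concrete, 
here is a rough Python sketch of packing a Dictionary into one 
"<key> $00 <value> $00 ..." blob and unpacking it again. The function 
names encode_columns and decode_columns are made up for illustration, 
and the UTF-8 encoding is an assumption. It only mirrors the layout as 
described above and is not code from TT/TC itself:

```python
def encode_columns(columns: dict) -> bytes:
    """Join keys and values into one NUL-separated blob (illustrative only)."""
    parts = []
    for key, value in columns.items():
        # A NUL inside a key or value would break the framing, so refuse it.
        if "\x00" in key or "\x00" in value:
            raise ValueError("NUL bytes cannot appear in keys or values")
        parts.append(key.encode("utf-8"))
        parts.append(value.encode("utf-8"))
    return b"\x00".join(parts)


def decode_columns(blob: bytes) -> dict:
    """Split a NUL-separated blob back into a dict of alternating key/value."""
    if not blob:
        return {}
    fields = blob.split(b"\x00")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating key/value fields")
    it = iter(fields)
    # zip(it, it) pairs up consecutive fields: (key, value), (key2, value2), ...
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in zip(it, it)}


record = {"name": "Squeak", "kind": "Smalltalk"}
blob = encode_columns(record)
restored = decode_columns(blob)
```

Since every record carries its own keys, two records in the same 
database can have completely different "columns", which is what the 
"schema less" talk boils down to.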

From a more personal "touchy feely" perspective these things (and 
several others like Dynomite) are a breath of fresh air! They are very simple to 
use. They are FAST. They are robust. They are small. They often embrace 
the "web 2.0" world by using JSON, HTTP-REST APIs, memcached protocol 
etc etc.

For the moment I am focusing on TT/TC but CouchDB has some very 
interesting things going for it - like Erlang OTP, promised transparent 
replication and its map/reduce stuff. And the CouchDB API on SS seems to 
work fine if you get the Curl plugin.
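
For anyone who has not seen the map/reduce view idea before, here is a 
small in-memory Python sketch of the pattern. It is NOT the CouchDB API 
(real views are written as functions that run in the view server, and 
CouchDB does a lot more, such as keeping the view index up to date 
incrementally). The names map_doc, reduce_values and run_view, and the 
sample documents, are invented for illustration:

```python
from collections import defaultdict


def map_doc(doc):
    """Emit one (key, value) pair per tag, much like emit(tag, 1) in a view."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    """Combine all values emitted for one key into a single result."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Apply map_fn to every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Views are ordered by key, so sort before reducing.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


# Documents need not share a schema: the third one has no "tags" at all.
docs = [
    {"_id": "a", "tags": ["squeak", "db"]},
    {"_id": "b", "tags": ["db"]},
    {"_id": "c", "title": "no tags here"},
]
result = run_view(docs, map_doc, reduce_values)
```

The point is that the map function simply skips documents that lack the 
field it cares about, so views keep working when documents change shape.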

Well, that turned into a long post, but hopefully I answered some of it. 
For more details read my blog article! :) I also plan to write another 
soon about the table extension and its Lua mechanisms.

regards, Göran



