Why would one prefer CouchDB or TokyoT/C over plain old SQL? (was
Re: [squeak-dev] Re: Squeak packages for accessing SQL databases and
for report generation)
Göran Krampe
goran at krampe.se
Sat Mar 28 10:06:24 UTC 2009
Hi!
(I almost missed this one, sorry)
Andreas Raab wrote:
> Göran Krampe wrote:
>> Secondly - the original question was about SQL and how to access
>> legacy stuff I think, but there are several new interesting database
>> alternatives around that are worth mentioning. I have toyed with two of
>> them lately:
>
> Interesting. When would you prefer either of them over a plain old SQL
> database? I'm not familiar with either CouchDB or
> TokyoCabinet/TokyoTyrant but I'd be interested to find out more about
> their application areas.
Let me give you a quick take on this rather large subject :). First, a
summary of my little efforts on both these products:
- I toyed with CouchDB, there was already a Curl-based API at SS for it.
I also implemented a "view server" in Squeak for it, haven't released it
yet, should do that. I track it, it moves. It's hip.
- I recently started playing with TT/TC and have built a Squeak API for
it, I just threw it up on SM and yesterday I posted a lengthy blog
article about it in fact:
http://goran.krampe.se/blog/Squeak/TokyoTyrant.rdoc
History:
- Most of these new dbs have been built as responses to pragmatic needs
to scale a LOT. TT/TC comes from Mixi.jp ("Facebook of Japan"). Then you
have a whole list of these things from Amazon (Dynamo - closed), Google
(BigTable), Facebook (Cassandra) etc etc.
- CouchDB is also built to scale like crazy, but started as a one-man
hobby project. It is one of the few projects with a really strong
developer community since it was NOT built inside a company. Built in
Erlang as are MANY of these new dbs.
Back to the reasons why one would prefer them (any of these), my take on it:
1. Performance/hardware ratio. Most of these are variants of "key-value
stores" or "document centric dbs". They focus a lot on speed. As you can
see in my blog entry TT/TC is awfully fast, well, I haven't compared yet
to say PGSQL, but I can't imagine doing 2000-3000 inserts/sec stuffing
18Mb/sec into an SQL db on this little mini laptop of mine. I really
hope I am not lying through my teeth. :)
2. Avoid the ORM/"impedance mismatch" swamp. CouchDB is a key-value
store which stores JSON objects (a "document" in their lingo). Thus it
can store/load object graphs/hierarchies in one "clump" quite easily -
so in some respect these databases are similar to OODBs IMHO. TT/TC
stores "more or less" binary blobs (unless you use the table extension).
3. Dynamics. Both CouchDB and TT/TC (using the table extension) talk
about a "schema less" model. This translates to the fact that CouchDB
can store *any* JSON object, there is no schema. And using map/reduce
you can still work with "views" on them etc. TT/TC using the table
extension more or less stores a Dictionary as value: "<key> $00 <value>
$00 <key2> $00 <value2>". And then it has support for adding indexes on
these keys, a query engine, a Lua scripting extension inside TT to do
"stored procedures"-stuff etc. But key here is the fact that these
databases are made to deal with a changing world and do not rely on
strict schemas or advanced types, and when you have it running on say
100 servers these aspects seem to become very important (not talking
from experience, just from what I hear in these forums).
4. Scale horizontally. A lot. CouchDB aims at mega-scaling using
replication and multi-version logic - "eventual consistency". It also
implements the map/reduce pattern where you can define map and reduce
functions in JS running on the server in a so-called "view server".
TT/TC also has replication, dual-master failover,
single-master-multi-readers etc. It does not have mega-scaling goals as
CouchDB has, but there are already "layers on top" like LightCloud that
form a hash-ring of TT/TC servers for scaling. And since it is so darn
fast on a single box it covers a lot of use cases without large scaling.
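To make the table extension layout from point 3 a bit more concrete,
here is a rough Python sketch of packing a Dictionary into the
"<key> $00 <value> $00 ..." form and unpacking it again. The function
names are hypothetical and not part of any TT/TC API, and the real
on-disk format may differ in details. It only illustrates the idea of
NUL-separated alternating keys and values, assuming UTF-8 strings.

```python
# Illustrative only: encode_columns/decode_columns are made-up helper
# names, not functions from Tokyo Cabinet or Tokyo Tyrant.
SEP = b"\x00"  # the "$00" separator byte


def encode_columns(columns: dict) -> bytes:
    """Pack a dict of strings as key, NUL, value, NUL, key2, NUL, value2."""
    parts = []
    for key, value in columns.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        if SEP in k or SEP in v:
            # A NUL inside a field would make the blob ambiguous.
            raise ValueError("keys and values must not contain NUL bytes")
        parts.extend([k, v])
    return SEP.join(parts)


def decode_columns(blob: bytes) -> dict:
    """Inverse of encode_columns: split on NUL and pair up the fields."""
    if not blob:
        return {}
    fields = blob.split(SEP)
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating key/value fields")
    return {
        fields[i].decode("utf-8"): fields[i + 1].decode("utf-8")
        for i in range(0, len(fields), 2)
    }


record = {"name": "Ada", "lang": "Smalltalk"}
blob = encode_columns(record)
```

Since every record carries its own column names, two records under
different keys can have completely different columns, which is what
"schema less" boils down to here.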
From a more personal "touchy feely" perspective these things (and
several others like Dynomite) are a breath of fresh air! They are very
simple to
use. They are FAST. They are robust. They are small. They often embrace
the "web 2.0" world by using JSON, HTTP-REST APIs, memcached protocol
etc etc.
For the moment I am focusing on TT/TC but CouchDB has some very
interesting things going for it - like Erlang OTP, promised transparent
replication and its map/reduce stuff. And the CouchDB API on SS seems to
work fine if you get the Curl plugin.
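For those who have not seen the map/reduce style of "view" before, here
is a small sketch of the pattern in plain Python. It is not CouchDB
code: real CouchDB views are written in JS and run in the view server,
and they also deal with re-reducing partial results, which this
ignores. The document fields ("tags" etc) and the function names are
made up for the illustration.

```python
from collections import defaultdict


def map_doc(doc):
    """Emit (key, value) pairs for one document, like a view's map step."""
    # No schema: documents without a "tags" field simply emit nothing.
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    """Combine all values emitted under one key, here by summing them."""
    return sum(values)


def run_view(docs):
    """Apply map to every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(vals) for key, vals in sorted(grouped.items())}


docs = [
    {"_id": "a", "type": "post", "tags": ["squeak", "db"]},
    {"_id": "b", "type": "post", "tags": ["db"]},
    {"_id": "c", "type": "comment", "text": "no tags here"},
]
tag_counts = run_view(docs)  # {"db": 2, "squeak": 1}
```

Note that the third document has a different shape and the view still
works, since the map function decides what to pick out of each
document.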
Well, that turned into a long post, but hopefully I answered some of it.
For more details read my blog article! :) I also plan to write another
soon about the table extension and its Lua mechanisms.
regards, Göran