[squeak-dev] database vs flat files (was: FileStreams Limit)

Chris Muller ma.chris.m at gmail.com
Sun Feb 20 20:43:54 UTC 2022


Hi Jörg,

That sounds like an interesting project.  One thing that stands out to
me in your description is that it focuses only on the recording and
capture of the data, but not any requirements w.r.t. *accessing* that
recorded data in the future.  For the capture, it sounds like you have
the most performant implementation possible -- no database will be
able to keep up with dumping to flat files in the manner you
described.

> Is there a possibility in Magma, that I can change multiple objects over time, but defer the commit action?

Yes, of course.

> Everything I understand so far is, that I need to encapsulate my change operations into a
> commit block,

Not necessarily.  Magma allows several ways of committing; a "bulk-
load" application like yours would want to use method #3 as described
on this page, which does not use a commit block:

   https://wiki.squeak.org/squeak/5605
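Under the stated assumptions (a connected MagmaSession held in
"session"), method #3 amounts to marking the transaction boundary
yourself and committing explicitly, with no block wrapped around the
changes -- a minimal sketch, using only the messages shown elsewhere
in this mail:

   "Sketch only: 'session' is an already-connected MagmaSession.
   Changes happen outside any commit block, then are committed
   explicitly."
   session begin.
   session root at: 1 put: #change.
   session root at: 2 put: #change.
   session commit.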

> where each object change is then tracked.

"Tracked" is a misnomer unless you use Magma's built-in WriteBarrier
option (e.g., send "mySession allowWriteBarrier: true" just before or
after connecting the session), which is strongly recommended for
bulk-load use cases because it can dramatically improve commit
performance.

Otherwise, with #allowWriteBarrier: false, Magma *compares* every
object to its previously-known state as read from the database
(though it sounds like you're not really reading from the DB, just
committing new objects).  In that mode, nothing is "tracked" during
domain processing the way it is with the WriteBarrier.

> This looks like Glorp to me.

Nope.  There are no similarities with Glorp.  Magma is closer to
Gemtalk (formerly known as GemStone).

> In VisualWorks I have implemented a single-user database systems based on the immutable-flags. It looks to me that Squeak has currently not this feature to have immutable objects, I could not find a method like #isImmutable like in VisualWorks. With that mechanism you can track object changes and later you need simply to send a #commit to your session.

Squeak added immutable objects in the last few releases.  This might
be useful for a specific vertical application like yours; however, I
believe the mechanism is problematic as a means of change detection
for a general-purpose database like Magma.  There've been several
discussions about this in the past.
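If you want to experiment anyway: recent Spur-based Squeak images
expose read-only objects via #isReadOnlyObject / #beReadOnlyObject /
#beWritableObject (selector names assumed from current trunk; roughly
the counterpart of the VisualWorks immutable flag).  A sketch:

   "Sketch: read-only objects in a recent Squeak (Spur) image.
   A write to a read-only object signals an error, which a framework
   could catch to record the attempted change."
   | point |
   point := 3 @ 4.
   point beReadOnlyObject.
   point isReadOnlyObject.   "true"
   [ point setX: 5 setY: 6 ]
       on: Error
       do: [:ex | Transcript show: 'write to immutable object trapped'; cr].
   point beWritableObject.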

> But I think it should be possible in Magma to have something like this:
>
> session trackChanges: [session root at: 1 put: #change].
> session trackChanges: [session root at: 2 put: #change].
> session commit

It certainly is; however, it would be unnecessarily inefficient.
First, accessing #root is always a DB hit, so there's no need to
keep doing that.  Second, you would want to use this sort of approach
instead of commit blocks:

  "bootstrap / init"
  session allowWriteBarrier: true; connectAs: 'loader'; begin.
  mySignals := session root.
  signalToLoad := mySignals at: signalName ifAbsentPut: [ MagmaArray new ].
  lastCommit := Time millisecondClockValue.
  ...
  "load loop"
  [ moreDataToProcess ] whileTrue:
      [ signalToLoad add: incomingData.
      (Time millisecondsSince: lastCommit) > 30000 "milliseconds" ifTrue:
            [ session commitAndBegin.
            lastCommit := Time millisecondClockValue ] ].
   ...

Without knowing more, this is going to be the fastest way to bulk-load
a Magma DB with your data, though it would still be only a fraction of
the speed of dumping to flat files, of course.

> The advantage of my files is of course I can simply remove older fragment files from the signal directory, zip it and put it somewhere else as backup and clean up the database a bit to make it smaller in the runtime. But I will have a look at that what you described as „browsing the magma database“ :-)
>
> Ahh and the other advantage of my files is I can use it directly in my Python scripts to read it in. If I use a Magma database I need an exporter.

Yup.  Your project sounds like it'd be a fun exercise for Magma, but
if your eventual consumption of the data you're loading can satisfy
your requirements with only single-user batch processing, you don't
really need a database at all.   :)
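As an aside, the exporter needn't be much work.  A sketch, assuming
the signal samples respond to #printString in a form Python can parse
("signalToLoad" is the variable from the loading code; the file name
is a placeholder):

   "Sketch: dump a loaded signal to CSV-like lines for Python.
   'signalToLoad' holds the samples; 'signal.csv' is made up."
   | stream |
   stream := FileStream newFileNamed: 'signal.csv'.
   [ signalToLoad do: [:sample | stream nextPutAll: sample printString; cr ] ]
       ensure: [ stream close ].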

 - Chris
