Magma read performance

Chris Muller asqueaker at gmail.com
Fri Nov 26 18:43:13 UTC 2010


It's a proprietary data-aggregation tool that permutes every
combination of data-attributes of an input-source at multiple levels.
Even small inputs into the system can lead to very large output
graphs.  It's a signature test-application for Magma: it builds and
accesses a fairly large and complex data-structure, something I think
would be very difficult to do, at least abstractly, with an RDBMS.
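
To give a feel for the combinatorics, here is a toy sketch in Python
(not the actual tool; the names are invented): the number of attribute
groupings grows as a sum of binomial coefficients, so even a modest
record width yields a large aggregation graph.

from itertools import combinations

def attribute_groupings(attributes, max_level):
    """Yield every combination of the attributes, at every
    level from 1 up to max_level."""
    for level in range(1, max_level + 1):
        yield from combinations(attributes, level)

# Just 10 attributes aggregated at up to 4 levels already gives
# 10 + 45 + 120 + 210 = 385 groupings, each a node in the output
# graph (and each fanning out per distinct value combination).
attrs = ['a%d' % i for i in range(10)]
print(sum(1 for _ in attribute_groupings(attrs, 4)))   # => 385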

On Thu, Nov 25, 2010 at 6:56 AM, Facundo Vozzi <facundov79 at gmail.com> wrote:
> Hi Chris,
> what kind of application is that? At work I'm working on a
> ten-year-old production system with an Oracle DBMS, and it's 1 GB in
> size.
>
> Thanks for sharing your analysis again,
> Facu
> On Wed, Nov 24, 2010 at 2:38 PM, Chris Muller <ma.chris.m at gmail.com> wrote:
>>
>> There has been interest in Magma's read and query performance lately,
>> so I thought I would share results of a recent benchmark test.
>>
>> It's an actual application which does very heavy reading and writing
>> to a Magma repository.  This test was 24GB of repository work, over
>> two days.  My goal was to determine, once the persistent model had
>> become many times the size of RAM and HD access appeared to be the
>> limiting factor:
>>
>>  - how fast is Magma, in terms of "objects per second" and
>> "kilobytes per second"?
>>  - how fast is this relative to the speed when the repository was empty?
>>
>> Conclusion:
>>
>>  - Magma started at 4K objects per second (empty repository), 282K
>> bytes per second.
>>  - Finished with 2.3K objects per second (6GB repository), 750K
>> bytes per second.
>>  - Memory consumption by the image never exceeded 300MB.
>>
>> It is important to note that these times are from the client
>> MagmaSession point-of-view, including the full server roundtrip plus
>> materialization.  Also, as can be seen from the attached data, many
>> requests brought back only one or two objects; while this
>> dramatically lowers the overall reported throughput, it is a
>> real-world scenario for applications.
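>>
>> (How figures like these could be gathered on the client side -- a
>> rough Python sketch of the idea only, not Magma's actual
>> instrumentation; every name below is invented:)
>>
>> import time
>>
>> class ThroughputMonitor:
>>     """Tally objects per second over a monitored period, timed from
>>     the client side so each request includes the full server
>>     roundtrip plus materialization."""
>>     def __init__(self):
>>         self.total_objects = 0
>>         self.started = time.monotonic()
>>
>>     def record(self, fetch):
>>         objects = fetch()          # one hypothetical client request
>>         self.total_objects += len(objects)
>>         return objects
>>
>>     def objects_per_second(self):
>>         return self.total_objects / (time.monotonic() - self.started)
>>
>> # Requests returning only one or two objects still pay the full
>> # roundtrip, which is what drags the averaged figure down.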
>>
>> Verbose description:
>>
>> When the test first started, the repository was tiny, and there were
>> only ~4K server requests during a 5-minute monitored period.  The
>> total number of objects read across all ~4K requests was ~17M, for an
>> average throughput of 4K objects per second (ops).  As the model grew
>> in size and complexity, more and more objects were required during
>> the 5-minute monitored intervals to perform application-posting of
>> the same number of input records; requests doubled to ~8K per
>> 5-minute period, yet only a few more objects were brought back (~21M
>> total), for an average of 2500 ops.  32 hours after that, the read
>> rate had dropped to about 2300 ops.  This is due to two factors:
>>
>>  - the objects were larger (e.g., more pointers to other objects)
>>  - they were less clustered, having been replaced with objects which
>> could not be co-located with the originals (see the toy sketch below)
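>>
>> (A toy illustration of the second factor -- it assumes nothing about
>> Magma's actual file format: objects written side by side can be read
>> from a single disk block, while replacements that land elsewhere in
>> the file scatter the same logical read across many blocks.)
>>
>> def blocks_touched(positions, block_size=8192):
>>     """Count distinct disk blocks covering the given file
>>     positions -- a crude proxy for seek/read cost."""
>>     return len({p // block_size for p in positions})
>>
>> clustered = [i * 64 for i in range(100)]      # 100 co-located objects
>> scattered = [i * 64000 for i in range(100)]   # same objects, dispersed
>> print(blocks_touched(clustered))   # 1
>> print(blocks_touched(scattered))   # 100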
>>
>>
>>
>> The fact that objects got bigger (e.g., more pointers to other
>> objects) is corroborated by another stat: the number of *bytes* per
>> second read off the HD in order to access those objects.  When the
>> test first started, there were about 4K *requests* for objects in one
>> of the first five-minute periods, read at a rate of 282K bytes per
>> second.  24 hours later, there were only twice as many requests for
>> objects, but they were read-and-materialized at a rate of 750K bytes
>> per second.
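>>
>> Dividing the two rates (using the figures from the conclusion above)
>> gives a back-of-the-envelope average size per materialized object,
>> which makes that growth concrete:
>>
>> # bytes per second divided by objects per second = bytes per object
>> print(282000 / 4000)   # ~70 bytes/object  (empty repository)
>> print(750000 / 2300)   # ~326 bytes/object (6GB repository)
>> # i.e., average object size grew roughly 4.6x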
>>
>>
>>  - Chris
>>

