My first Magma experience (Now including GOODS)
Daniel Salama
dsalama at user.net
Sat Apr 2 22:15:14 UTC 2005
On Apr 1, 2005, at 11:05 AM, Chris Muller wrote:
>> I'll install the newer version of Magma. Also, this is the code of
>> FHKC834Header>>addMember:
>>
>> addMember: anObject
>>     self members
>>         ifNotNil: [self members addLast: anObject].
>>
>> I'm new to Smalltalk, but my understanding is that this is simply
>> adding an object to the end of the linked list. Please confirm.
>
> Yes, and anObject is a subclass of Link, right? So the question is:
> is its "nextLink" already populated? I'm thinking it must be, and
> *its* nextLink, and the next, and the next..
>
> That's why the commit is picking up so many objects in your first
> commit; *I think*.
>
> - Chris
Chris,
Yes, anObject is a subclass of Link. What I was thinking is that maybe,
because Magma stores information based on reachability, when I try to
store the objects of a LinkedList, since every object is linked to the
next, it may be fetching/storing all of the objects in each commit, or
something like that. I don't know; I'm just guessing at why it's
"running out" of memory.
Anyway, as I mentioned to you, I changed the collection type from
LinkedList to MagmaCollection. Once I made the change, the process ran
successfully. However, the performance was somewhat disappointing. I'm
not blaming Magma or Squeak; it could simply be that, because I'm new
to Smalltalk and this environment in general, my design is flawed.
To recap, all I'm trying to do is store an object of class
FHKC834Header as an element of an OrderedCollection in the root of a
Magma database. This object contains 5 attributes: four of them store
short strings (fewer than 20 chars each) and the fifth stores a
MagmaCollection of 8784 FHKC834Entry objects. The FHKC834Entry class
simply stores 27 string attributes, each no longer than 30 chars.
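For reference, the model boils down to class definitions along these lines (a reconstruction from the description above, not my actual source; only the attributes used in the snippets below are listed):

```smalltalk
Object subclass: #FHKC834Header
    instanceVariableNames: 'controlNumber referenceNumber date time members'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'FHKC834'.

"27 string attributes in all; only the two referenced below are shown."
Link subclass: #FHKC834Entry
    instanceVariableNames: 'alternateId1 lName'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'FHKC834'.
```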
The portion of the code that loads the data into Magma is:
head := FHKC834Header new.
head controlNumber: (self ediData controlNumber).
head referenceNumber: (self ediData referenceNumber).
head date: (self ediData date).
head time: (self ediData time).
head members: (MagmaCollection new).
time := Time millisecondsToRun:
    [mySession := MagmaSession hostAddress: #(127 0 0 1) asByteArray port: 51969.
    mySession connectAs: 'dsalama'.
    mySession commit: [mySession root removeKey: 'FHKC834' ifAbsent: []].
    theBase := mySession root at: 'FHKC834' ifAbsent:
        [theBase := OrderedCollection new.
        mySession commit: [mySession root at: 'FHKC834' put: theBase]].
    mySession commit: [theBase add: head].
    Transcript cr.
    count := 0.
    numCommits := 0.
    mySession begin.
    self ediData members do:
        [:ea |
        (count \\ 100) = 0 ifTrue:
            [mySession commitAndBegin.
            Transcript show: '.'.
            numCommits := numCommits + 1].
        head addMember: ea.
        count := count + 1].
    mySession commit.
    numCommits := numCommits + 1.
    mySession disconnect].
Transcript cr; show: (time / 1000) asFloat; show: ' seconds to load into Magma in '; show: numCommits; show: ' commit(s)'.
where ediData is an object of class FHKC834Header which holds all of
the information in memory; in other words, this variable stores the
result of parsing the text file containing the 8784 records.
As you can see, all I'm doing is adding one FHKC834Header object and
then looping through all 8784 records, committing every 100 records. It
took 1 hour 13 minutes to run on a PowerBook G5 1.5GHz. Needless to
say, it was very slow. The statistics from the MagmaConsole at the end
of the run were:
a MagmaServerStatistics
sockets queued:
min = 1
max = 1
sum = 102
count = 102
mean : 1.0
requests queued:
min = 1
max = 1
sum = 102
count = 102
mean : 1.0
request sizes:
min = 35
max = 150440
sum = 13142912
count = 102
mean : 128852.0784313725
request processing time:
min = a MaDuration 00:00:00.000
max = a MaDuration 00:00:03.737
sum = a MaDuration 00:01:59.995
count = 101
mean : **invalid**
response sizes:
min = 4
max = 303413
sum = 26564055
count = 102
mean : 260431.9117647059
Then, I ran the following code snippet:
time := Time millisecondsToRun:
    [mySession := MagmaSession hostAddress: #(127 0 0 1) asByteArray port: 51969.
    mySession connectAs: 'dsalama'.
    xmt := (mySession root at: 'FHKC834') first.
    numCommits := 0.
    count := 0.
    xmt members do:
        [:ea |
        (count \\ 100) = 0 ifTrue:
            [mySession commit:
                [ea lName: ea lName asLowercase.
                numCommits := numCommits + 1]].
        count := count + 1].
    mySession disconnect].
Transcript cr; show: (time / 1000) asFloat; show: ' seconds to update '; show: numCommits; show: ' record(s) from Magma'.
What I tried to do here was simply iterate through all the
FHKC834Entry objects already stored in the database, updating and
committing one attribute on every 100th object. It took 29 minutes on
the same computer, which is even worse timing than the load. I guess I
could live with a slow bulk-load process, but random updates to a large
collection shouldn't take this long. Again, the MagmaConsole statistics
after this run were:
a MagmaServerStatistics
sockets queued:
min = 1
max = 1
sum = 356
count = 356
mean : 1.0
requests queued:
min = 1
max = 1
sum = 356
count = 356
mean : 1.0
request sizes:
min = 35
max = 1156
sum = 118494
count = 356
mean : 332.8483146067416
request processing time:
min = a MaDuration 00:00:00.000
max = a MaDuration 00:00:00.450
sum = a MaDuration 00:00:17.685
count = 355
mean : **invalid**
response sizes:
min = 4
max = 88500
sum = 2769020
count = 356
mean : 7778.14606741573
Am I doing something wrong, or is it normal to expect such results (in
terms of performance)?
Then, I decided to run the tests against GOODS. Well, the results were
not much better. In the case of GOODS, I followed Avi's advice and
used BTrees and TSTrees for the collections. I assumed this would, or
at least should, improve performance over the standard collections.
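In case it helps anyone reproduce this: both collections answer the usual dictionary-style protocol, with BTree keeping its keys sorted and TSTree (a ternary search tree) keyed by strings. A minimal sketch of how I'm using them:

```smalltalk
"BTree behaves like a sorted dictionary."
bt := BTree new.
bt at: 1 put: 'one'.
bt at: 2 put: 'two'.
bt at: 1.  "'one'"

"TSTree is keyed by strings."
tst := TSTree new.
tst at: 'abc' put: 42.
tst at: 'abc'  "42"
```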
The first code snippet I ran was:
head := FHKC834Header new.
head controlNumber: (self ediData controlNumber).
head referenceNumber: (self ediData referenceNumber).
head date: (self ediData date).
head time: (self ediData time).
head members: (TSTree new).
time := Time millisecondsToRun:
    [db := KKDatabase onHost: 'localhost' port: 6100.
    db root: Dictionary new.
    db commit.
    theBase := db root at: 'FHKC834' ifAbsent:
        [db root at: 'FHKC834' put: BTree new.
        db commit.
        db root at: 'FHKC834'].
    theBase at: 0 put: head.
    db commit.
    db flush.
    Transcript cr.
    count := 0.
    numCommits := 0.
    self ediData members do:
        [:ea |
        (count \\ 100) = 0 ifTrue:
            [Transcript show: '.'.
            db commit.
            numCommits := numCommits + 1].
        head members at: ea alternateId1 put: ea.
        count := count + 1].
    db commit.
    db flushAll.
    numCommits := numCommits + 1].
Transcript cr; show: (time / 1000) asFloat; show: ' seconds to load into GOODS in '; show: numCommits; show: ' commit(s)'.
Again, similar to the Magma example, this piece of code is supposed to
insert the 8784 FHKC834Entry objects, committing every 100 records and
indexing by one String attribute of the object itself. It took 3 hours
and 6 minutes to run. Crazy!! BTW, the statistics shown by GOODS after
all this were:
>show
GOODS version 2.91
Local server 0: 'localhost:6100'
Attached clients:
Client "squeak489BD9A", 199 proceed requests, 286601 loaded objects, 0
invalidated
Transaction log size: 1029650
Number of passed checkpoints: 2
Number of active transactions: 0
Last checkpoint completion time 14:29.50 02-APR-2005
Number of synchronous writes: 103
Average number of parallel requests: 1.000
Maximal number of parallel requests: 1
Last global transaction identifier: 0
Memory map size: 8388608, allocated memory 9846528, used memory
9846496, max hole size 32, average hole size 32, number of holes 1
Number of holes of size [32, 63] = 1
Object index size: 1432354816
Allocated space since last GC: 348238
Number of used objects in storage: 264210
Total used space in storage: 5056405
Maximal time of GC completion: 4 (sec)
Total time spent in GC: 11 (sec)
Average time of GC completion: 2.750000 (sec)
Number of completed GC: 4
Profile of classes:
FHKC834Entry 8100 instances, 1312200 total size, 162 average size
TSTreeNode 34586 instances, 1037580 total size, 30 average size
TSTree 1 instances, 6 total size, 6 average size
FHKC834Header 1 instances, 30 total size, 30 average size
BTreeKeysArray 1 instances, 30 total size, 30 average size
BTreeLeafNode 1 instances, 42 total size, 42 average size
BTree 1 instances, 6 total size, 6 average size
String 193883 instances, 2356410 total size, 12 average size
Association 1 instances, 12 total size, 12 average size
??? 27625 instances, 0 total size, 0 average size
Memory usage: nodes=9437814, locks=73761, references=0,
instances=6884360
Storage file size: 8773632
Page pool size: 4096
Page pool access: 109070 reads, 261137 writes, 2404 page faults
Number of classes in storage: 10
I don't know if I'm reading this correctly, but it says 109070 reads
and 261137 writes?? For 8784 objects?? Very strange!
Then, I decided to try the same thing, but instead of a TSTree I used
a BTree, with the following piece of code:
head := FHKC834Header new.
head controlNumber: (self ediData controlNumber).
head referenceNumber: (self ediData referenceNumber).
head date: (self ediData date).
head time: (self ediData time).
head members: (BTree new).
time := Time millisecondsToRun:
    [db := KKDatabase onHost: 'localhost' port: 6100.
    db root: Dictionary new.
    db commit.
    theBase := db root at: 'FHKC834' ifAbsent:
        [db root at: 'FHKC834' put: BTree new.
        db commit.
        db root at: 'FHKC834'].
    theBase at: 0 put: head.
    db commit.
    db flush.
    Transcript cr.
    count := 0.
    numCommits := 0.
    self ediData members do:
        [:ea |
        (count \\ 100) = 0 ifTrue:
            [Transcript show: '.'.
            db commit.
            numCommits := numCommits + 1].
        head members at: count put: ea.
        count := count + 1].
    db commit.
    db flushAll.
    numCommits := numCommits + 1].
Transcript cr; show: (time / 1000) asFloat; show: ' seconds to load into GOODS in '; show: numCommits; show: ' commit(s)'.
This code ran a lot faster than the previous one: it took only 1 hour
and 4 minutes, and the GOODS console statistics looked like this:
show
GOODS version 2.91
Local server 0: 'localhost:6100'
Attached clients:
Client "squeak5417431", 198 proceed requests, 286601 loaded objects, 0
invalidated
Transaction log size: 172818
Number of passed checkpoints: 2
Number of active transactions: 0
Last checkpoint completion time 16:26.55 02-APR-2005
Number of synchronous writes: 102
Average number of parallel requests: 1.000
Maximal number of parallel requests: 1
Last global transaction identifier: 0
Memory map size: 8388608, allocated memory 9494528, used memory
9494336, max hole size 192, average hole size 192, number of holes 1
Number of holes of size [128, 255] = 1
Object index size: 1432354816
Allocated space since last GC: 884692
Number of used objects in storage: 225809
Total used space in storage: 4504768
Maximal time of GC completion: 4 (sec)
Total time spent in GC: 8 (sec)
Average time of GC completion: 2.666667 (sec)
Number of completed GC: 3
Profile of classes:
FHKC834Entry 6900 instances, 1117800 total size, 162 average size
BTreeInteriorNode 3421 instances, 143682 total size, 42 average
size
FHKC834Header 1 instances, 30 total size, 30 average size
BTreeKeysArray 6871 instances, 206130 total size, 30 average size
BTreeLeafNode 3450 instances, 144900 total size, 42 average size
BTree 2 instances, 12 total size, 6 average size
String 165203 instances, 2005744 total size, 12 average size
Association 1 instances, 12 total size, 12 average size
??? 39951 instances, 0 total size, 0 average size
Memory usage: nodes=9437814, locks=73761, references=12299,
instances=6884360
Storage file size: 9302016
Page pool size: 4096
Page pool access: 41950 reads, 238675 writes, 2318 page faults
Number of classes in storage: 9
A lot fewer reads, but a similar number of writes. Compared to Magma,
the performance difference (9 minutes) seems irrelevant: they both run
too slowly.
Then, once the data was in GOODS, I ran this other piece of code:
time := Time millisecondsToRun:
    [db := KKDatabase onHost: 'localhost' port: 6100.
    xmt := (db root at: 'FHKC834') at: 0.
    Transcript cr.
    count := 0.
    numCommits := 0.
    xmt members do:
        [:ea |
        (count \\ 100) = 0 ifTrue:
            [Transcript show: '.'.
            ea lName: ea lName asLowercase.
            db commit.
            numCommits := numCommits + 1].
        count := count + 1].
    db flushAll].
Transcript cr; show: (time / 1000) asFloat; show: ' seconds to update '; show: numCommits; show: ' record(s) from GOODS'.
As in the Magma case, this code iterates through the objects in the
database and updates every 100th object. Wow! That was a lot better
than the Magma counterpart: this one took only 4 minutes 13 seconds to
run, and the GOODS console statistics looked like:
GOODS version 2.91
Local server 0: 'localhost:6100'
Attached clients:
Client "squeak597AAA0", 9036 proceed requests, 27570 loaded objects, 0
invalidated
Transaction log size: 20036
Number of passed checkpoints: 6
Number of active transactions: 0
Number of synchronous writes: 89
Average number of parallel requests: 1.000
Maximal number of parallel requests: 1
Last global transaction identifier: 0
Memory map size: 8388608, allocated memory 9494528, used memory
9368256, max hole size 192, average hole size 32, number of holes 3941
Number of holes of size [32, 63] = 3940
Number of holes of size [128, 255] = 1
Object index size: 699050
Allocated space since last GC: 880
Memory usage: nodes=1048646, locks=24587, references=0, instances=663849
Storage file size: 9494528
Page pool size: 4096
Page pool access: 18666 reads, 178 writes, 1988 page faults
Number of classes in storage: 9
Now, what's wrong with all this? Why do I feel this whole thing is
running so slowly? BTW, while doing all of this, these tasks have been
running in the current process of the VM, meaning that I'm not forking
them off into lower-priority processes. If I were running this on a
production machine with my application running in Seaside, the
application would be unresponsive for all that time.
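(For completeness, forking the load off at a lower priority would presumably look something like the sketch below, where loadIntoMagma is a hypothetical method wrapping the load snippet above; my point is that I haven't even been doing this.)

```smalltalk
"Run the bulk load in a background process, below the UI and
Seaside priorities, so the image stays responsive.
loadIntoMagma is a hypothetical wrapper around the code above."
[self loadIntoMagma] forkAt: Processor userBackgroundPriority
```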
I decided to run the same thing in a different environment: I quickly
wrote a script in Ruby on Rails connected to a MySQL database. The
entire process (parsing, plus the bulk load, plus the scattered
sequential update) took a little under 9 minutes, and during this time
the environment was still receiving HTTP requests and responding to
them.
As I mentioned in a previous post, the more I work in the Squeak
environment, the more I want to develop my production applications
there. But what's going on? What am I doing wrong? I'll gladly post
all the code for anyone to reproduce this on their own local machine,
including a dummy data file. Just ask for it and I'll send it.
Does anyone have any suggestions? At one time I thought it was my model
design, but it really couldn't get any simpler than this. I just can't
believe that it really takes this long to run, while blocking the VM at
the same time.
I want to work with an OODB, which is why I'm trying both Magma and
GOODS. I'm not trying OmniBase, simply because of its lack of support
for multi-user locks in the Linux environment. Is there any other one
worth looking into (inexpensive, of course)?
Please help.
Thanks,
Daniel Salama