My first Magma experience (Now including GOODS)

Daniel Salama dsalama at user.net
Sat Apr 2 22:15:14 UTC 2005


On Apr 1, 2005, at 11:05 AM, Chris Muller wrote:

>> I'll install the newer version of Magma. Also, this is the code of
>> FHKC834Header>>addMember
>>
>> addMember: anObject
>> 	self members
>> 		ifNotNil: [self members addLast: anObject].
>>
>> I'm new to Smalltalk, but my understanding is that this is simply
>> adding an object to the end of the linked list. Please confirm.
>
> Yes, and anObject is a subclass of Link, right?  So the question is, 
> is its
> "nextLink" already populated.  I'm thinking it must be, and *its* 
> nextLink and
> the next and the next..
>
> That's why the commit is picking up so many objects in your first 
> commit; *I
> think*.
>
>  - Chris

Chris,

Yes, anObject is a subclass of Link. What I was thinking is that, because Magma stores information based on reachability, storing the objects in a LinkedList, where every object is linked to the next, may cause it to fetch/store all of the objects on every commit, or something like that. I don't know; I'm just guessing at why it's "running out" of memory.
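Just to illustrate the guess (this is only how I imagine the mechanism works; I haven't verified it in Magma's code): since each member is a Link, it holds onto its nextLink, so reaching any one element reaches the rest of the chain:

	"hypothetical illustration only"
	first := FHKC834Entry new.
	second := FHKC834Entry new.
	first nextLink: second.
	"any commit that reaches 'first' also reaches 'second' via
	 nextLink, and from there the next link, and so on down the list"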

Anyway, as I mentioned to you, I changed the collection type from LinkedList to MagmaCollection. Once I made the change, the process ran successfully. However, the performance was somewhat disappointing. I'm not blaming Magma or Squeak; since I'm new to Smalltalk and this whole environment, my design could simply be flawed.

To recap, all I'm trying to do is store an object of class 
FHKC834Header as an element of an OrderedCollection in the root of a 
Magma database. This object contains 5 attributes. Four of them simply 
store short strings (less than 20 chars) and the fifth one stores a 
MagmaCollection of 8784 FHKC834Entry objects. This FHKC834Entry class 
simply stores 27 string attributes, each no longer than 30 chars.
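For reference, the two classes look roughly like this (only lName and alternateId1 appear by name in the code below; the shape of the definitions is a sketch, not the exact code):

	Object subclass: #FHKC834Header
		instanceVariableNames: 'controlNumber referenceNumber date time members'
		classVariableNames: ''
		poolDictionaries: ''
		category: 'FHKC834'.

	Link subclass: #FHKC834Entry  "a Link subclass in the original LinkedList version"
		instanceVariableNames: 'lName alternateId1'  "plus the other 25 String attributes"
		classVariableNames: ''
		poolDictionaries: ''
		category: 'FHKC834'.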

The portion of the code that loads the data into Magma is:

	head := FHKC834Header new.
	head controlNumber: (self ediData controlNumber).
	head referenceNumber: (self ediData referenceNumber).
	head date: (self ediData date).
	head time: (self ediData time).
	head members: (MagmaCollection new).

	time := Time millisecondsToRun:
		[mySession := MagmaSession hostAddress: #(127 0 0 1) asByteArray port: 51969.
		 mySession connectAs: 'dsalama'.
		 mySession commit:
			[mySession root removeKey: 'FHKC834' ifAbsent: []].
		
		 theBase := mySession root at: 'FHKC834' ifAbsent:
			[theBase := OrderedCollection new.
			 mySession commit: [mySession root at: 'FHKC834' put: theBase].
			 theBase  "answer the new collection, not the result of commit:"].
		
		 mySession commit:
			[theBase add: head].

		 Transcript cr.
		 count := 0.
		 numCommits := 0.
		 mySession begin.
		 self ediData members do:
			[:ea|
			 (count \\ 100) = 0
				ifTrue:
					[mySession commitAndBegin.
					 Transcript show: '.'.
					 numCommits := numCommits + 1].
			 head addMember: ea.
			 count := count + 1].
		 mySession commit.
		 numCommits := numCommits + 1.
		
		 mySession disconnect].

	Transcript cr; show: (time/1000) asFloat; show: ' seconds to load into Magma in ';
		show: numCommits; show: ' commit(s)'.

where ediData is an object of class FHKC834Header which contains all of 
the information in memory. In other words, this variable stores the 
result of parsing the text file which contains the 8784 records.

As you can see, all I'm doing is adding one FHKC834Header object and then looping through all 8784 records, committing every 100 records. It took 1 hour 13 minutes to run on a PowerBook G5 1.5GHz. Needless to say, that is very slow. The MagmaConsole statistics at the end of the run were:

a MagmaServerStatistics
	sockets queued:
	min = 1
	max = 1
	sum = 102
	count = 102
	mean : 1.0
	requests queued:
	min = 1
	max = 1
	sum = 102
	count = 102
	mean : 1.0
	request sizes:
	min = 35
	max = 150440
	sum = 13142912
	count = 102
	mean : 128852.0784313725
	request processing time:
	min = a MaDuration 00:00:00.000
	max = a MaDuration 00:00:03.737
	sum = a MaDuration 00:01:59.995
	count = 101
	mean : **invalid**
	response sizes:
	min = 4
	max = 303413
	sum = 26564055
	count = 102
	mean : 260431.9117647059

Then, I ran the following code snippet:

	time := Time millisecondsToRun:
		[mySession := MagmaSession hostAddress: #(127 0 0 1) asByteArray port: 51969.
		 mySession connectAs: 'dsalama'.
		
		 xmt := (mySession root at: 'FHKC834') first.
		
		 numCommits := 0.
		 count := 0.
		 xmt members do:
			[:ea|
			 (count \\ 100) = 0
				ifTrue:
					[mySession commit:
						[ea lName: (ea lName asLowercase).
						 numCommits := numCommits + 1]].
			 count := count + 1].
		
		 mySession disconnect].

	Transcript cr; show: (time/1000) asFloat; show: ' seconds to update ';
		show: numCommits; show: ' record(s) from Magma'.

What I tried to do here was simply iterate through all the FHKC834Entry objects already stored in the database, updating and committing one attribute on every 100th object. It took 29 minutes on the same computer, which is proportionally even worse than the load. I guess I could live with a slow bulk load, but scattered updates to a large collection shouldn't take this long. Again, the MagmaConsole results after this run were:

a MagmaServerStatistics
	sockets queued:
	min = 1
	max = 1
	sum = 356
	count = 356
	mean : 1.0
	requests queued:
	min = 1
	max = 1
	sum = 356
	count = 356
	mean : 1.0
	request sizes:
	min = 35
	max = 1156
	sum = 118494
	count = 356
	mean : 332.8483146067416
	request processing time:
	min = a MaDuration 00:00:00.000
	max = a MaDuration 00:00:00.450
	sum = a MaDuration 00:00:17.685
	count = 355
	mean : **invalid**
	response sizes:
	min = 4
	max = 88500
	sum = 2769020
	count = 356
	mean : 7778.14606741573

Am I doing something wrong or is it normal to expect such results (in 
terms of performance)?

Then, I decided to run the tests against GOODS. Well, the results were not much better. In the case of GOODS, I took Avi's advice and used BTrees and TSTrees for the collections, which I assumed should improve performance over the standard collections. The first code snippet I ran was:

	head := FHKC834Header new.
	head controlNumber: (self ediData controlNumber).
	head referenceNumber: (self ediData referenceNumber).
	head date: (self ediData date).
	head time: (self ediData time).
	head members: (TSTree new).

	time := Time millisecondsToRun:
		[db := KKDatabase onHost: 'localhost' port: 6100.
		 db root: Dictionary new.
		 db commit.
		
		 theBase := db root at: 'FHKC834' ifAbsent:
			[db root at: 'FHKC834' put: (BTree new).
			 db commit.
			 db root at: 'FHKC834'].
		
		 theBase at: 0 put: head.
		 db commit.
		 db flush.

		 Transcript cr.
		 count := 0.
		 numCommits := 0.
		 self ediData members do:
			[:ea|
			 (count \\ 100) = 0
				ifTrue:
					[Transcript show: '.'.
					 db commit.
					 numCommits := numCommits + 1].
			 (head members) at: (ea alternateId1) put: ea.
			 count := count + 1].
		 db commit.
		 db flushAll.
		 numCommits := numCommits + 1].

	Transcript cr; show: (time/1000) asFloat; show: ' seconds to load into GOODS in ';
		show: numCommits; show: ' commit(s)'.

Again, similar to the Magma example, this piece of code inserts the 8784 FHKC834Entry objects, committing every 100 records and indexing by one String attribute of the object itself. It took 3 hours and 6 minutes to run. Crazy!! BTW, the statistics shown by GOODS after this process were:

 >show
GOODS version 2.91
Local server 0: 'localhost:6100'
Attached clients:
Client "squeak489BD9A", 199 proceed requests, 286601 loaded objects, 0 
invalidated
Transaction log size: 1029650
Number of passed checkpoints: 2
Number of active transactions: 0
Last checkpoint completion time 14:29.50 02-APR-2005
Number of synchronous writes: 103
Average number of parallel requests:  1.000
Maximal number of parallel requests: 1
Last global transaction identifier: 0
Memory map size: 8388608, allocated memory 9846528, used memory 
9846496, max hole size 32, average hole size 32, number of holes 1
Number of holes of size [32, 63] = 1
Object index size: 1432354816
Allocated space since last GC: 348238
Number of used objects in storage: 264210
Total used space in storage: 5056405
Maximal time of GC completion: 4 (sec)
Total time spent in GC: 11 (sec)
Average time of GC completion: 2.750000 (sec)
Number of completed GC: 4
Profile of classes:
FHKC834Entry    8100 instances, 1312200 total size, 162 average size
TSTreeNode      34586 instances, 1037580 total size, 30 average size
TSTree  1 instances, 6 total size, 6 average size
FHKC834Header   1 instances, 30 total size, 30 average size
BTreeKeysArray  1 instances, 30 total size, 30 average size
BTreeLeafNode   1 instances, 42 total size, 42 average size
BTree   1 instances, 6 total size, 6 average size
String  193883 instances, 2356410 total size, 12 average size
Association     1 instances, 12 total size, 12 average size
???     27625 instances, 0 total size, 0 average size
Memory usage: nodes=9437814, locks=73761, references=0, 
instances=6884360
Storage file size: 8773632
Page pool size: 4096
Page pool access: 109070 reads, 261137 writes, 2404 page faults
Number of classes in storage: 10

I don't know if I'm reading this correctly, but it says 109070 reads 
and 261137 writes?? For 8784 objects?? Very strange!

Then, I decided to try the same thing, but using a BTree instead of a TSTree, with the following piece of code:

	head := FHKC834Header new.
	head controlNumber: (self ediData controlNumber).
	head referenceNumber: (self ediData referenceNumber).
	head date: (self ediData date).
	head time: (self ediData time).
	head members: (BTree new).

	time := Time millisecondsToRun:
		[db := KKDatabase onHost: 'localhost' port: 6100.
		 db root: Dictionary new.
		 db commit.
		
		 theBase := db root at: 'FHKC834' ifAbsent:
			[db root at: 'FHKC834' put: (BTree new).
			 db commit.
			 db root at: 'FHKC834'].
		
		 theBase at: 0 put: head.
		 db commit.
		 db flush.

		 Transcript cr.
		 count := 0.
		 numCommits := 0.
		 self ediData members do:
			[:ea|
			 (count \\ 100) = 0
				ifTrue:
					[Transcript show: '.'.
					 db commit.
					 numCommits := numCommits + 1].
			 (head members) at: count put: ea.
			 count := count + 1].
		 db commit.
		 db flushAll.
		 numCommits := numCommits + 1].

	Transcript cr; show: (time/1000) asFloat; show: ' seconds to load into GOODS in ';
		show: numCommits; show: ' commit(s)'.

This code ran a lot faster than the previous one: only 1 hour and 4 minutes. The GOODS console statistics looked like this:

show
GOODS version 2.91
Local server 0: 'localhost:6100'
Attached clients:
Client "squeak5417431", 198 proceed requests, 286601 loaded objects, 0 
invalidated
Transaction log size: 172818
Number of passed checkpoints: 2
Number of active transactions: 0
Last checkpoint completion time 16:26.55 02-APR-2005
Number of synchronous writes: 102
Average number of parallel requests:  1.000
Maximal number of parallel requests: 1
Last global transaction identifier: 0
Memory map size: 8388608, allocated memory 9494528, used memory 
9494336, max hole size 192, average hole size 192, number of holes 1
Number of holes of size [128, 255] = 1
Object index size: 1432354816
Allocated space since last GC: 884692
Number of used objects in storage: 225809
Total used space in storage: 4504768
Maximal time of GC completion: 4 (sec)
Total time spent in GC: 8 (sec)
Average time of GC completion: 2.666667 (sec)
Number of completed GC: 3
Profile of classes:
FHKC834Entry    6900 instances, 1117800 total size, 162 average size
BTreeInteriorNode       3421 instances, 143682 total size, 42 average 
size
FHKC834Header   1 instances, 30 total size, 30 average size
BTreeKeysArray  6871 instances, 206130 total size, 30 average size
BTreeLeafNode   3450 instances, 144900 total size, 42 average size
BTree   2 instances, 12 total size, 6 average size
String  165203 instances, 2005744 total size, 12 average size
Association     1 instances, 12 total size, 12 average size
???     39951 instances, 0 total size, 0 average size
Memory usage: nodes=9437814, locks=73761, references=12299, 
instances=6884360
Storage file size: 9302016
Page pool size: 4096
Page pool access: 41950 reads, 238675 writes, 2318 page faults
Number of classes in storage: 9

Far fewer reads, but a similar number of writes. Compared to Magma, the performance difference (9 minutes) seems irrelevant; they both run too slow.

Then, once the data was in GOODS, I ran this other piece of code:

	time := Time millisecondsToRun:
		[db := KKDatabase onHost: 'localhost' port: 6100.
		
		 xmt := (db root at: 'FHKC834') at: 0.

		 Transcript cr.
		 count := 0.
		 numCommits := 0.
		 xmt members do:
			[:ea|
			 (count \\ 100) = 0
				ifTrue:
					[Transcript show: '.'.
					 ea lName: (ea lName asLowercase).
					 db commit.
					 numCommits := numCommits + 1].
			 count := count + 1].
		 db flushAll].
		
	Transcript cr; show: (time/1000) asFloat; show: ' seconds to update ';
		show: numCommits; show: ' record(s) from GOODS'.

As in the Magma case, this code iterates through the objects in the database and updates every 100th object. Wow! That was a lot better than the Magma counterpart. This one took only 4 minutes 13 seconds to run, and the GOODS console statistics looked like:

GOODS version 2.91
Local server 0: 'localhost:6100'
Attached clients:
Client "squeak597AAA0", 9036 proceed requests, 27570 loaded objects, 0 
invalidated
Transaction log size: 20036
Number of passed checkpoints: 6
Number of active transactions: 0
Number of synchronous writes: 89
Average number of parallel requests:  1.000
Maximal number of parallel requests: 1
Last global transaction identifier: 0
Memory map size: 8388608, allocated memory 9494528, used memory 
9368256, max hole size 192, average hole size 32, number of holes 3941
Number of holes of size [32, 63] = 3940
Number of holes of size [128, 255] = 1
Object index size: 699050
Allocated space since last GC: 880
Memory usage: nodes=1048646, locks=24587, references=0, instances=663849
Storage file size: 9494528
Page pool size: 4096
Page pool access: 18666 reads, 178 writes, 1988 page faults
Number of classes in storage: 9

Now, what's wrong with all this? Why does the whole thing run so slowly? BTW, all of these tasks have been running in the current process of the VM, meaning I'm not forking them off as lower-priority processes. If I were running this on a production machine with my application running in Seaside, the application would be unresponsive for all that time.
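(I suppose I could at least keep the image responsive by forking the load at a lower priority, something like the sketch below, where loadIntoMagma stands for a hypothetical method wrapping the code above. Whether Magma sessions behave well under this, I haven't tried:)

	[self loadIntoMagma] forkAt: Processor userBackgroundPriority.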

I decided to run the same thing in a different environment. I quickly wrote a script in Ruby on Rails connected to a MySQL database. The entire process (parsing, plus bulk loading, plus the scattered sequential update) took a little under 9 minutes, and during this time the environment was still receiving and responding to HTTP requests.

As I mentioned in a previous post, the more I work in the Squeak environment, the more I want to develop my production applications there. But what's going on? What am I doing wrong? I'll gladly post all the code for anyone to reproduce this on their own machine, including a dummy data file. Just ask for it and I'll send it.

Does anyone have any suggestions? At one point I thought it was my model design, but it really couldn't get any simpler than this. I just can't believe it takes this long to run while blocking the VM the whole time.

I want to work with an OODB, so that's why I'm trying both Magma and GOODS. I'm not trying OmniBase simply because it lacks support for multi-user locks in the Linux environment. Is there any other one worth looking into (inexpensive, of course)?

Please help.

Thanks,
Daniel Salama



