Very interesting.<div><br></div><div>Thanks for sharing it.</div><div><br></div><div>Facu<br><br><div class="gmail_quote">On Wed, Nov 24, 2010 at 2:00 PM, Chris Muller <span dir="ltr"><<a href="mailto:ma.chris.m@gmail.com">ma.chris.m@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">When reading any object off the hard drive (represented as the<br>
'byteArray' of a single MaObjectBuffer), Magma always reads 280 bytes.<br>
Since the #physicalSize is in the object header, it is then able to<br>
check the contents of the buffer to determine the size of the whole<br>
object and, if necessary, read more bytes in order to get the whole<br>
object. See MaObjectFiler>>#read:bytesInto:and:startingAt:filePosition:<br>
for this behavior.<br>
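<br>
The two-stage read described above can be sketched roughly as follows. This is only an illustration, not the code in MaObjectFiler: the block name readObject and the header layout (physical size stored as an unsigned big-endian integer in the first 4 bytes) are assumptions made for the example.<br>
<br>
```smalltalk
"Hedged sketch only; this is NOT MaObjectFiler's implementation.
 Assumption: the first 4 bytes of each object buffer hold its physical
 size as an unsigned big-endian integer. The real header may differ."
| trackSize readObject object stream |
trackSize := 280.
readObject := [ : aStream : filePosition | | buffer physicalSize whole |
	buffer := ByteArray new: trackSize.
	aStream position: filePosition.
	"First read: always a fixed-size chunk of trackSize bytes."
	aStream next: trackSize into: buffer startingAt: 1.
	physicalSize := buffer unsignedLongAt: 1 bigEndian: true.
	physicalSize <= trackSize
		ifTrue: [ buffer copyFrom: 1 to: physicalSize ]
		ifFalse:
			[ "Second read: fetch only the bytes the first read missed."
			whole := ByteArray new: physicalSize.
			whole replaceFrom: 1 to: trackSize with: buffer startingAt: 1.
			aStream
				next: physicalSize - trackSize
				into: whole
				startingAt: trackSize + 1.
			whole ] ].

"Usage on an in-memory stream: a 300-byte object needs the second read."
object := ByteArray new: 300.
object unsignedLongAt: 1 put: 300 bigEndian: true.
stream := ReadStream on: object.
readObject value: stream value: 0
```
<br>
With these assumptions, any object whose physical size is at most trackSize costs a single read, and larger objects cost exactly two.<br>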
<br>
280 bytes is enough for about 40 pointer references, allowing most<br>
objects to be read in just one disk access. I refer to it as the<br>
#trackSize, to remind me it is supposed to be how many bytes I think<br>
the HD can read in one operation without overrunning its own internal<br>
buffers and becoming inefficient. I was curious whether this number<br>
is still optimal in 2010, so I ran the following script:<br>
<br>
-----------<br>
|stats random| stats:=OrderedCollection new. random := Random new.<br>
nextPos:=100.<br>
(FileDirectory on: '/home/cmm/test3/cube.001.magma') fileNamed:<br>
'objects.2.dat' do:<br>
[ : stream | | ba fileSize | ba := ByteArray new: 10000.<br>
fileSize := stream size.<br>
100 to: 10000 by: 100 do:<br>
[ : n |<br>
stream position: 0.<br>
Transcript cr; show: (stats add: n->([stream<br>
maRead: n "bytes"<br>
bytesFromPosition: 1<br>
of: ba<br>
atFilePosition: (random nextInt: fileSize) ] bench)) ]].<br>
stats<br>
------------<br>
<br>
Note that "objects.2.dat" is a real Magma file, 1.8GB in size. The<br>
goal of the script is to bench how fast Squeak can read object buffers<br>
off the hard-drive when we obviously won't get many (if any) HD cache<br>
hits.<br>
<br>
I have a cheap, Western Digital Caviar HD, which produced the following output:<br>
<br>
100->'119 per second.'<br>
200->'98.5 per second.'<br>
300->'106 per second.'<br>
400->'106 per second.'<br>
500->'101 per second.'<br>
600->'102 per second.'<br>
700->'99.9 per second.'<br>
800->'103 per second.'<br>
900->'104 per second.'<br>
1000->'99 per second.'<br>
1100->'97.9 per second.'<br>
1200->'104 per second.'<br>
1300->'111 per second.'<br>
1400->'99.8 per second.'<br>
1500->'107 per second.'<br>
1600->'108 per second.'<br>
1700->'95.6 per second.'<br>
1800->'103 per second.'<br>
1900->'108 per second.'<br>
2000->'102 per second.'<br>
2100->'103 per second.'<br>
2200->'107 per second.'<br>
...<br>
3000->'98.7 per second.'<br>
4000->'102 per second.'<br>
5000->'106 per second.'<br>
6000->'104 per second.'<br>
7000->'101 per second.'<br>
8000->'102 per second.'<br>
9000->'102 per second.'<br>
10000->'107 per second.'<br>
<br>
Out of curiosity, I also modified the script to read very small buffers<br>
from the HD. Here are the results:<br>
<br>
4->'137 per second.'<br>
12->'146 per second.'<br>
20->'154 per second.'<br>
28->'143 per second.'<br>
<br>
(The HD busy light was solid ON during the test).<br>
<br>
At first I was puzzled because Magma has demonstrated much faster<br>
objects-per-second read rates than these, even including<br>
materialization. What gives?<br>
<br>
It's the HD buffering. Most of the time, objects are "clustered"<br>
closely together, so that reading one object causes the "next" object<br>
which will be read to already be in the HD's buffer. Here's the same<br>
script, except reading mostly "sequentially" through the file instead<br>
of from a random location:<br>
<br>
|stats random nextPos| stats:=OrderedCollection new. random := Random new.<br>
nextPos:=100.<br>
(FileDirectory on: '/home/cmm/test3/cube.001.magma') fileNamed:<br>
'objects.2.dat' do:<br>
[ : stream | | ba fileSize | ba := ByteArray new: 10000.<br>
fileSize := stream size.<br>
#(4 12 20 28 100 200 300 400 500) do:<br>
[ : n |<br>
stream position: 0.<br>
Transcript cr; show: (stats add: n->([stream<br>
maRead: n "bytes"<br>
bytesFromPosition: 1<br>
of: ba<br>
atFilePosition: ("random nextInt: fileSize" (nextPos :=<br>
nextPos+n+10)) ] bench)) ]].<br>
stats<br>
<br>
Now look at the results:<br>
<br>
"Reading sequentially rather than at a random position."<br>
4->'1,160,000 per second.'<br>
12->'1,210,000 per second.'<br>
20->'1,100,000 per second.'<br>
28->'973,000 per second.'<br>
...<br>
100->'1,030,000 per second.'<br>
200->'321,000 per second.'<br>
300->'215,000 per second.'<br>
400->'160,000 per second.'<br>
500->'227,000 per second.'<br>
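<br>
To put the two result sets side by side, here is a small sketch that computes, for each buffer size that appears in both listings, how many times faster the sequential reads were and how many bytes per second the sequential rate corresponds to. The figures are copied from the outputs above; the variable names are made up for the example.<br>
<br>
```smalltalk
"Figures copied from the two result listings above (reads per second).
 Keys are the number of bytes per read."
| randomRates sequentialRates comparison |
randomRates := {4->137. 12->146. 20->154. 28->143. 100->119.
	200->98.5. 300->106. 400->106. 500->101}.
sequentialRates := {4->1160000. 12->1210000. 20->1100000. 28->973000.
	100->1030000. 200->321000. 300->215000. 400->160000. 500->227000}.
comparison := randomRates with: sequentialRates collect:
	[ : rand : seq |
	"size -> {speedup factor. sequential bytes per second}"
	rand key -> {(seq value / rand value) rounded. seq key * seq value} ].
comparison do: [ : each | Transcript cr; show: each printString ].
comparison
```
<br>
For the 4-byte reads this works out to a factor of about 8,467. The 200, 300 and 400 byte rows all come to roughly 64 million bytes per second, which may hint that the larger sequential reads are limited by transfer rate rather than by the number of read calls.<br>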
<br>
Conclusions:<br>
<br>
- Hard-disk seek is definitely a bottleneck with Magma, or any<br>
Squeak application that requires random-access to a file.<br>
- When objects are clustered closely together, read performance can<br>
be dramatically better.<br>
- Drives with fast seek times, such as newer solid-state drives, might<br>
perform dramatically better.<br>
- I should consider reducing the trackSize from 280 bytes to ~100<br>
bytes (or making it customizable), because the rate drops really fast<br>
after that, and even a required second read could still be faster than<br>
an initial read.<br>
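<br>
One way to weigh the last point is to count how many objects would need a second read at each candidate trackSize. The sketch below does that for a list of object sizes; note that sampleSizes is made-up illustration data, not measured from any Magma repository, and secondReadsFor is a hypothetical helper.<br>
<br>
```smalltalk
"sampleSizes is MADE-UP illustration data, not measured from a real
 repository. Replace it with real #physicalSize values to get a
 meaningful answer."
| sampleSizes secondReadsFor |
sampleSizes := #(48 64 96 120 200 280 512 1024).
"Objects larger than trackSize need a second read."
secondReadsFor := [ : trackSize |
	sampleSizes count: [ : size | size > trackSize ] ].
#(100 280) collect: [ : trackSize |
	trackSize -> (secondReadsFor value: trackSize) ]
```
<br>
With this made-up sample, 5 of the 8 objects need a second read at a trackSize of 100, against 2 of 8 at 280. Whether that trade is worth it depends on the real size distribution and on how cheap the second, nearby read turns out to be.<br>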
<br>
- Chris<br>
_______________________________________________<br>
Magma mailing list<br>
<a href="mailto:Magma@lists.squeakfoundation.org">Magma@lists.squeakfoundation.org</a><br>
<a href="http://lists.squeakfoundation.org/mailman/listinfo/magma" target="_blank">http://lists.squeakfoundation.org/mailman/listinfo/magma</a><br>
</blockquote></div><br></div>