Very interesting.<div><br></div><div>Thanks for sharing it.</div><div><br></div><div>Facu<br><br><div class="gmail_quote">On Wed, Nov 24, 2010 at 2:00 PM, Chris Muller <span dir="ltr">&lt;<a href="mailto:ma.chris.m@gmail.com">ma.chris.m@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">When reading any object off the hard drive (represented as the<br>

&#39;byteArray&#39; of a single MaObjectBuffer), Magma always reads 280 bytes.<br>

 Since the #physicalSize is in the object header, it is then able to<br>

check the contents of the buffer to determine the size of the whole<br>

object and, if necessary, read more bytes in order to get the whole<br>

object.  See MaObjectFiler&gt;&gt;#read:bytesInto:and:startingAt:filePosition:<br>

for this behavior.<br>
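<br>
As a rough illustration of that two-step read, here is a minimal workspace sketch.<br>
It is not Magma&#39;s code: the in-memory ByteArray standing in for the file, the<br>
readRecord block, and the header layout (record size stored big-endian in the<br>
first 4 bytes) are all assumptions made for the example.<br>
<br>
```smalltalk
"Sketch only. Assumed layout: bytes 1-4 of a record hold its total size,
 big-endian. Magma's real object header is not described in this message."
| file trackSize readRecord record |
trackSize := 280.
"In-memory stand-in for the objects file, holding one 600-byte record."
file := ByteArray new: 1000.
file at: 3 put: 2; at: 4 put: 88.    "600 = (2 * 256) + 88"
readRecord := [:position | | buffer physicalSize |
	"First access: always fetch trackSize bytes."
	buffer := file copyFrom: position to: position + trackSize - 1.
	physicalSize := ((buffer at: 1) bitShift: 24) + ((buffer at: 2) bitShift: 16)
		+ ((buffer at: 3) bitShift: 8) + (buffer at: 4).
	physicalSize > trackSize
		ifTrue: ["Second access: fetch only the bytes still missing."
			buffer , (file copyFrom: position + trackSize to: position + physicalSize - 1)]
		ifFalse: ["The whole record arrived in the first read."
			buffer copyFrom: 1 to: physicalSize]].
record := readRecord value: 1.
record size
```
<br>
Evaluated with &quot;print it&quot;, the last line answers 600: the first 280 bytes plus a<br>
second read of the remaining 320.  A record of 280 bytes or fewer would take<br>
the ifFalse: branch and need only the one access.<br>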

<br>

280 bytes is enough for about 40 pointer references, allowing most<br>

objects to be read in just one disk access.  I refer to it as the<br>

#trackSize, to remind me it is supposed to be how many bytes I think<br>

the HD can read in one operation without overrunning its own internal<br>

buffers and becoming inefficient.  I was curious whether this number<br>

is still well chosen in 2010, so I ran the following script:<br>

<br>

-----------<br>

| stats random | stats := OrderedCollection new. random := Random new.<br>

(FileDirectory on: &#39;/home/cmm/test3/cube.001.magma&#39;) fileNamed:<br>

&#39;objects.2.dat&#39; do:<br>

        [ : stream | | ba fileSize | ba := ByteArray new: 10000.<br>

        fileSize := stream size.<br>

        100 to: 10000 by: 100 do:<br>

                [ : n |<br>

                stream position: 0.<br>

                Transcript cr; show: (stats add: n-&gt;([stream<br>

                                maRead: n &quot;bytes&quot;<br>

                                bytesFromPosition: 1<br>

                                of: ba<br>

                                atFilePosition: (random nextInt: fileSize) ] bench)) ]].<br>

stats<br>

------------<br>

<br>

Note that &quot;objects.2.dat&quot; is a real Magma file, 1.8GB in size.  The<br>

goal of the script is to bench how fast Squeak can read object buffers<br>

off the hard-drive when we obviously won&#39;t get many (if any) HD cache<br>

hits.<br>

<br>

I have a cheap Western Digital Caviar HD, which produced the following output:<br>

<br>

100-&gt;&#39;119 per second.&#39;<br>

200-&gt;&#39;98.5 per second.&#39;<br>

300-&gt;&#39;106 per second.&#39;<br>

400-&gt;&#39;106 per second.&#39;<br>

500-&gt;&#39;101 per second.&#39;<br>

600-&gt;&#39;102 per second.&#39;<br>

700-&gt;&#39;99.9 per second.&#39;<br>

800-&gt;&#39;103 per second.&#39;<br>

900-&gt;&#39;104 per second.&#39;<br>

1000-&gt;&#39;99 per second.&#39;<br>

1100-&gt;&#39;97.9 per second.&#39;<br>

1200-&gt;&#39;104 per second.&#39;<br>

1300-&gt;&#39;111 per second.&#39;<br>

1400-&gt;&#39;99.8 per second.&#39;<br>

1500-&gt;&#39;107 per second.&#39;<br>

1600-&gt;&#39;108 per second.&#39;<br>

1700-&gt;&#39;95.6 per second.&#39;<br>

1800-&gt;&#39;103 per second.&#39;<br>

1900-&gt;&#39;108 per second.&#39;<br>

2000-&gt;&#39;102 per second.&#39;<br>

2100-&gt;&#39;103 per second.&#39;<br>

2200-&gt;&#39;107 per second.&#39;<br>

...<br>

3000-&gt;&#39;98.7 per second.&#39;<br>

4000-&gt;&#39;102 per second.&#39;<br>

5000-&gt;&#39;106 per second.&#39;<br>

6000-&gt;&#39;104 per second.&#39;<br>

7000-&gt;&#39;101 per second.&#39;<br>

8000-&gt;&#39;102 per second.&#39;<br>

9000-&gt;&#39;102 per second.&#39;<br>

10000-&gt;&#39;107 per second.&#39;<br>

<br>

Out of curiosity, I also modified the script to read very small buffers<br>

from the HD.  Here are the results:<br>

<br>

4-&gt;&#39;137 per second.&#39;<br>

12-&gt;&#39;146 per second.&#39;<br>

20-&gt;&#39;154 per second.&#39;<br>

28-&gt;&#39;143 per second.&#39;<br>

<br>

(The HD busy light was solid ON during the test).<br>

<br>

At first I was puzzled because Magma has demonstrated much faster<br>

objects-per-second read rates than these, even including<br>

materialization.  What gives?<br>

<br>

It&#39;s the HD buffering.  Most of the time, objects are &quot;clustered&quot;<br>

closely together, so that reading one object leaves the &quot;next&quot; object<br>

to be read already in the HD&#39;s buffer.  Here&#39;s the same<br>

script, except reading mostly &quot;sequentially&quot; through the file instead<br>

of from a random location:<br>

<br>

|stats random nextPos| stats:=OrderedCollection new. random := Random new.<br>

nextPos:=100.<br>

(FileDirectory on: &#39;/home/cmm/test3/cube.001.magma&#39;) fileNamed:<br>

&#39;objects.2.dat&#39; do:<br>

        [ : stream | | ba fileSize | ba := ByteArray new: 10000.<br>

        fileSize := stream size.<br>

        #(4 12 20 28 100 200 300 400 500) do:<br>

                [ : n |<br>

                stream position: 0.<br>

                Transcript cr; show: (stats add: n-&gt;([stream<br>

                                maRead: n &quot;bytes&quot;<br>

                                bytesFromPosition: 1<br>

                                of: ba<br>

                                atFilePosition: (&quot;random nextInt: fileSize&quot; (nextPos := nextPos + n + 10)) ] bench)) ]].<br>

stats<br>

<br>

Now look at the results:<br>

<br>

&quot;Reading sequentially rather than at a random position.&quot;<br>

4-&gt;&#39;1,160,000 per second.&#39;<br>

12-&gt;&#39;1,210,000 per second.&#39;<br>

20-&gt;&#39;1,100,000 per second.&#39;<br>

28-&gt;&#39;973,000 per second.&#39;<br>

...<br>

100-&gt;&#39;1,030,000 per second.&#39;<br>

200-&gt;&#39;321,000 per second.&#39;<br>

300-&gt;&#39;215,000 per second.&#39;<br>

400-&gt;&#39;160,000 per second.&#39;<br>

500-&gt;&#39;227,000 per second.&#39;<br>

<br>

Conclusions:<br>

<br>

  - Hard-disk seek is definitely a bottleneck with Magma, or any<br>

Squeak application that requires random access to a file.<br>

  - When objects are clustered closely together, read performance can<br>

be dramatically better.<br>

  - Drives with fast seek times, such as newer solid-state drives, might<br>

perform dramatically better.<br>

  - I should consider reducing the trackSize from 280 bytes to ~100<br>

bytes (or making it customizable), because the sequential read rate drops<br>

sharply beyond that size, and even when a second read is required, that<br>

could still be faster than a larger initial read.<br>
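<br>
To put rough numbers on that last point, here is a small workspace calculation<br>
using two of the sequential rates listed above.  It assumes a follow-up read<br>
costs about the same as a 100-byte sequential read, which the benchmark did not<br>
measure directly, so treat it as a back-of-the-envelope check only.<br>
<br>
```smalltalk
"Rates copied from the sequential results above, in reads per second.
 Assumption: a follow-up read costs about the same as a 100-byte read."
| rate100 rate300 oneLargeRead twoSmallReads |
rate100 := 1030000.
rate300 := 215000.
"Microseconds per read is one million divided by the rate."
oneLargeRead := (1000000 / rate300) asFloat.        "about 4.65"
twoSmallReads := (2 * 1000000 / rate100) asFloat.   "about 1.94"
Transcript cr; show: 'one 300-byte read: ', oneLargeRead printString, ' microseconds'.
Transcript cr; show: 'two 100-byte reads: ', twoSmallReads printString, ' microseconds'.
oneLargeRead > twoSmallReads
```
<br>
Under that assumption the final expression answers true.  These figures only<br>
apply when the data is already in the HD&#39;s buffer; a random read at roughly<br>
100 per second costs about 10 milliseconds whatever its size.<br>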

<br>

 - Chris<br>

_______________________________________________<br>

Magma mailing list<br>

<a href="mailto:Magma@lists.squeakfoundation.org">Magma@lists.squeakfoundation.org</a><br>

<a href="http://lists.squeakfoundation.org/mailman/listinfo/magma" target="_blank">http://lists.squeakfoundation.org/mailman/listinfo/magma</a><br>

</blockquote></div><br></div>