hard-drive read-performance

Wed Nov 24 18:00:03 UTC 2010

Very interesting.

Thanks for sharing it.

Facu

On Wed, Nov 24, 2010 at 2:00 PM, Chris Muller <ma.chris.m at gmail.com> wrote:

> When reading any object off the hard drive (represented as the
> 'byteArray' of a single MaObjectBuffer), Magma always reads 280 bytes.
>  Since the #physicalSize is in the object header, it is then able to
> check the contents of the buffer to determine the size of the whole
> object and, if necessary, read more bytes in order to get the whole
> object.  See MaObjectFiler>>#read:bytesInto:and:startingAt:filePosition:
> for this behavior.
>
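The read-then-extend behavior described above can be sketched in a workspace. This is only an illustration, not Magma's code: the layout (total size in the first 4 bytes, big-endian) is a made-up stand-in for the real MaObjectBuffer header, and an in-memory ReadStream stands in for the file.

```smalltalk
"Hypothetical layout: bytes 1-4 of each record hold its total size."
| trackSize record stream buffer physicalSize reads |
trackSize := 280.
"Build a 1000-byte record in memory to stand in for the object file."
record := ByteArray new: 1000.
record unsignedLongAt: 1 put: 1000 bigEndian: true.
stream := ReadStream on: record.
reads := 0.
"First read: always trackSize bytes, whatever the object's size."
buffer := stream next: trackSize.
reads := reads + 1.
"The header is now available, so the full size can be checked."
physicalSize := buffer unsignedLongAt: 1 bigEndian: true.
"Second read only if the object is larger than what was fetched."
physicalSize > buffer size ifTrue:
	[buffer := buffer , (stream next: physicalSize - buffer size).
	reads := reads + 1].
Transcript cr; show: reads printString , ' read(s) for ' , buffer size printString , ' bytes'.
```

An object no larger than trackSize would need only the first read; the 1000-byte record here needs two.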
> 280 bytes is enough for about 40 pointer references, allowing most
> objects to be read in just one disk access.  I refer to it as the
> #trackSize, to remind me it is supposed to be how many bytes I think
> the HD can read in one operation without overrunning its own internal
> buffers and becoming inefficient.  I was curious whether this number
> is still optimal in 2010, so I ran the following script:
>
> -----------
> |stats random| stats:=OrderedCollection new. random := Random new.
> (FileDirectory on: '/home/cmm/test3/cube.001.magma') fileNamed:
> 'objects.2.dat' do:
>        [ : stream | | ba fileSize | ba := ByteArray new: 10000.
>        fileSize := stream size.
>        100 to: 10000 by: 100 do:
>                [ : n |
>                stream position: 0.
>                Transcript cr; show: (stats add: n->([stream
>                                maRead: n "bytes"
>                                bytesFromPosition: 1
>                                of: ba
>                                atFilePosition: (random nextInt: fileSize) ]
> bench)) ]].
> stats
> ------------
>
> Note that "objects.2.dat" is a real Magma file, 1.8GB in size.  The
> goal of the script is to benchmark how fast Squeak can read object
> buffers off the hard-drive when we obviously won't get many (if any)
> HD cache hits.
>
> I have a cheap, Western Digital Caviar HD, which produced the following
> output:
>
> 100->'119 per second.'
> 200->'98.5 per second.'
> 300->'106 per second.'
> 400->'106 per second.'
> 500->'101 per second.'
> 600->'102 per second.'
> 700->'99.9 per second.'
> 800->'103 per second.'
> 900->'104 per second.'
> 1000->'99 per second.'
> 1100->'97.9 per second.'
> 1200->'104 per second.'
> 1300->'111 per second.'
> 1400->'99.8 per second.'
> 1500->'107 per second.'
> 1600->'108 per second.'
> 1700->'95.6 per second.'
> 1800->'103 per second.'
> 1900->'108 per second.'
> 2000->'102 per second.'
> 2100->'103 per second.'
> 2200->'107 per second.'
> ...
> 3000->'98.7 per second.'
> 4000->'102 per second.'
> 5000->'106 per second.'
> 6000->'104 per second.'
> 7000->'101 per second.'
> 8000->'102 per second.'
> 9000->'102 per second.'
> 10000->'107 per second.'
>
> Out of curiosity, I also modified the script to read very small
> buffers from the HD.  Here are the results:
>
> 4->'137 per second.'
> 12->'146 per second.'
> 20->'154 per second.'
> 28->'143 per second.'
>
> (The HD busy light was solid ON during the test).
>
> At first I was puzzled, because Magma has demonstrated much faster
> objects-per-second read rates than these, even including
> materialization.  What gives?
>
> It's the HD buffering.  Most of the time, objects are "clustered"
> closely together, so that reading one object leaves the "next" object
> to be read already sitting in the HD's buffer.  Here's the same
> script, except reading mostly "sequentially" through the file instead
> of from a random location:
>
> |stats random nextPos| stats:=OrderedCollection new. random := Random new.
> nextPos:=100.
> (FileDirectory on: '/home/cmm/test3/cube.001.magma') fileNamed:
> 'objects.2.dat' do:
>        [ : stream | | ba fileSize | ba := ByteArray new: 10000.
>        fileSize := stream size.
>        #(4 12 20 28 100 200 300 400 500) do:
>                [ : n |
>                stream position: 0.
>                Transcript cr; show: (stats add: n->([stream
>                                maRead: n "bytes"
>                                bytesFromPosition: 1
>                                of: ba
>                                atFilePosition: ("random nextInt: fileSize"
> (nextPos :=
> nextPos+n+10)) ] bench)) ]].
> stats
>
> Now look at the results:
>
> "Reading sequentially rather than at a random position."
> 4->'1,160,000 per second.'
> 12->'1,210,000 per second.'
> 20->'1,100,000 per second.'
> 28->'973,000 per second.'
> ...
> 100->'1,030,000 per second.'
> 200->'321,000 per second.'
> 300->'215,000 per second.'
> 400->'160,000 per second.'
> 500->'227,000 per second.'
>
> Conclusions:
>
>  - Hard-disk seek is definitely a bottleneck with Magma, or any
> Squeak application that requires random-access to a file.
>  - When objects are clustered closely together, read performance can
> be dramatically better.
>  - HDs with fast seek times, such as newer solid-state drives, might
> perform dramatically better.
>  - I should consider reducing the trackSize from 280 bytes to ~100
> bytes (or make it customizable), because the read rate drops off
> sharply beyond that size, and even when a second read is required,
> two small reads could still be faster than one large initial read.
>
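The last conclusion can be checked with a little arithmetic on the sequential figures quoted above. This is a rough workspace sketch, assuming the per-read rates measured at 100 and 300 bytes carry over to Magma's real reads. The 300-byte rate stands in for the 280-byte trackSize, which was not measured directly.

```smalltalk
| rateAt100 rateAt300 microsPerLargeRead microsForTwoSmallReads |
rateAt100 := 1030000.	"reads per second at 100 bytes (sequential run)"
rateAt300 := 215000.	"reads per second at 300 bytes (sequential run)"
"Time for one large read versus two small ones, in microseconds."
microsPerLargeRead := 1000000 / rateAt300.
microsForTwoSmallReads := 2 * 1000000 / rateAt100.
Transcript
	cr; show: 'one 300-byte read: ' , microsPerLargeRead asFloat printString , ' microseconds';
	cr; show: 'two 100-byte reads: ' , microsForTwoSmallReads asFloat printString , ' microseconds'.
```

On these numbers a single 300-byte read costs roughly 4.7 microseconds, while two 100-byte reads cost about 1.9 microseconds together. This only applies to the clustered case, where the data is already in the HD's buffer. At the roughly 100 reads per second seen with random positions, each seek costs about 10 milliseconds and dwarfs either figure.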
>  - Chris
> _______________________________________________
> Magma mailing list
> Magma at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/mailman/listinfo/magma
>