hard-drive read-performance
Facundo Vozzi
facundov79 at gmail.com
Wed Nov 24 18:00:03 UTC 2010
Very interesting.
Thanks for sharing it.
Facu
On Wed, Nov 24, 2010 at 2:00 PM, Chris Muller <ma.chris.m at gmail.com> wrote:
> When reading any object off the hard drive (represented as the
> 'byteArray' of a single MaObjectBuffer), Magma always reads 280 bytes.
> Since the #physicalSize is in the object header, it is then able to
> check the contents of the buffer to determine the size of the whole
> object and, if necessary, read more bytes in order to get the whole
> object. See MaObjectFiler>>#read:bytesInto:and:startingAt:filePosition:
> for this behavior.
>
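A minimal sketch of the read-then-extend behavior described above, run against an in-memory ByteArray instead of a real Magma file. The header layout used here (a 4-byte big-endian physical size at the start of each buffer) is an assumption made purely for illustration and is not Magma's actual format; `readObjectAt`, `reads`, `small` and `large` are hypothetical names, and the real logic lives in the MaObjectFiler method named above.

```smalltalk
"Assumed layout (hypothetical): bytes 1-4 of each object hold its
 total physical size, big-endian."
| trackSize file stream reads readObjectAt small large |
trackSize := 280.
file := ByteArray new: 2000.
"A 120-byte object at file position 0, a 700-byte object at position 1000."
file unsignedLongAt: 1 put: 120 bigEndian: true.
file unsignedLongAt: 1001 put: 700 bigEndian: true.
stream := ReadStream on: file.
reads := 0.
readObjectAt := [ : filePosition | | buffer physicalSize |
	"First access: always fetch trackSize bytes."
	stream position: filePosition.
	buffer := stream next: trackSize.
	reads := reads + 1.
	physicalSize := buffer unsignedLongAt: 1 bigEndian: true.
	physicalSize > trackSize
		ifTrue: [
			"Second access, only when the object did not fit."
			buffer := buffer , (stream next: physicalSize - trackSize).
			reads := reads + 1 ]
		ifFalse: [ buffer := buffer copyFrom: 1 to: physicalSize ].
	buffer ].
small := readObjectAt value: 0.
large := readObjectAt value: 1000.
```

In this sketch the 120-byte object is complete after one access and the 700-byte object needs a second one, so `reads` ends at 3.
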
> 280 bytes is enough for about 40 pointer references, allowing most
> objects to be read in just one disk access. I refer to it as the
> #trackSize, to remind me that it is supposed to be the number of bytes
> I think the HD can read in one operation without overrunning its own
> internal buffers and becoming inefficient. I was curious whether this
> number is still optimal in 2010, so I ran the following script:
>
> -----------
> |stats random| stats:=OrderedCollection new. random := Random new.
> (FileDirectory on: '/home/cmm/test3/cube.001.magma') fileNamed:
> 'objects.2.dat' do:
> [ : stream | | ba fileSize | ba := ByteArray new: 10000.
> fileSize := stream size.
> "Try every read size from 100 to 10000 bytes."
> 100 to: 10000 by: 100 do:
> [ : n |
> stream position: 0.
> "Bench reading n bytes from a random file position."
> Transcript cr; show: (stats add: n->([stream
> maRead: n "bytes"
> bytesFromPosition: 1
> of: ba
> atFilePosition: (random nextInt: fileSize) ]
> bench)) ]].
> stats
> ------------
>
> Note that "objects.2.dat" is a real Magma file, 1.8GB in size. The
> goal of the script is to bench how fast Squeak can read object buffers
> off the hard drive when we obviously won't get many (if any) HD cache
> hits.
>
> I have a cheap Western Digital Caviar HD, which produced the following
> output:
>
> 100->'119 per second.'
> 200->'98.5 per second.'
> 300->'106 per second.'
> 400->'106 per second.'
> 500->'101 per second.'
> 600->'102 per second.'
> 700->'99.9 per second.'
> 800->'103 per second.'
> 900->'104 per second.'
> 1000->'99 per second.'
> 1100->'97.9 per second.'
> 1200->'104 per second.'
> 1300->'111 per second.'
> 1400->'99.8 per second.'
> 1500->'107 per second.'
> 1600->'108 per second.'
> 1700->'95.6 per second.'
> 1800->'103 per second.'
> 1900->'108 per second.'
> 2000->'102 per second.'
> 2100->'103 per second.'
> 2200->'107 per second.'
> ...
> 3000->'98.7 per second.'
> 4000->'102 per second.'
> 5000->'106 per second.'
> 6000->'104 per second.'
> 7000->'101 per second.'
> 8000->'102 per second.'
> 9000->'102 per second.'
> 10000->'107 per second.'
>
> Out of curiosity, I also modified the script to read very small buffers
> from the HD. Here are the results:
>
> 4->'137 per second.'
> 12->'146 per second.'
> 20->'154 per second.'
> 28->'143 per second.'
>
> (The HD busy light was solid ON during the test).
>
> At first I was puzzled, because Magma has demonstrated much faster
> objects-per-second read rates than these, even including
> materialization. What gives?
>
> It's the HD buffering. Most of the time, objects are "clustered"
> closely together, so reading one object leaves the "next" object to
> be read already sitting in the HD's buffer. Here's the same script,
> except reading mostly "sequentially" through the file instead of from
> a random location:
>
> |stats random nextPos| stats:=OrderedCollection new. random := Random new.
> nextPos:=100.
> (FileDirectory on: '/home/cmm/test3/cube.001.magma') fileNamed:
> 'objects.2.dat' do:
> [ : stream | | ba fileSize | ba := ByteArray new: 10000.
> fileSize := stream size.
> #(4 12 20 28 100 200 300 400 500) do:
> [ : n |
> stream position: 0.
> Transcript cr; show: (stats add: n->([stream
> maRead: n "bytes"
> bytesFromPosition: 1
> of: ba
> "Advance just past the previous read instead of seeking randomly."
> atFilePosition: ("random nextInt: fileSize"
> (nextPos :=
> nextPos+n+10)) ] bench)) ]].
> stats
>
> Now look at the results:
>
> "Reading sequentially rather than at a random position."
> 4->'1,160,000 per second.'
> 12->'1,210,000 per second.'
> 20->'1,100,000 per second.'
> 28->'973,000 per second.'
> ...
> 100->'1,030,000 per second.'
> 200->'321,000 per second.'
> 300->'215,000 per second.'
> 400->'160,000 per second.'
> 500->'227,000 per second.'
>
> Conclusions:
>
> - Hard-disk seek is definitely a bottleneck with Magma, or any
> Squeak application that requires random access to a file.
> - When objects are clustered closely together, read performance can
> be dramatically better.
> - Drives with fast seek times, such as newer solid-state drives, might
> perform dramatically better.
> - I should consider reducing the trackSize from 280 bytes to ~100
> bytes (or making it customizable), because the sequential read rate
> drops off quickly beyond that size, and even when a second read is
> required, it could still be faster than the initial read.
>
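To put the two sets of figures side by side, here is a small worked calculation using only numbers reported above. The round value of 100 reads per second is my own approximation of the random-read results, and the variable names are hypothetical.

```smalltalk
| randomRate sequentialRate msPerRandomRead usPerSequentialRead speedup |
randomRate := 100.          "rough figure for the random reads, per second"
sequentialRate := 1160000.  "reported rate for 4-byte sequential reads"
"Time spent per read in each mode."
msPerRandomRead := 1000 / randomRate.
usPerSequentialRead := (1000000 / sequentialRate) asFloat.
"How many times more reads per second the sequential case achieves."
speedup := sequentialRate / randomRate.
```

That works out to about 10 ms per random read against under a microsecond per sequential read, a ratio of 11600. Ten milliseconds is in the range one would expect for a seek plus rotational latency on a consumer hard drive, which fits the first conclusion.
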
> - Chris
> _______________________________________________
> Magma mailing list
> Magma at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/mailman/listinfo/magma
>