[Newbies] Re: Binary file I/O performance problems

nicolas cellier ncellier at ifrance.com
Fri Sep 5 21:00:07 UTC 2008


Yoshiki Ohshima wrote:
> At Fri, 5 Sep 2008 10:59:03 -0700,
> David Finlayson wrote:
>> I re-wrote the test application to load the test file entirely into
>> memory before parsing the data. The total time to parse the file
>> decreased by about 50%. Now that I/O is removed from the picture, the
>> new bottle neck is turning bytes into integers (and then integers into
>> Floats).
>>
>> I know that Smalltalk isn't the common language for number crunching,
>> but if I can get acceptable performance out of it, then down the road
>> I would like to tap into the Croquet environment. That is why I am
>> trying to learn a way that will work.
> 
>   If the integers or floats are in the layout of C's int[] or float[],
> there is a better chance to make it much faster.
> 
>   Look at the method Bitmap>>asByteArray and
> Bitmap>>copyFromByteArray:.  You can convert a big array of non-pointer
> words from/to a byte array.
> 
>   data := (1 to: 1000000) as: FloatArray.
>   words := Bitmap new: data size.
>   words replaceFrom: 1 to: data size with: data.
>   bytes := words asByteArray.
> 
>   "and you write out the bytes into a binary file."
> 
>   "to get them back:"
> 
>   words copyFromByteArray: bytes.
>   data replaceFrom: 1 to: words size with: words.
> 
> Obviously, you can recycle some of the intermediate buffer allocation
> and that would speed it up.
> 
>   FloatArray has some vector arithmetic primitives, and the Kedama
> system in OLPC Etoys image have more elaborated vector arithmetic
> primitives on integers and floats including operations with masked
> vectors.
> 
> -- Yoshiki

Hi David,
your application has piqued my curiosity. Which company or organization
are you working for, if that's not indiscreet?

I think you will solve most of your performance problems by following
Yoshiki's good advice.

As an alternative, you might also want to investigate FFI as a way of
handling ByteArray memory with a C-like layout from within Smalltalk.
I made an example of this in Smallapack-Collections (search for
Smallapack on squeaksource, http://www.squeaksource.com/Smallapack/).
ExternalArray is an abstract class for handling memory laid out as a
C array of any type from within Smalltalk (only float, double and
complex are implemented in subclasses, but you can extend it), and in
fact FFI can handle any structure (though you might have to resolve
alignment problems yourself).
There is a trade-off between fast reading (no conversion) and slower
access (conversion at each access). However, with the
ByteArray>>#doubleAt: and #floatAt: primitives (from FFI), and fast
hacks to reverse the endianness of a whole array at once when needed,
ExternalArrays of elementary types or small structures still provide
reasonable access times.
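
A minimal sketch of that trade-off, for illustration only. It assumes
the FFI package is loaded (it supplies ByteArray>>#doubleAt: and
#doubleAt:put:, which take 1-based byte offsets) and that each element
is an 8-byte double. The variable names are made up for the example,
and the byte-swapping loop is a plain, naive version, not the fast
hack used in Smallapack:

  | bytes count sum |
  count := 4.
  bytes := ByteArray new: count * 8.

  "Fill the buffer as if it had been read straight from a file."
  1 to: count do: [:i |
      bytes doubleAt: (i - 1) * 8 + 1 put: i asFloat].

  "No up-front conversion: each element is decoded when accessed."
  sum := 0.0.
  1 to: count do: [:i |
      sum := sum + (bytes doubleAt: (i - 1) * 8 + 1)].

  "Naive in-place endianness reversal of every 8-byte element,
   only needed when file and platform byte order differ."
  1 to: bytes size by: 8 do: [:start |
      0 to: 3 do: [:k | | tmp |
          tmp := bytes at: start + k.
          bytes at: start + k put: (bytes at: start + 7 - k).
          bytes at: start + 7 - k put: tmp]].

The file contents stay in the ByteArray as they were read, so loading
costs nothing beyond the read itself; the price is one primitive call
per element access, plus a single swapping pass when the byte order
does not match.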

Nicolas


