[squeak-dev] The Inbox: Files-cmm.125.mcz

David T. Lewis lewis at mail.msen.com
Thu Jul 4 15:40:03 UTC 2013


On Wed, Jul 03, 2013 at 12:13:36PM -0700, tim Rowledge wrote:
> 
> On 03-07-2013, at 11:13 AM, Chris Muller <asqueaker at gmail.com> wrote:
> 
> > Good discussion thanks Levente.  You know, after more thinking I had
> > trouble understanding what is the characteristic of the FilePlugin
> > primitives that leads to better performance in a BFS?
> 
> Just take a look at the computational gymnastics undertaken in the lowest level of FilePlugin code, both in the image and in the actual platform code. For all platforms it is a nightmare of untangling, mangling and re-tangling strange values that bear little relationship to anything useful. As an example, look at what happens when dir_Lookup is used on unix. Eeek!
> 
> I suggest that the platform handling is done at far too low a level. Different platforms do things rather differently, and rather than the current pattern of squeezing all of them into a straitjacket (dir_Lookup being an example) we ought to have a higher-level fanout. Keeping to unix as an example, consider what happens in a modestly sized directory when running FileDirectory>entries (and we should temporarily ignore the ridiculous uses that gets put to); #directoryContentsFor:do: repeatedly calls the primitive, which then uses dir_Lookup. By the time you get to looking up entry 42 in the directory you would already have done around 860 readdir calls, were it not for a sneaky optimisation that keeps the last opened dir stream around to see if it is the one needed next. When doing a BFS, that is generally true, thus some benefit accrues.
> 
> If we had image-level code that understood how readdir works and took advantage of it through a suitable prim we might gain some interesting advantages. Likewise for the specific APIs on other platforms; RISC OS, for example, can look directly at entry(i) but also has to do terrible things to handle the idiotic edge case of passing an empty string argument to mean 'list all the roots'. Take a look at dir_LookupRoot in platforms/RiscOS/plugins/FilePlugin/sqRPCDirectory.c and reel in shock!
>
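
The "around 860 readdir calls" figure in tim's example can be checked with a
small simulation. This is only a sketch in Python, not the FilePlugin code:
FakeDir and lookup_by_index are hypothetical stand-ins for an index-based
primitive that steps forward with one readdir-like call per entry, and the
use_cache flag models the "keep the last opened dir stream around"
optimisation as it is described above.

```python
class FakeDir:
    """Hypothetical stand-in for a directory read via a readdir-like call."""

    def __init__(self, names):
        self.names = list(names)
        self.readdir_calls = 0   # total simulated readdir calls
        self._next = 0           # position of the cached stream

    def lookup_by_index(self, index, use_cache):
        """Return entry number `index` (1-based), counting readdir calls."""
        # Without the cache (or when not moving forward) restart from the top.
        if not use_cache or index <= self._next:
            self._next = 0
        name = None
        while self._next < index:
            name = self.names[self._next]   # one simulated readdir
            self._next += 1
            self.readdir_calls += 1
        return name


def calls_before_entry(n, use_cache):
    """Count readdir calls made for entries 1..n-1, i.e. before looking up n."""
    d = FakeDir(f"file{i}" for i in range(1, n + 1))
    for i in range(1, n):
        d.lookup_by_index(i, use_cache)
    return d.readdir_calls


print(calls_before_entry(42, use_cache=False))  # 1 + 2 + ... + 41 = 861
print(calls_before_entry(42, use_cache=True))   # 41
```

Under these assumptions the uncached cost grows quadratically with the
directory size, while sequential access with the cached stream costs one
call per entry.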

About 10 years ago I did a plugin to provide directory access using
POSIX directory streams for Unix and Windows VMs:

  http://wiki.squeak.org/squeak/2274

I wonder if we could generalize the interface in such a way that it would
still make sense for non-POSIX platforms?
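
One way to picture such a generalisation, as a rough sketch only: a
"next entry / close" protocol that a stream-based platform implements
directly, and that an index-based platform (such as the entry(i) access
mentioned above for RISC OS) implements with a counter. All names here
(DirStream, ScandirStream, IndexedStream, names) are hypothetical and are
not taken from the plugin on the wiki page; Python's os.scandir merely
stands in for opendir/readdir.

```python
import os
import tempfile


class DirStream:
    """Hypothetical minimal protocol: next_entry() returns a name or None."""

    def next_entry(self):
        raise NotImplementedError

    def close(self):
        pass


class ScandirStream(DirStream):
    """Stream-style backend; os.scandir stands in for opendir/readdir."""

    def __init__(self, path):
        self._it = os.scandir(path)

    def next_entry(self):
        entry = next(self._it, None)
        return None if entry is None else entry.name

    def close(self):
        self._it.close()


class IndexedStream(DirStream):
    """Backend for a platform that can fetch entry(i) directly."""

    def __init__(self, entry_at):
        self._entry_at = entry_at   # callable: index -> name, or None past the end
        self._i = 0

    def next_entry(self):
        name = self._entry_at(self._i)
        self._i += 1
        return name


def names(stream):
    """Drain any DirStream; one backend call per entry."""
    out = []
    while (name := stream.next_entry()) is not None:
        out.append(name)
    stream.close()
    return out


# Usage: both backends are driven through the same loop.
with tempfile.TemporaryDirectory() as tmp:
    for n in ("a.txt", "b.txt"):
        open(os.path.join(tmp, n), "w").close()
    posix_like = sorted(names(ScandirStream(tmp)))

table = ["a.txt", "b.txt"]
indexed = names(IndexedStream(lambda i: table[i] if i < len(table) else None))
print(posix_like == indexed)
```

The point of the sketch is that the caller never asks for "entry number i",
so each platform is free to use whatever access pattern is cheapest for it.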

At the time, I was claiming some fairly significant performance improvements.
I don't know what it would look like today, but given that this is an
improvement in the primitives, the relative performance for a Cog VM might
be even better.

Here is what I was seeing on my ancient PC at the time:

  Performance improvement on my system, running Linux:
  
      * FileDirectory>>directoryContentsFor: is 7.3 times faster
      * FileDirectory>>entryAt:ifAbsent: is 148 times faster
      * FileDirectory>>fileAndDirectoryNames is 5.2 times faster
      * FileDirectory>>fileExists: is 121 times faster
      * FileDirectory>>directoryExists: is 104 times faster
      * FileDirectory>>fileOrDirectoryExists: is 32 times faster
      * StandardFileStream>>isAFileNamed: is 1445 times faster 
  
  
  Performance improvement on my system, running Windows:
  
      * FileDirectory>>directoryContentsFor: (N/A)
      * FileDirectory>>entryAt:ifAbsent: (N/A)
      * FileDirectory>>fileAndDirectoryNames is 1.37 times faster
      * FileDirectory>>fileExists: is 86 times faster
      * FileDirectory>>directoryExists: is 80 times faster
      * FileDirectory>>fileOrDirectoryExists: is 90 times faster
      * StandardFileStream>>isAFileNamed: is 497 times faster

Dave
 

