More File Performance Q.?

Jimmie Houchin jhouchin at texoma.net
Thu May 16 05:20:22 UTC 2002


Thanks for all the help on the original question.

I am still doing something similar, but this time the program does more
than read one file and write it back out to another.

I worked through this one today until I learned enough Squeak to get it
to work.

However, once again the Python version outperforms the Squeak version. It
appears that CrLfFileStream imposes enough overhead to hurt performance.
I originally wrote my code with a StandardFileStream, as in the previous
discussion. That version outperformed the Python but did not do what
needed to be done.

The inspiration for these little programs is the way Netscape and other
mail programs handle MailMan archives or mailboxes. Some mailing lists
provide downloadable archives; Squeak's does. Early on (I haven't found
a current example) MailMan's archives could cause problems for email
clients. If 'from' appears at the beginning of a line in a message, the
client would (does) interpret that line as a header. This causes some
unusual messages, with many being truncated and the later portion
becoming its own message.

This program opens the file and reads each line to determine whether a
line beginning with 'from' is actually a header or part of the body. If
it is in the body, I insert a space at the beginning of the line.
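
To make the rule concrete, here is a minimal Python sketch of the
per-line test on its own, separate from the file handling. The name
fix_line and the sample lines are made up for illustration, and the '@'
check is only the rough heuristic described above, not a complete mbox
parser.

```python
def fix_line(line):
    """Return the line, prefixed with a space if it looks like body text
    that happens to start with 'from'."""
    # Heuristic (an assumption): a genuine header line starting with
    # 'From' carries an address, so a 'from' line without '@' is body text.
    if line[:4].lower() == 'from' and '@' not in line:
        return ' ' + line
    return line


# Hypothetical sample lines, invented for this example.
samples = [
    'From someone@example.com Thu May 16 05:20:22 2002\n',
    'from what I can tell, this works.\n',
    'Thanks for all the help.\n',
]

for s in samples:
    print(repr(fix_line(s)))
```

Only the second sample is changed; the header line and the ordinary
body line pass through untouched.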

Because I have to read each line in order to operate on it, I had to
change from StandardFileStream to CrLfFileStream.

Is there anything I am doing wrong in my code which is causing problems?

Below is the Python and the Squeak versions.

Any help greatly appreciated.


On WindowsME (PII 266) Python: 46 seconds  Squeak: 247 seconds.
On Linux (Athlon 700)  Python: 18 seconds  Squeak: 63 seconds.

Python:
FixMB.py

import os, time

def fixmb(maildir):
    btime = time.time()
    filelist = os.listdir(maildir)
    files = []

    # Build full paths; os.path.join works with or without a trailing
    # separator on maildir.
    for f in filelist:
        files.append(os.path.join(maildir, f))

    for f in files:
        f1 = open(f, 'r+')
        f2 = open(f + '.mbox', 'w+')

        for line in f1.readlines():
            # A line starting with 'from' that has no '@' is treated as
            # body text and gets a leading space.
            if line[:4].lower() == 'from':
                if line.find('@') == -1:
                    f2.write(' ' + line)
                else: f2.write(line)
            else: f2.write(line)

        f2.flush()
        f2.close()
        f1.close()
    etime = time.time()
    ttime = etime - btime
    print(ttime)
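
One variation that might be worth timing on the Python side is to
iterate over the file object directly instead of calling readlines(),
so the whole mailbox is not held in memory as a list of lines. This is
only a sketch: fix_file and its two path arguments are names made up
for illustration, and it is untested whether it changes the timings
above.

```python
def fix_file(src_path, dst_path):
    """Copy src_path to dst_path, putting a space in front of body lines
    that start with 'from' (same heuristic as fixmb)."""
    src = open(src_path, 'r')
    dst = open(dst_path, 'w')
    try:
        # Iterating over the file object yields one line at a time.
        for line in src:
            if line[:4].lower() == 'from' and line.find('@') == -1:
                dst.write(' ' + line)
            else:
                dst.write(line)
    finally:
        # Close both files even if a read or write fails.
        dst.close()
        src.close()
```

It would be called once per mailbox, for example
fix_file(f, f + '.mbox') inside the loop over files.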
    

Squeak in workspace.

"Fix MailBoxes"
| file1 file2 aFileDirectory aFileList line |
Transcript show:
[aFileDirectory := FileDirectory on: 'c:\jimmie\zope\'.
 aFileList := aFileDirectory fileNames.
 aFileList do: [:aFile |
  "Read the whole input file into memory, then walk it line by line."
  file1 := CrLfFileStream new.
  file1 open: aFileDirectory fullName, '\', aFile forWrite: false.
  file1 := ReadStream on: file1 contentsOfEntireFile.
  file1 reset.
  file2 := StandardFileStream new.
  file2 open: aFileDirectory fullName, '\', aFile, '.txt' forWrite: true.
  line := ''.

  [file1 atEnd] whileFalse:
   [line := file1 nextLine.
    "A line starting with 'from' but containing no '@' is body text."
    (line asLowercase beginsWith: 'from') ifTrue:
     [((line findString: '@') = 0) ifTrue: [line := ' ', line]].
    file2 nextPutAll: line; cr].

  file2 flush.
  file2 close.]] timeToRun.



