Celeste - documentation / tutorial on the message parser code

Lex Spoon lex at cc.gatech.edu
Sat Feb 12 22:28:14 UTC 2005


I contributed a lot of this code, and all I did was read the RFCs, look
at mail code in other programs, and implement what seemed necessary.  It
actually seems very orderly and well commented to me.  Is there anything
specific you are wondering about? 

I don't think that SmaCC would help here, by the way.  I wouldn't have
as much confidence in a SmaCC-based email parser as I do in the
current hand-coded one.  The rules are things like "if a line starts
with a space, then append it to the previous line".  It seems easier to
write code like "[line first isSeparator] whileTrue:" than to try
to encode that in regular expressions and/or BNF rules.
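
As a minimal workspace sketch of that continuation-line rule, assuming
the header lines have already been split into a collection (the variable
names and the sample headers are illustrative, not taken from
MailMessage):

```smalltalk
| lines unfolded |
"Sample header lines; the second one continues the first."
lines := #('Subject: a long' '   subject line' 'To: someone@example.org').
unfolded := OrderedCollection new.
lines do: [:line |
	(unfolded notEmpty and: [line notEmpty and: [line first isSeparator]])
		ifTrue: ["continuation line: glue it onto the previous field"
			unfolded addLast: unfolded removeLast, ' ', line withBlanksTrimmed]
		ifFalse: ["ordinary line: it starts a new field"
			unfolded addLast: line]].
unfolded
```

With the sample input this should leave two entries, the first being
'Subject: a long subject line'.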

(To contrast, SmaCC is the way to go if either (a) you get to design
your own language, or (b) the language was designed to fit BNF- and
regex-based parser tools.)


Anyway, all this aside, have you considered simply re-using MailMessage
as it is?  Why are you rewriting mature code?  I can see someone not
copying Celeste, because there is a lot of taste involved in how an
email reader is set up.  The parser itself seems extremely boring.

And everyone else: knock it off, please.  I have now seen a comment
suggesting how it should have been implemented, a comment suggesting
that *any* parser based on SmaCC would be worth replacing the current
one with, and an offer to take a pre-production codebase and replace
this one with it.  Has anyone actually looked at the code or tried to
use it, before you start picking on it?  Believe it or not, some parts
of Squeak have been hammered on quite heavily, and are mature at this
point.  It is great if people want to make it better, but please do not
(as has happened in the past!) yank out mature code and start over from
scratch.


-Lex



====
I represent an Internet mail or news message.

	text - the raw text of my message
	body - the body of my message, as a MIMEDocument
	fields - a dictionary mapping lowercased field names into collections of MIMEHeaderValues
	parts - if I am a multipart message, then this is a cache of my parts


from: aString 
	"Parse aString to initialize myself."

	| parseStream contentType bodyText contentTransferEncoding |

	tokens _ nil.
	text _ aString withoutTrailingBlanks, String cr.
	parseStream _ ReadStream on: text.
	contentType _ 'text/plain'.
	contentTransferEncoding _ nil.
	fields := Dictionary new.

	"Extract information out of the header fields"
	self fieldsFrom: parseStream do: 
		[:fName :fValue | 
		"NB: fName is all lowercase"

		fName = 'content-type' ifTrue: [contentType _ (fValue copyUpTo: $;) asLowercase].
		fName = 'content-transfer-encoding' ifTrue: [contentTransferEncoding _ fValue asLowercase].

		(fields at: fName ifAbsentPut: [OrderedCollection new: 1])
			add: (MIMEHeaderValue forField: fName fromString: fValue)].

	"Extract the body of the message"
	bodyText _ parseStream upToEnd.
	contentTransferEncoding = 'base64'
		ifTrue: 
			[bodyText _ Base64MimeConverter mimeDecodeToChars: (ReadStream on: bodyText).
			bodyText _ bodyText contents].
	contentTransferEncoding = 'quoted-printable' ifTrue: [bodyText _ QuotedPrintableMimeConverter  mimeDecode: bodyText as: String].
	body _ MIMEDocument contentType: contentType content: bodyText
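
A small usage sketch for the method above, meant to be evaluated in a workspace. It assumes the class-side `MailMessage from:` that parseParts calls, plus the `body` and `bodyText` accessors that parseParts also relies on; the sample message is made up.

```smalltalk
| raw msg |
"A made-up message: two header fields (one folded), a blank line, a body."
raw := 'Subject: a long', String cr,
	'   subject line', String cr,
	'Content-Type: text/plain; charset=us-ascii', String cr,
	String cr,
	'Hello there.', String cr.
msg := MailMessage from: raw.
msg body isMultipart.	"expected to be false for a text/plain message"
msg bodyText	"the text that follows the blank line"
```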


fieldsFrom: aStream do: aBlock
	"Invoke the given block with each of the header fields from the given stream. The block arguments are the field name and value. The stream's position is left right after the empty line separating header and body."

	| savedLine line s |
	savedLine _ MailDB readStringLineFrom: aStream.
	[aStream atEnd] whileFalse: [
		line _ savedLine.
		(line isEmpty) ifTrue: [^self].  "quit when we hit a blank line"
		[savedLine _ MailDB readStringLineFrom: aStream.
		 (savedLine size > 0) and: [savedLine first isSeparator]] whileTrue: [
			"lines starting with white space are continuation lines"
			s _ ReadStream on: savedLine.
			s skipSeparators.
			line _ line, ' ', s upToEnd].
		self reportField: line withBlanksTrimmed to: aBlock].

	"process final header line of a body-less message"
	(savedLine isEmpty) ifFalse: [self reportField: savedLine withBlanksTrimmed to: aBlock].


parseParts
	"private -- parse the parts of the message and store them into a collection"

	| parseStream msgStream messages separator |

	"If this is not multipart, store an empty collection"
	self body isMultipart ifFalse: [parts _ #().  ^self].

	"If we can't find a valid separator, handle it as if the message is not multipart"
	separator := self attachmentSeparator.
	separator ifNil: [Transcript show: 'Ignoring bad attachment separator'; cr. parts _ #(). ^self].

	separator := '--', separator withoutTrailingBlanks.
	parseStream _ ReadStream on: self bodyText.

	msgStream _ LimitingLineStreamWrapper on: parseStream delimiter: separator.
	msgStream limitingBlock: [:aLine |
		aLine withoutTrailingBlanks = separator or:			"Match the separator"
		[aLine withoutTrailingBlanks = (separator, '--')]].	"or the final separator with --"

	"Throw away everything up to and including the first separator"
	msgStream upToEnd.
	msgStream skipThisLine.

	"Extract each of the multi-parts as strings"
	messages _ OrderedCollection new.
	[parseStream atEnd]
		whileFalse: 
			[messages add: msgStream upToEnd.
			msgStream skipThisLine].

	parts _ messages collect: [:e | MailMessage from: e]
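
The separator matching in parseParts can also be sketched over a plain collection of lines, without LimitingLineStreamWrapper. This is a simplification for illustration only: the boundary string and sample lines are hypothetical, and unlike parseParts it drops empty parts and keeps each part as a collection of lines rather than building a MailMessage.

```smalltalk
| separator lines parts current |
separator := '--XYZ'.	"hypothetical boundary, already prefixed with --"
lines := #('preamble' '--XYZ' 'first part' '--XYZ' 'second part' '--XYZ--').
parts := OrderedCollection new.
current := nil.	"stays nil until the first separator has been seen"
lines do: [:line | | trimmed |
	trimmed := line withoutTrailingBlanks.
	(trimmed = separator or: [trimmed = (separator, '--')])
		ifTrue: ["a separator closes the current part and opens a new one"
			(current notNil and: [current notEmpty]) ifTrue: [parts add: current].
			current := OrderedCollection new]
		ifFalse: ["lines before the first separator (the preamble) are skipped"
			current notNil ifTrue: [current add: line]]].
(current notNil and: [current notEmpty]) ifTrue: [parts add: current].
parts
```

For the sample lines this should yield two parts, one holding 'first part' and one holding 'second part'.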


