[PWS] PWS only meant for Swiki?

Lex Spoon lex at cc.gatech.edu
Tue May 4 12:52:56 UTC 1999


Mark Guzdial <guzdial at cc.gatech.edu> wrote:

> >	2. The <> check currently works by doing an initial scan to find
> >all ranges of <> pairs, and then checks at each line end whether the
> >current text position falls within one of those ranges.  If there are 20
> >HTML tags on this page, then this means going through 20 calls to
> >between:and: and 20 block invocations AT EACH LINE END.  A better way is
> >simply to keep a flag which reflects whether the current position is
> >within a <> pair or not; seeing a < turns it on, and seeing a > turns it
> >off; the check at each end of line then becomes extremely cheap.
> 
> Hmm, I just wrote a tiny-and-still-incomplete HTML tag scanner for my class
> as a demonstration
> (http://www.cc.gatech.edu/classes/cs2390_99_spring/slides/parse/outline.html).
> Maybe I can modify that for this purpose. A hand-built scanner will
> probably be faster than a regular expression system.
> 

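The flag idea from the quoted message can be sketched roughly as follows. This is only an illustration in Python, not the PWS code; the function name line_ends_inside_tag is made up for the example, and it assumes every < opens a tag and every > closes one (no handling of comments or quoted attribute values).

```python
def line_ends_inside_tag(text):
    """For each line end in text, report whether it falls inside a <...> pair.

    Hypothetical helper for illustration only. A single boolean flag is
    updated as characters go by, so the check at each line end is just
    a read of that flag.
    """
    inside = False          # True while the scan is between '<' and '>'
    results = []
    for ch in text:
        if ch == "<":
            inside = True   # seeing '<' turns the flag on
        elif ch == ">":
            inside = False  # seeing '>' turns it off
        elif ch == "\n":
            results.append(inside)  # cheap check at the line end
    return results
```

The cost per line end no longer depends on how many tags are on the page, which is the point of the suggestion.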

To help write scanners by hand, there is an indexOfAnyOf: primitive in the standard VM. This method works just like indexOf:, except that you can specify a set of characters to look for instead of just one character. It is more limited than scanning with regular expressions, but it turns out to be sufficient in most cases. (Most computer languages don't seem to have tokens that are all THAT complicated.) There are a couple of examples of such scanners already in the system: HtmlTokenizer and MailAddressTokenizer.
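As a rough sketch of that style of scanner, here is a Python analogue. It is not the Squeak code and not how HtmlTokenizer is written; index_of_any_of and tokenize are names invented for the example, and the Python helper returns -1 when nothing is found, which is a choice made here rather than the Squeak convention.

```python
def index_of_any_of(text, chars, start=0):
    """Index of the first character at or after start that is in chars, or -1.

    Hypothetical Python stand-in for the idea behind indexOfAnyOf:.
    """
    for i in range(start, len(text)):
        if text[i] in chars:
            return i
    return -1


def tokenize(text):
    """Split text into '<', '>' and runs of other non-blank characters."""
    whitespace = " \t\n"
    delimiters = "<>" + whitespace
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in whitespace:
            pos += 1                      # skip blanks between tokens
        elif ch in "<>":
            tokens.append(ch)             # angle brackets are one-character tokens
            pos += 1
        else:
            # Jump straight to the next delimiter instead of testing
            # each character by hand.
            end = index_of_any_of(text, delimiters, pos)
            if end < 0:
                end = len(text)
            tokens.append(text[pos:end])
            pos = end
    return tokens
```

For example, tokenize("<a href=x> hi") gives the tokens "<", "a", "href=x", ">" and "hi". The whole scanner is a loop around one "find the next interesting character" call.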


Lex




