RE: [PWS] PWS only meant for Swiki? - Squeak-dev

4 May 1999

      Mark Guzdial guzdial@cc.gatech.edu wrote:
...
...

The <> check currently works by doing an initial scan to find all ranges
of <> pairs, and then checks at each line end whether the
current text position falls within one of those ranges.  If there are 20
HTML tags on this page, then this means going through 20 calls to
between:and: and 20 block invocations AT EACH LINE END.  A better way is
simply to keep a flag which reflects whether the current position is
within a <> pair or not; seeing a < turns it on, and seeing a > turns it
off; the check at each end of line then becomes extremely cheap.
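
The flag idea above can be sketched roughly as follows. This is only an illustration in Python, not the actual PWS code; the function name line_ends_inside_tag and the sample string are made up for the example, and it assumes the scan simply walks the text one character at a time.

```python
def line_ends_inside_tag(text):
    """Return one boolean per newline in text.

    Each entry is True if that line end falls inside a <...> pair.
    (Hypothetical helper for illustration; not part of PWS.)
    """
    inside = False      # the flag: are we currently between < and > ?
    results = []
    for ch in text:
        if ch == "<":
            inside = True       # seeing a < turns the flag on
        elif ch == ">":
            inside = False      # seeing a > turns it off
        elif ch == "\n":
            results.append(inside)  # the per-line check is one flag read
    return results


sample = "<p>one\n<a\nhref='x'>two\n"
print(line_ends_inside_tag(sample))  # [False, True, False]
```

The cost at each line end no longer depends on how many tags the page contains, since no list of ranges is consulted.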

Hmm, I just wrote a tiny-and-still-incomplete HTML tag scanner for my class
as a demonstration
(http://www.cc.gatech.edu/classes/cs2390_99_spring/slides/parse/outline.html).
Maybe I can modify that for this purpose. A hand-built scanner will
probably be faster than a regular expression system.

To help write scanners by hand, there is an indexOfAnyOf: primitive in the standard VM.  This method is just like indexOf: except that you can specify a set of characters to look for instead of just one character.  It is more limited than scanning with regular expressions, but it turns out to be sufficient in most cases.  (Most computer languages don't seem to have tokens that are all THAT complicated.)  There are a couple of examples of such scanners already in the system: HtmlTokenizer and MailAddressTokenizer.
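
To give a feel for the style of scanner this enables, here is a rough sketch in Python. It is not how HtmlTokenizer or MailAddressTokenizer are actually written; index_of_any_of is a hypothetical stand-in for indexOfAnyOf: (it returns -1 when nothing is found, which is an assumption of this sketch), and the character sets and the sample address are invented for the example.

```python
def index_of_any_of(s, chars, start=0):
    """Index of the first character of s, at or after start, that is in chars.

    Returns -1 if there is none. Hypothetical stand-in for indexOfAnyOf:.
    """
    for i in range(start, len(s)):
        if s[i] in chars:
            return i
    return -1


SPECIALS = frozenset("<>@,")        # each of these is a token by itself
WHITESPACE = frozenset(" \t\n")     # skipped between tokens
STOPS = SPECIALS | WHITESPACE       # anything that ends a word


def tokenize(s):
    """Split s into words and single-character special tokens."""
    tokens = []
    pos = 0
    while pos < len(s):
        ch = s[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch in SPECIALS:
            tokens.append(ch)
            pos += 1
        else:
            # Jump straight to the next stop character instead of
            # testing the characters of the word one by one here.
            end = index_of_any_of(s, STOPS, pos)
            if end == -1:
                end = len(s)
            tokens.append(s[pos:end])
            pos = end
    return tokens


print(tokenize("Ann Example <ann@example.org>"))
# ['Ann', 'Example', '<', 'ann', '@', 'example.org', '>']
```

The whole scanner is one loop around a single "find the next interesting character" call, which is the part a primitive can make fast.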
Lex