I don't think either the Swiki file format or the <> check itself is inherently slow. It's just that the current implementations that deal with those things are slow:
1. The SwikiPage>>text routine does things similar to upTo: on a FileStream, which results in character-by-character processing. One of these days, someone should fix all that stuff up on FileStream to use buffers, but it's not in place right now. (And who has the time to work on it?). In particular, I think reading in a Squeak string with the doubled ' marks is the biggest hit.
2. The <> check currently works by doing an initial scan to find all ranges of <> pairs, and then checks at each line end whether the current text position falls within one of those ranges. If there are 20 HTML tags on this page, then this means going through 20 calls to between:and: and 20 block invocations AT EACH LINE END. A better way is simply to keep a flag which reflects whether the current position is within a <> pair or not; seeing a < turns it on, and seeing a > turns it off; the check at each end of line then becomes extremely cheap.
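The flag-based approach could look something like this as a workspace doIt (a sketch only -- the variable names and the <br> substitution are illustrative, not the actual SwikiPage code):

```smalltalk
"One pass, one flag: the end-of-line check becomes O(1) instead of
 scanning every <> range. A line break inside a tag is left alone;
 one outside a tag gets the substitution."
| cr source inTag out |
cr := String with: Character cr.
source := 'one <table', cr, 'border=1> two', cr, 'three'.
inTag := false.
out := WriteStream on: (String new: source size * 2).
source do: [:ch |
    ch = $< ifTrue: [inTag := true].
    ch = $> ifTrue: [inTag := false].
    (ch = Character cr and: [inTag not])
        ifTrue: [out nextPutAll: '<br>'; nextPutAll: cr]
        ifFalse: [out nextPut: ch]].
out contents
```

Here the first line break (inside the <table ...> tag) passes through untouched, while the second one picks up the <br>. Note this naive flag doesn't handle a stray < with no matching >, which the range-based scan arguably handles better.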
Lex
Bijan Parsia bparsia@email.unc.edu wrote:
On Mon, 3 May 1999, Lex Spoon wrote:
I am curious how, in practice, a Swiki performs. In particular, how practical are these things for communities of several hundred people?
The current Swikis haven't been optimized much at all, but as Mark Guzdial posted earlier, they are handling his ~100 student sophomore class without trouble.
I've used Swikis for 25-40 person classes for, gosh, over a year. It does OK. I have my own swikifying routine which avoids some of the speed hit Lex pointed out.
I can quantify this a little more. We did some quick tests on a Mac (Bolot, Je77, do you know what kind of Mac it is?), and found two things that are bottlenecks right now:
- Reading a Swiki's source text, i.e. SwikiPage>>text, takes over
100 milliseconds by itself. This could obviously be improved: contentsOfEntireFile on the same file took something like 15-20 milliseconds.
I'm going to change that tomorrow! Whoa. This must absolutely *kill* on searches. And it certainly explains the interminable restore times.
- "swikifying" a page is currently dominated by the check for
whether a particular line break is in between < and >. This check is needed, but it could be made more efficient. With the check, the time taken is about 200 milliseconds. Without the check, the time is down in the 10-20 millisecond range, if I remember right. It was definitely a LOT less.
Wow. I didn't realize the check was *that* painful. And here I thought it was my clever parser that made mine so much quicker. <wince/> Good to know though.
However, there seem to be other places where Swikis can be a tad problematic: if someone else is accessing a page while PWS is processing (like doing a search), I get a "server unavailable". This is with Squeak 2.1. If this has been fixed (via threading?) in later versions I'd be happy, except for the weird slowdowns I experience :(
One thing I think would be nice is if the text and the "metadata" of a SwikiPage were stored in separate files. Right now, you only save things like name, edit time, etc. when you edit the page. Thus, if your server crashes you lose all metadata since the last edit or image save.
On the other hand, I'm eager to see some sort of PWS connection to MinneStore. Now *that* will be interesting.
Cheers, Bijan
I don't think either the Swiki file format or the <> check itself is inherently slow. It's just that the current implementations that deal with those things are slow:
- The SwikiPage>>text routine does things similar to upTo: on a
FileStream, which results in character-by-character processing. One of these days, someone should fix all that stuff up on FileStream to use buffers, but it's not in place right now. (And who has the time to work on it?). In particular, I think reading in a Squeak string with the doubled ' marks is the biggest hit.
I think that Bolot's new PWS implementation is all stream-based rather than passing strings around -- both for speed and for memory hits. (Try to serve a 25M QuickTime Star Wars Trailer in the current PWS :-) This was one of the tests that Bolot did of the current system.
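For what it's worth, the doubled-' reading could at least be done a quote-sized chunk at a time (via upTo:) instead of character by character. A rough workspace sketch -- not the actual SwikiPage code, and it assumes the stream is already positioned just past the opening ' of the stored literal:

```smalltalk
"Read a Squeak-style quoted string: '' in the file means a literal '.
 Each upTo: copies a whole run of characters at once."
| in out done |
in := ReadStream on: 'can''''t stop'' rest of file'.
out := WriteStream on: (String new: 32).
done := false.
[done] whileFalse: [
    out nextPutAll: (in upTo: $').          "copy up to the next quote"
    (in atEnd not and: [in peek = $'])
        ifTrue: [out nextPut: $'. in next]  "doubled quote -> one literal '"
        ifFalse: [done := true]].           "lone quote (or end) closes the literal"
out contents
```

On that input the loop yields 'can''t stop' and leaves the stream positioned at the rest of the file, with only as many iterations as there are quotes.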
- The <> check currently works by doing an initial scan to find
all ranges of <> pairs, and then checks at each line end whether the current text position falls within one of those ranges. If there are 20 HTML tags on this page, then this means going through 20 calls to between:and: and 20 block invocations AT EACH LINE END. A better way is simply to keep a flag which reflects whether the current position is within a <> pair or not; seeing a < turns it on, and seeing a > turns it off; the check at each end of line then becomes extremely cheap.
Hmm, I just wrote a tiny-and-still-incomplete HTML tag scanner for my class as a demonstration (http://www.cc.gatech.edu/classes/cs2390_99_spring/slides/parse/outline.html). Maybe I can modify that for this purpose. A hand-built scanner will probably be faster than a regular expression system.
Mark
-------------------------- Mark Guzdial : Georgia Tech : College of Computing : Atlanta, GA 30332-0280 (404) 894-5618 : Fax (404) 894-0673 : guzdial@cc.gatech.edu http://www.cc.gatech.edu/gvu/people/Faculty/Mark.Guzdial.html
On Tue, 4 May 1999, Mark Guzdial wrote:
[snip]
I think that Bolot's new PWS implementation is all stream-based rather than passing strings around -- both for speed and for memory hits. (Try to serve a 25M QuickTime Star Wars Trailer in the current PWS :-) This was one of the tests that Bolot did of the current system.
This won't help with reading from disk, will it? Just curious. Also a reminder (provoked by a recent discussion on comp.lang.smalltalk) that big savings occur if you presize the stream.
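For the archives, the presizing trick is just giving the WriteStream a collection big enough for the expected result (the 4000 here is an arbitrary estimate):

```smalltalk
"A WriteStream on an empty String repeatedly grows and copies its
 underlying collection as it fills; starting from an estimate of the
 final size avoids nearly all of that."
| out |
out := WriteStream on: (String new: 4000).
1 to: 100 do: [:i |
    out nextPutAll: i printString; nextPut: $ ].
out contents
```

The same applies to ReadWriteStream; the cost of guessing too big is just some slack space, while guessing with (String new) costs a copy at every doubling.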
[snip]
Hmm, I just wrote a tiny-and-still-incomplete HTML tag scanner for my class as a demonstration (http://www.cc.gatech.edu/classes/cs2390_99_spring/slides/parse/outline.html). Maybe I can modify that for this purpose. A hand-built scanner will probably be faster than a regular expression system.
Ah, this reminds me of something I was playing with a ways back. I wanted my SwikiParser to produce (as its parse tree) Scamper HTML objects, then have that write out to HTML or just display (if this was in Squeak). So SwikiParsers would be subclasses of HtmlParser. One advantage is that you could easily format your HTML nicely and, indeed, enforce some standards (e.g., close tags).
Indeed, I'd like to have the templates as an HTMLish tree. It would make them much easier to manipulate programmatically. (I suspect that debugging them would be easier too ;))
Alas, I think it's going to take some serious revamping of the Html classes.
Still, something to think about...
Cheers, Bijan.