Making Squeak Productive

Benjamin Pollack benjamin.pollack at gmail.com
Sun Apr 10 19:40:39 UTC 2005


On Apr 10, 2005 2:20 PM, Tim Rowledge <tim at sumeru.stanford.edu> wrote:
> Daniel Salama <dsalama at user.net> wrote:
> 
> > Just came across this and thought it was a very nice page for newbies.
> > Wanted to share with you in case others find it useful.
> >
> > http://www.duke.edu/~bmp5/squeak/usable.html
> Interesting and certainly quite useful.
> 
> I'm puzzled by the assertions about the regular expression stuff though. The
> RePlugin is still in the VMMaker world and it uses no platform specific code so
> it certainly ought to work on Macs. It could only be something in the makefile
> (equivalent xCode hoohah) stopping it I think.

I'm the one who wrote that page, so I'm going to both explain why
that's there and take the flak for it. :)

I haven't really updated that site (except the blog) since last
semester; at that time, I was doing a reasonably large project in
Squeak and so could count keeping those pages maintained as part of my
project time. This semester, I'm holding down two jobs plus doing a
full college class load, which has unfortunately meant that that page
is going out-of-date. I'll definitely have time to clean it up the
week of May 8th, but not before then.

RePlugin is not in all the VMs. In particular, it's not in the default
Carbon VMs on John McIntosh's site. Back in August, I think, someone
pointed out that they lacked the regex plugin, and John said he'd add
it in 3.8.0. As of today, what appears to be his most recent VM
(3.8.6b6) still does not have RePlugin. That doesn't really bother me,
since I do batch string processing in shell scripts, but it did mean
that I did not feel I could recommend the Regular Expression Plugin
when a good chunk of my target audience couldn't use it without
considerable effort.

Then in December 2004, there was a thread entitled "Keeping RePlugin
around" (spawned off the rant thread) that discussed whether RePlugin
was necessary anymore. You indicated, Tim, that you'd been planning to
discontinue RePlugin. Two other people posted that they "still" had
code using RePlugin, and that was the end of the thread, so the future
of the regex plugin seemed dubious when I made the last update to that
page. I probably should have posted to the list and asked explicitly
what the status was, and I apologize for that, but I
didn't pull that totally out of thin air, either. You've obviously
changed your mind, but I didn't know that at the time.

> 
> I don't know enough about the theory of regular expression stuff to do more
> than ask if it is actually the case that PCRE is 'faster, more flexible and
> more powerful'. 

Part of the project I had to do involved taking about 300 MB of
haphazardly organized flat text files and generating a series of SQL
statements to import them into the database. Squeak with Vassili's
regex took over nine hours the only time I tried it (I can't give the
exact time; I started it at 8 AM, went home for the day at 5 with
it still going, and it finished sometime before the next day). I
rewrote the code to use RePlugin, and found it could execute in about
three hours. Ruby, which I would hardly hold up as an example of a
fast interpreter, could do it in about 80 minutes, at which point the
limiting factor became my hard disk speed, not the CPU. I'd therefore
definitely say that the plugin runs rings around the pure Smalltalk
regex.

As for power, I honestly don't remember exactly what regex
functionality the pure Smalltalk regex doesn't support, but I remember
that it was enough that I realized I'd have had to go to RePlugin
anyway even if speed
hadn't been an issue. I know it did not have back-referencing or
optional capture groups, and I remember there being some issues with
capture groups in general that may have had to do with them being
always either lazy or greedy. Unfortunately I don't have any of the
Smalltalk-based import code anymore and I haven't used Squeak in
several months, so I'm kind of guessing.
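
To make the features named above concrete, here is a minimal sketch of
back-references, optional capture groups, and greedy versus lazy
matching. It uses Python's re module purely for illustration; it is not
the original import code (which no longer exists), it says nothing
about the exact syntax RePlugin or the Smalltalk regex accept, and the
patterns, sample strings, and the demo function are all made up.

```python
import re

# Back-reference: \1 must match the same text the first group captured.
doubled = re.compile(r"\b(\w+) \1\b")

# Optional capture group: the area code may be absent, in which case
# that group is reported as None.
phone = re.compile(r"(?:\((\d{3})\) )?(\d{3}-\d{4})")

# Greedy vs. lazy quantifiers applied to the same input.
greedy = re.compile(r"<(.+)>")
lazy = re.compile(r"<(.+?)>")


def demo():
    """Run each hypothetical pattern on a made-up sample string."""
    return {
        "doubled": doubled.search("import the the file").group(1),
        "with_area": phone.fullmatch("(919) 555-0100").groups(),
        "without_area": phone.fullmatch("555-0100").groups(),
        "greedy": greedy.search("<a><b>").group(1),
        "lazy": lazy.search("<a><b>").group(1),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

An engine without back-references cannot express the first pattern at
all, and one that forces every quantifier to be greedy (or every one to
be lazy) cannot produce both of the last two results.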

Let me know whether you have any questions/comments about the rest of
the page. I'd be happy to take any suggestions any of you have.

--Benjamin

> Is there some particular disadvantage to the plugin that I
> haven't heard about? It's only ~30kb or so of compiled code so it can't really
> be a space issue of any magnitude. Is the image code particularly large or bad?
> 
> tim
> --
> Tim Rowledge, tim at sumeru.stanford.edu, http://sumeru.stanford.edu/tim
> Useful random insult:- Useful as a football bat.
> 
>


