Simple Parser for Natural Language?

Fri Jul 16 17:49:54 UTC 1999

Folks -

Ted Kaehler and I want to write a Squeak program capable of superficially understanding natural language.  (Of course you are all invited to play, too ;-).

By superficial understanding I mean that it could successfully parse most sentences and build up a body of valid knowledge structures based on their content.  Of course this does not constitute real understanding, since the relationships may be ambiguous, conflicting, or lacking necessary context or meta-information.

However, even at this superficial stage, it could be very useful and probably a lot of fun.  With backpointers to its source material, it could certainly facilitate inquiries about the content.  And with a bit more work, we might actually learn a thing or two about real understanding.

So, here's the question:  Do any of you know of any simple parsers in Smalltalk (or even other languages) that are capable of parsing most English sentences correctly?  Presumably this also requires a lexicon, so it is important that the associated lexicon be in the public domain as well.

Obviously, the next topic of interest is meta-information in the lexicon (like the relationship between infinitesimal, tiny, small, little, average, big, large, enormous, colossal), so if you have any leads on (again, simple) work along these lines, that is also of interest to us.
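
To make that kind of meta-information concrete, here is a minimal sketch, in Python rather than Squeak and purely illustrative.  It assumes the size words can be treated as one ordered scale, in the order listed above; the names SIZE_SCALE, rank, compare and stronger_than are made up for this example and do not come from any existing lexicon.

```python
# Hypothetical sketch: an ordered scale of size adjectives, stored as
# lexicon meta-information. The ordering itself is an assumption taken
# from the order in which the words are listed in the message.
SIZE_SCALE = [
    "infinitesimal", "tiny", "small", "little", "average",
    "big", "large", "enormous", "colossal",
]


def rank(word):
    """Return the position of a size word on the scale (0 = smallest)."""
    return SIZE_SCALE.index(word.lower())


def compare(a, b):
    """Return -1, 0 or 1 as word a is weaker than, equal to, or stronger than b."""
    ra, rb = rank(a), rank(b)
    return (ra > rb) - (ra < rb)


def stronger_than(word):
    """List every word on the scale that is stronger than the given one."""
    return SIZE_SCALE[rank(word) + 1:]
```

With something like this, a question such as "is tiny smaller than large?" becomes compare("tiny", "large"), and stronger_than("large") lists the words further up the scale.  A real lexicon would presumably need many such scales, and words that sit on more than one.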

The idea is to then point it at a newspaper, the web, or the Squeak archives, and see if we can get it to make any interesting statements, even if they are wrong, and especially if they are funny.  Please don't mock us for simple thoughts about complicated topics.  After all, that's how we got Squeak.

Thanks
	-Dan