A WordNet Browser in Squeak (Mach I alpha)
Andrew C. Greenberg
werdna at gate.net
Tue Aug 17 13:14:48 UTC 1999
Attached is a rough, first cut changeset for primitives to read
Princeton's WordNet files, together with higher level classes and an
explorer-style browser for manipulating the same. The present
version works only with Macintosh (and probably Unix) versions of the
Princeton databases. Mach II will handle Wintel as well. To install:
(0) Install Squeak Version 2.5 (or earlier versions with
Bob Arning's hierarchical browser code)
(1) download the WordNet database. A copy can be obtained from:
(2) drag the folder "Database" into the directory with your
(3) load the changeset. If you have a problem loading WNet initially,
be sure to execute:
WNet release. WNet initialize
before proceeding. To startUp the WordNet browser, execute:
From inside the browser, you can drill through WordNet pointers;
selecting a list item displays a report of the object in the lower
pane. You can spawn new explorers or generate workspace reports from
a menu item inside the browser. The browser works best in Morphic
(very flaky in MVC, but hey, it's just a first cut).
MANY KUDOS to Bob Arning for his outstanding hierarchical browser
framework, which made it possible for me to build in no time and
with some ease a browser when I had ABSOLUTELY NO IDEA what I was
doing! An image of the browser follows:
As noted, the present cut is pretty rough and poorly documented, but
I thought it was more important to get it "out there" than to hang
onto the code at this time. Getting ready for trial in a pending
matter, I probably won't have time to finish that polishing for at
least a month, and I thought it might be of interest to some
squeakers. Mach I is somewhat funky and inefficient (I'm going to
build in some database caches for the next version). Very rough
To get initial handles on objects, you can use the WNet class side message
WNet reportAt: 'dog'
which will generate a workspace reporting on 'dog' for all database
parts of speech (this will be a list of WNWord instances), or you can
query individual parts of speech with
WNet N reportAt: 'dog'
which will generate a workspace reporting on noun senses for 'dog'
(this will be a WNWord). The first sense of the WNWord can be
drilled by executing the following doIt in the workspace generated
by the immediately preceding doIt:
this reportAt: 1
which will generate a workspace with the WNSense for the first sense
of 'dog' ('this' will be a WNSense). You can drill to get a list of
this allHypernyms reportWorkspace
which will generate a workspace with a WNList of Hypernyms. You get
the idea. Sketchy documentation (from the comment for WNet) follows:
Copy the WordNet "Database" folder into the default directory (no
proxies), fileIn the sources, and execute the following doIt:
If everything is in place, WNet will be ready to go. WNet
automatically reopens the files at startup. If something gets munged
or a new update is installed, execute the following doIts:
Searching and Drilling:
WNet at: 'dog'                      "WNList of WNWords for 'dog'"
WNet reportAt: 'dog'                "Workspace with list of all parts of speech for 'dog'"
WNet N at: 'dog'                    "Noun WNWord for 'dog'"
WNet N reportAt: 'dog'              "Workspace with list of all noun senses for 'dog'"
WNet N at: 'dog' senseAt: 1         "Noun Sense Number 1 for 'dog'"
WNet N reportAt: 'dog' senseAt: 1   "Workspace with Noun sense 1 for 'dog'"
anyWNObject reportWorkspace "Open a workspace describing
in which 'this' is a reference to anyWNObject"
senseOrSynset pointers "pointers from sense or synset"
pointer isHypernym "true iff pointer is a
senseOrSynset hypernyms             "WNList of immediate hypernyms of the object"
senseOrSynset allHypernyms          "WNList of closure on hypernyms"
senseOrSynset closureOn: aBlock     "WNList of closure on relation defined by boolean expression aBlock"
    e.g.: sense closureOn: [:each | each isHypernym]
    N.B.: Closure presently makes no effort to avoid recursions, so will not
    work with all pointer relationships.
senseOrSynset allHypernyms reportWorkspace   "Workspace with all Hypernyms of sense"
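The N.B. about recursion can be made concrete with a small workspace sketch. This is not the code in the changeset; it only illustrates one common way to compute a closure that terminates on cyclic relations, by remembering which nodes have already been seen. The graph below is a hypothetical stand-in for WordNet pointers, built from plain Symbols and a Dictionary so that it runs without WNet installed.

```smalltalk
"Sketch only: a closure that tolerates cycles.
 'graph' is a made-up stand-in for WordNet pointers; each key maps
 to the nodes it points at. Note the cycle a -> b -> c -> a."
| graph closure visited pending node |
graph := Dictionary new.
graph at: #a put: #(#b).
graph at: #b put: #(#c #d).
graph at: #c put: #(#a).
graph at: #d put: #().

closure := OrderedCollection new.   "result, in discovery order"
visited := Set new.                 "nodes already queued or reported"
pending := OrderedCollection new.   "work list of nodes still to expand"
visited add: #a.
pending add: #a.
[pending isEmpty] whileFalse: [
    node := pending removeFirst.
    (graph at: node) do: [:each |
        (visited includes: each) ifFalse: [
            visited add: each.
            closure add: each.
            pending add: each]]].
closure   "holds #b, #c and #d, each exactly once"
```

Without the visited set, the pointer from c back to a would keep the loop running forever, which is presumably the failure the N.B. warns about for pointer relationships that are not strictly hierarchical.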
"Raw" access to database:
WNIndexStream is a StandardFileStream that views a WordNet index file
as a stream of WNIndexStreamRecord objects. Likewise with
WNDataStream, but for data files with WNDataStreamRecord objects.
WNIndexStream can be (why?) queried sequentially, using
[s atEnd] whileFalse: [ . . . s next . . .].
or more usefully, queried by binary searching for a string, using
s positionForWord: aString.
record _ s next
or more directly:
record _ s wordAt: aString
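For readers curious about what a lookup by binary search involves, here is a minimal sketch. It assumes, as in the Princeton index files, that each record starts with the word followed by a space and that records are sorted by that word. It searches an in-memory Array of hypothetical lines rather than a file, and is not the implementation of positionForWord: or wordAt:.

```smalltalk
"Sketch only: binary search over sorted index-style lines.
 The lines are invented; only the leading word matters here."
| lines target low high mid word found |
lines := #('cat (rest of record)' 'dog (rest of record)'
           'dogma (rest of record)' 'eel (rest of record)'
           'fox (rest of record)').
target := 'dog'.
low := 1.
high := lines size.
found := nil.
[found isNil and: [low <= high]] whileTrue: [
    mid := (low + high) // 2.
    "Compare only the word at the start of the line."
    word := (lines at: mid) copyUpTo: Character space.
    word = target
        ifTrue: [found := lines at: mid]
        ifFalse: [
            word < target
                ifTrue: [low := mid + 1]
                ifFalse: [high := mid - 1]]].
found   "the line beginning with 'dog', or nil if no line matches"
```

A file-based version would halve a range of byte positions instead of array indices and skip forward to the next line start before comparing, but the comparison logic is the same.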
WNDataStream can be queried sequentially as above, but can also be queried:
s position: aSynsetIndex.
record _ s next
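The position-then-read idiom can be tried on an ordinary ReadStream. The sketch below assumes that a synset index is the offset of a record from the start of the data file, which is how the Princeton data files are addressed; the three records are invented, and WNDataStream itself is not used.

```smalltalk
"Sketch only: jump to a known offset and read one record.
 'data' is a made-up stand-in for a data file with one record per line."
| lf data s record |
lf := String with: Character lf.
data := 'first record', lf, 'second record', lf, 'third record', lf.
s := ReadStream on: data.
"'first record' plus its line feed occupies 13 characters,
 so the second record starts at offset 13."
s position: 13.
record := s upTo: Character lf.
record   "'second record'"
```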
WNPOSDictionary provides a higher level access using the WN object
types, representing a part of speech (comprising the index and data
files) and can be queried using
pos wordAt: aString "(or just use at:)"
pos synsetAt: anInteger
pos glossAt: anIndex
WNet provides access to all WNPOSDictionaries for the WordNet
database, and a WNList of query results for all parts of speech can
be obtained by
WNet wordAt: aString
or searches for individual parts of speech can be made
WNet N wordAt: aString
WNet nounAt: aString
Attachment converted: Anon:WordNetDemo.17Aug850am.cs (TEXT/R*ch) (0001238E)
Attachment converted: Anon:test5.gif (GIFf/ogle) (0001238F)