A WordNet Browser in Squeak (Mach I alpha)
Andrew C. Greenberg
werdna at gate.net
Tue Aug 17 13:14:48 UTC 1999
Attached is a rough, first-cut changeset for primitives to read
Princeton's WordNet files, together with higher-level classes and an
explorer-style browser for manipulating them. The present
version works only with the Macintosh (and probably Unix) versions of the
Princeton databases. Mach II will handle Wintel as well. To install:
(0) Install Squeak Version 2.5 (or earlier versions with
Bob Arning's hierarchical browser code)
(1) download the WordNet database. A copy can be obtained from:
http://www.cogsci.princeton.edu/~wn/obtain/
(2) drag the folder "Database" into the directory with your
Squeak application
(3) load the changeset. If you have a problem loading WNet initially,
be sure to execute:
WNet release. WNet initialize
before proceeding. To start up the WordNet browser, execute:
aSearchString exploreInWnet
From inside the browser, you can drill through WordNet pointers;
selecting a list item displays a report of the object in the lower
pane. You can spawn new explorers or generate workspace reports from
a menu item inside the browser. The browser works best in Morphic
(very flaky in MVC, but hey, it's just a first cut).
MANY KUDOS to Bob Arning for his outstanding hierarchical browser
framework, which made it possible for me to build in no time and
with some ease a browser when I had ABSOLUTELY NO IDEA what I was
doing! An image of the browser follows:
As noted, the present cut is pretty rough and poorly documented, but
I thought it was more important to get it "out there" than to hang
onto the code at this time. Since I am getting ready for trial in a
pending matter, I probably won't have time to finish that polishing for
at least a month, and I thought it might be of interest to some
squeakers. Mach I is somewhat funky and inefficient (I'm going to
build in some database caches for the next version). Very rough
documentation follows:
To get initial handles on objects, you can use the WNet class side message
WNet reportAt: 'dog'
which will generate a workspace reporting on 'dog' for all database
parts of speech (this will be a list of WNWord instances), or you can
query individual parts of speech with
WNet N reportAt: 'dog'
which will generate a workspace reporting on noun senses for 'dog'
('this' will be a WNWord). The first sense of the WNWord can be
drilled by executing the following doIt in the workspace generated by
the immediately preceding doIt:
this reportAt: 1
which will generate a workspace with the WNSense for the first sense
of 'dog' ('this' will be a WNSense). You can drill to get a list of
hypernyms with:
this allHypernyms reportWorkspace
which will generate a workspace with a WNList of Hypernyms. You get
the idea. Sketchy documentation (from the comment for WNet) follows:
WordNet BASICS:
Installation
-------------
Copy the WordNet "Database" folder into the default directory (no
proxies), fileIn the sources, and execute the following doIt:
WNet initialize
If everything is in place, WNet will be ready to go. WNet
automatically reopens the files at startup. If something gets munged
or a new update is installed, execute the following doIts:
WNet close.
WNet initialize
Searching and Drilling:
----------------------------
WNet at: 'dog' "WNList of WNWords for 'dog'"
WNet reportAt: 'dog' "Workspace with list of all parts of speech for 'dog'"
WNet N at: 'dog' "Noun WNWord for 'dog'"
WNet N reportAt: 'dog' "Workspace with list of all noun senses for 'dog'"
WNet N at: 'dog' senseAt: 1 "Noun sense number 1 for 'dog'"
WNet N reportAt: 'dog' senseAt: 1 "Workspace with noun sense 1 for 'dog'"
Reporting:
------------
anyWNObject reportWorkspace "Open a workspace describing this object, in which 'this' is a reference to anyWNObject"
Analyzing:
-------------
senseOrSynset pointers "pointers from sense or synset"
pointer isHypernym "true iff pointer is a hypernym pointer"
senseOrSynset hypernyms "WNList of immediate hypernyms of the object"
senseOrSynset allHypernyms "WNList of closure on hypernyms"
senseOrSynset closureOn: aBlock "WNList of closure on relation defined by boolean expression aBlock"
    e.g.: sense closureOn: [:each | each isHypernym]
    N.B.: Closure presently makes no effort to avoid recursions, so it
    will not work with all pointer relationships.
senseOrSynset allHypernyms reportWorkspace "Workspace with all hypernyms of sense"
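The N.B. on closureOn: says the current closure does not guard against recursion. For illustration only, here is a minimal workspace sketch of a closure that does terminate on cyclic relations, by remembering which nodes it has already seen. It is not WNet code: the toy graph in a Dictionary, the symbols, and all variable names are hypothetical stand-ins for senses and their pointers.

```smalltalk
"Illustrative sketch, not part of the changeset: a cycle-safe closure
 over a toy relation. The graph and all names here are hypothetical."
| graph visited pending result node |
graph := Dictionary new.
graph at: #dog put: #(#canine #pet).
graph at: #canine put: #(#carnivore).
graph at: #carnivore put: #(#animal).
graph at: #pet put: #(#animal).
graph at: #animal put: #(#dog).        "deliberate cycle back to the start"

visited := Set new.                     "everything reached so far"
result := OrderedCollection new.        "closure, in breadth-first order"
pending := OrderedCollection with: #dog.
visited add: #dog.

[pending isEmpty] whileFalse: [
    node := pending removeFirst.
    (graph at: node ifAbsent: [#()]) do: [:next |
        (visited includes: next) ifFalse: [
            visited add: next.
            result add: next.
            pending addLast: next]]].

result    "canine, pet, carnivore, animal; #dog is not revisited"
```

The Set is what stops the loop: a node is queued only the first time it is reached, so a relation that points back to an earlier node cannot recurse forever.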
"Raw" access to database:
-----------------------------
WNIndexStream is a StandardFileStream that views a WordNet index file
as a stream of WNIndexStreamRecord objects. Likewise with
WNDataStream, but for data files with WNDataStreamRecord objects.
WNIndexStream can be (why?) queried sequentially, using
s positionAtFirstRecord.
[s atEnd] whileFalse: [ . . . s next . . .].
or more usefully, queried by binary searching for a string, using
s positionForWord: aString.
record _ s next
or more directly:
record _ s wordAt: aString
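As a rough illustration of what "binary searching for a string" means here, the following workspace sketch runs a binary search over a small sorted Array of lowercase words. It is a sketch under stated assumptions, not the WNIndexStream implementation: the real stream searches records in a file, whereas this uses an in-memory Array, and the word list and variable names are made up.

```smalltalk
"Illustrative sketch only: binary search over a sorted in-memory Array.
 The word list and all names are hypothetical; keys are assumed to be
 lowercase and already sorted."
| words target low high mid probe found |
words := #('cat' 'dog' 'emu' 'fox' 'owl').
target := 'fox'.
low := 1.
high := words size.
found := 0.                             "0 means not found"

[found = 0 and: [low <= high]] whileTrue: [
    mid := (low + high) // 2.
    probe := words at: mid.
    probe = target
        ifTrue: [found := mid]
        ifFalse: [
            probe < target
                ifTrue: [low := mid + 1]     "target lies in the upper half"
                ifFalse: [high := mid - 1]]]. "target lies in the lower half"

found    "4, the index of 'fox'"
```

Each probe halves the remaining range, which is why the search depends on the index being kept in sorted order.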
WNDataStream can be queried sequentially as above, but can also be queried:
s position: aSynsetIndex.
record _ s next
WNPOSDictionary provides higher-level access using the WN object
types. It represents a part of speech (comprising the index and data
files) and can be queried using
pos wordAt: aString "(or just use at:)"
pos synsetAt: anInteger
pos glossAt: anIndex
WNet provides access to all WNPOSDictionaries for the WordNet
database, and a WNList of query results for all parts of speech can
be obtained by
WNet wordAt: aString
or searches for individual parts of speech can be made with
WNet N wordAt: aString
WNet nounAt: aString
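Putting the pieces together, a doIt along the following lines should walk from a word to the hypernyms of its first noun sense. Every selector is quoted from the documentation above; the temporary variable names are illustrative, and the assumption that allHypernyms and the closureOn: block give the same list is mine, not something stated in the class comment.

```smalltalk
"Sketch only: chains selectors quoted in this message.
 Temporary names are illustrative."
| sense viaShortcut viaBlock |
sense := WNet N at: 'dog' senseAt: 1.
viaShortcut := sense allHypernyms.
viaBlock := sense closureOn: [:each | each isHypernym].
viaShortcut reportWorkspace.
viaBlock reportWorkspace.
```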
Attachment converted: Anon:WordNetDemo.17Aug850am.cs (TEXT/R*ch) (0001238E)
Attachment converted: Anon:test5.gif (GIFf/ogle) (0001238F)