[ANN] Chuck type inferencer

Lex Spoon lex at cc.gatech.edu
Wed May 26 17:55:58 UTC 2004

The Chuck type inferencer is now available on SqueakMap.  This is a tool
I've been working on for a few years that can answer questions about the
types of variables and expressions by searching through your program
code.  For example, you can ask for the type of a variable x, and it
will go find every expression that writes into x (e.g., x := y), and
find the types of the right-hand-sides, and so on recursively.  It's
similar to what people do by hand anyway to find types, but Chuck
automates it.

Additionally, this kind of tool should be useful as a building block for
other tools.  First, Chuck should be a useful as part of the backend for
for programming tools such as compilers and dead code removers.  
Second, the "AutoChuck" part of it maintains a handy model of all the
system's source code along with various lookup tables like "who writes
into this variable"; this part alone should be useful to people who want
to do any sort of investigations about the source code of the system.

If you want to play with the inferencer, you need to grab Squeak 3.7 (I
am at update 5707), install the Refactoring Browser, and then install
the Chuck package from SqueakMap.  From there, you can run and read some
test cases.  For serious day to day use, you then need to turn on
AutoChuck, which is a preference in the "browsing" area of the
preferences browser.  This will spend 10-60 minutes filling up your
image with 75 megabytes or so of lookup tables; after that, it should be
invisible to your normal activities, and the refactoring browser will
have two new menu items available:

	1. "infer type", in code panes.  Select any expression and you can run
this, and it will dig up a type for you.
	2. "specific senders of", in message-list panes.  It acts like
senders-of but uses type inference on the message-senders in order to
reject some of the message senders as impossible.

More serious users may want to tweak the "preferences" class methods of
ChuckInferencer; you can tweak how many goals the inferencer will use
and how much wall-clock time it will spend before bailing out.  (I
intend to extend the ChuckConfigurator dialog to let users set these
things in a GUI, but that hasn't happened yet.)

The project home page is:

This page includes a good description of the algorithm, although more
description -- about 100-150 pages more -- is on the way by next May.

A quick word on what this type inferencer is giving you.  The types it
infers are guaranteed bounds on what the variables or expressions will
evaluate to at runtime.  They do not represent what the programmer
intended to be possible for the variable, and they are not necessarily a
good summary of what the variable is.

An example should help.  If you infer a type of PlayingCardDeck's
stackingPolicy variable, Chuck will tell you that it will hold one of
these things:  nil, #stagger, #straight, #altStraight, or #single.  The
comment of the variable tells a slightly different story!  First, the
comment gives #none as an option while the type inference does not; the
reason is that nothing in the image uses #none, even though the code can
handle it.  Second, the comment does *not* mention #stagger as an
option.  #stagger does indeed get assigned to the variable, so the code
and the comment are out of sync here.  The first item is more
interesting I think: for this tool to be effective, you want to have
uses of the code loaded in addition to the code you are studying.

Finally, some people may be curious about the general approach of this
thing.  There is some documentation on the web site, and some more
serious documentation being developed, but let me give a quick summary
because Chuck is a little unusual.  Essentially, Chuck posts goals to
itself and then tries to solve them.  In the course of solving a goal,
it is allowed to post more goals.  Each goal is initially given a very
optimistic tentative solution (e.g., this variable is never assiged any
value), and then the algorithm repeatedly updates the tentative
solutions, making them less precise, until it has found an entire set of
solutions that are consistent with each other.    At that point the
tentative solutions can be treated as full solutions.

This is different from other algorithms I know of, which do one of two
things: they simulate the program in the abstract, using types instead
of values, or, they generate a huge pile of constraints ("type of x is
greater than type of y"...), and solve them all.  What Chuck's approach
allows it to do, however, is to *prune* its subproblems.  That is, Chuck
can at any time answer a goal with a trivial but poor answer; the
subgoals of such a goal then become unnecessary and can be abandoned. 
If Chuck is trying to find the type of "node isNil", and it is unable to
infer the type of "node", it has the option of substituting "Anything"
for the type of node and then proceding onwards (in this case, still
finding the optimal solution of {True False}).  In short, Chuck's
architecture lets its performance degrade gracefully as programs get
large.  The existing algorithms seem to get into trouble with around
30-50 kloc programs, whereas Squeak has about 300 klocs.

All right, have at it.  Comments welcome!

More information about the Squeak-dev mailing list