Documentation, more, more

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Sep 11 00:21:17 UTC 2003


Daniel Vainsencher <danielv at netvision.net.il> wrote:
	Have you looked at 
	http://minnow.cc.gatech.edu/squeak/67? 

I have.  Large chunks of it seem not to be there, like the Stream exercise.

	Yes, I've now done a quick review of [R]. I agree that they have
	a very rich, probably effective approach to documentation.

There are a number of important points.
(0) When you start R, it reminds you about the on-line help:
    f% R

    R : Copyright 2003, The R Development Core Team
    Version 1.7.1  (2003-06-16)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type `license()' or `licence()' for distribution details.

    R is a collaborative project with many contributors.
    Type `contributors()' for more information.

    Type `demo()' for some demos, `help()' for on-line help, or
    `help.start()' for a HTML browser interface to help.
    Type `q()' to quit R.

(1) At interactive level, if you want to be reminded about a function,
    ?cos
    for example will give you the "manual page" for the trig functions.
    Quite a lot of these "manual pages" have examples,
    ?plot
    does, for one.  So you can do
    example(plot)
    and the examples will run.  This is like having example methods in
    a Squeak class, but in practice there are a lot more of them.

(2) You can ask for help about a package:
    help(package=modreg)
    asks about the "modern regression" package.  This is like having
    class comments in a Squeak class, except that *every* R package
    does actually *have* such documentation.  This help is available
    whether or not a package is loaded.

(3) You can ask for help about a function within a package,
    help(ppr, package=modreg)
    asks about projection pursuit regression, for example.
    You can do this even when the package is not loaded.

(4) You can search for a topic, e.g.,
    help.search("ridge regression")
    through all available packages, whether or not they are loaded.

(5) Loading and installing a package over the internet is simple,
    as is automatically updating one.  This also brings the documentation
    across; there are never _two_ downloads.

(6) The process of building a package for distribution to others is
    semi-automated, and there is a checking process which, amongst other
    things, checks that the documentation is there and the examples run.
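
To make point (5) concrete, here is a minimal sketch of what it might look
like at the prompt in a current R.  The helper names install_and_read and
refresh_all are made up for illustration; install.packages, update.packages,
library and help are the standard functions.  It assumes network access and
a configured CRAN mirror.

```r
# Hypothetical helper: install a package from CRAN, attach it, and show
# its documentation index.  Needs network access and a CRAN mirror.
install_and_read <- function(pkg) {
  install.packages(pkg)                # code and help pages arrive together
  library(pkg, character.only = TRUE)  # attach the freshly installed package
  help(package = (pkg))                # parentheses make help() evaluate pkg
}

# Hypothetical helper: bring every installed package, documentation
# included, up to date without prompting for each one.
refresh_all <- function() update.packages(ask = FALSE)
```

Calling install_and_read with the name of any CRAN package as a string
should fetch it and then list its help topics, with no second download
for the documentation.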

The manual pages for functions come to more than 2200 printed pages,
and that's not counting all the packages available from CRAN.  The
Smalltalk/Squeak source management stuff should make it practical to
have this quantity of internal documentation for Squeak.

There's nothing here that Squeak *couldn't* do if there was the
will to do it.

	The newsletter element, for example, is completely missing from
	the Squeak community,

Newsletters are traditional in the statistics community; I used to get
the GLIM one at one time.  The newsletter is good for telling you what's
happening in new releases, but I had used R for about a year before I
bothered to read any issue of the newsletter.
	
The R mailing list is astonishingly like the Squeak mailing list in
some ways.  Both are extremely helpful.  I don't know what the current
status of the Squeak foundation is; there is a comparable R foundation.
R is even an object-oriented language.  (It actually has *two* sets of
object-orientation machinery, "S3 classes" and "S4 classes".)
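
For anyone who hasn't seen them, here is a small sketch of the two styles
side by side.  The names describe, circle, Circle and area are invented for
the illustration; UseMethod, structure, setClass, setGeneric and setMethod
are the standard machinery.

```r
# S3: a class is just an attribute on an ordinary object, and a generic
# function dispatches on it by method name (generic.class).
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something"
describe.circle <- function(x, ...) paste("circle of radius", x$r)

c1 <- structure(list(r = 2), class = "circle")
describe(c1)   # dispatches to describe.circle
describe(42)   # no matching class, so falls back to describe.default

# S4: classes are declared formally with typed slots, and methods are
# registered against a generic (from the "methods" package).
library(methods)
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Circle", function(shape) pi * shape@r^2)

c2 <- new("Circle", r = 2)
area(c2)       # dispatches on the formal class of c2
```

The S3 version is informal and quick to write; the S4 version checks slot
types when the object is created.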

Perhaps the major difference between R and Squeak is that
R is an attempt to build something that people can routinely and
reliably use to do serious non-programming work, which is reasonably
compatible with a commercial product (S-Plus).  People developing for
R quite often try to ensure that their code works with S-Plus, and
conversely.  This means that the amount of language innovation that
can be done in R is fairly limited.  Some of the strangest, nay,
maddest, features of S are still there in R.  It also means that many
of the people who need information about R would find "you have the
source, read it" incredibly unhelpful, because they are statisticians
or economists or immunologists or whatever, not programmers.

One thing I find with R is that I *can't* do anything nontrivial in R
without the on-line documentation.  Most of the functions I am interested
in calling do fairly "big" things (model fitting, various kinds of plots)
and have lots of options, and return objects with lots of slots, and I
just _can't_ keep that level of detail in my head.


