Scripting languages and IDEs (was: If python goes EToys...)

Thu Aug 24 07:09:00 UTC 2006

Well, I got a lot of flak for my "it's not worth it" response to  
Markus' original post. Very well, I'll make another attempt to  
explain why I think this idea is a lot harder than it sounds. First,  
let me define some terms:

IDE - This is a program that allows one to view and manipulate  
another program in terms of its semantic elements, such as classes  
and methods, rather than in terms of the sequence of characters that  
will be fed to a parser. IDEs might happen to display text, but they  
also provide tools like class browsers, refactoring and other  
transformations, auto-completion of identifiers, etc., things that  
require a higher level model of the program than text. Examples  
include various Smalltalk implementations, Eclipse, Visual Studio, IDEA.

Scripting language - a programming language and execution model where  
the program is stored as text until it is executed. Immediately prior  
to execution, the runtime environment is created, the program's  
source code is parsed and executed, and then the runtime environment  
is destroyed. This is an important point - the state of the runtime  
environment is not preserved when execution terminates, and one  
invocation of a program cannot influence future invocations.

Now, one might quibble over my definition of "scripting language."  
Fine, I agree that it's not a good general definition of the term as  
it's used every day. But it's an important feature of languages like  
Ruby, Python, Perl, JavaScript, and PHP, and one that makes IDEs for those  
languages particularly hard to write.

Damien Pollet brought up the key issue in designing a Smalltalk-based  
scripting language - should the syntax be declarative or imperative?  
(Yeah, that again.)

Imperative syntax gives us a lot of flexibility and power in the  
language. A lot of the current fascination with Ruby stems from Java  
programmers discovering what can be done with imperative class  
definitions. The Ruby pickaxe book explains this well:

	In languages such as C++ and Java, class definitions are processed
	at compile time: the compiler loads up symbol tables, works out how
	much storage to allocate, constructs dispatch tables, and does all
	those other obscure things we'd rather not think too hard about.
	Ruby is different. In Ruby, class and module definitions are
	executable code.

Executable definitions are how metaprogramming is done in scripting  
languages. Ruby on Rails gets a lot of mileage out of this,  
essentially by adding class-side methods that can be called from  
within these executable class definitions to generate a lot of boring  
support code. In Java, we can't modify class definitions at runtime,  
and that's why Java folks use so much XML configuration.

Python does this too - http://docs.python.org/ref/class.html. Perl5  
is pretty weird, but Perl6 is slated to handle class definition this  
way as well. JavaScript doesn't have class definitions, but we can  
build up pseudoclasses by creating objects and assigning functions to  
their properties.
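
As a rough illustration of the Python case, here is a minimal sketch of a  
class body running as ordinary code. The names field, Account and  
definition_log are made up for this example, and the helper only loosely  
imitates the Rails-style class-side methods described above; it is not  
how Rails or any particular library does it.

```python
definition_log = []


def field(name, default):
    """Hypothetical class-side helper that generates a property."""
    storage = "_" + name

    def getter(self):
        # Fall back to the default until a value has been stored.
        return getattr(self, storage, default)

    def setter(self, value):
        setattr(self, storage, value)

    return property(getter, setter)


class Account:
    # Everything in this body is executed, top to bottom, at the moment
    # the class statement runs.
    definition_log.append("Account body is running")
    owner = field("owner", "nobody")
    balance = field("balance", 0)


account = Account()
account.balance = 25
```

The calls to field() happen while the class is being defined, so the  
accessor code is generated by running the definition rather than by  
writing it out by hand.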

When writing an executable class definition, we have the full power  
of the language available. You can create methods inside of  
conditionals to tailor the class to its environment. You can use  
eval() to create methods by manipulating strings. You can send messages  
to other parts of the system. You can do anything.
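
A small Python sketch of the first two points, under the assumption that  
a made-up Paths class wants a platform-specific method and a couple of  
unit converters. Python spells the string-evaluating call exec() for  
statements; the class and method names are hypothetical.

```python
import sys


class Paths:
    # A conditional inside the class body: which definition of
    # separator() the class gets depends on where this code runs.
    if sys.platform.startswith("win"):
        def separator(self):
            return "\\"
    else:
        def separator(self):
            return "/"

    # Methods built by manipulating strings: exec() runs each piece of
    # generated source in the class namespace that is being built.
    for _unit, _factor in (("kb", 1024), ("mb", 1024 ** 2)):
        exec(f"def to_{_unit}(self, n):\n    return n / {_factor}\n")
    del _unit, _factor  # keep the loop variables out of the class


paths = Paths()
kilobytes = paths.to_kb(2048)
```

Nothing in the text of this class says that a method called to_kb  
exists; it only comes into being when the body is executed.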

I'm making a big deal out of this, because I think it's a really,  
really important feature of modern scripting languages.

Declarative syntax, on the other hand, gives us a lot of flexibility  
and power in the tools. Java, C++ and C# have declarative class  
definitions. This means that IDEs can read in the source code, create  
a semantic model of it, manipulate that model in response to user  
commands, and write it back out as source code. The source code has a  
canonical representation as text, so the code that's produced is  
similar to the code that was read in, with the textual changes  
proportional to the semantic changes that were made in between.

This is really hard to do with scripting languages, because we can't  
create the semantic units of the program just by parsing the source  
code. You actually have to execute it to fully create the program's  
structure. This is problematic to an IDE for many reasons: the  
program might take a long time to run, it might have undesirable side  
effects (like deleting files), and in the end, there's no way to tell  
whether the program structure we end up with is dependent on the  
input to the program.
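
The gap between parsing and executing can be sketched in Python with the  
standard ast module. The script text and the helper names below are  
invented for the illustration. The first function plays the part of an  
IDE that only parses; the second actually runs the script, which is  
exactly the step an IDE would rather avoid.

```python
import ast

SOURCE = """
class Report:
    def title(self):
        return "report"

def _make_exporter(fmt):
    def export(self):
        return fmt + ":" + self.title()
    return export

for fmt in ("html", "pdf"):
    setattr(Report, "to_" + fmt, _make_exporter(fmt))
"""


def methods_seen_by_parser(source):
    # Collect only the methods written literally inside a class body.
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    found.add(item.name)
    return found


def methods_after_running(source):
    # Execute the script, then inspect the class object it produced.
    namespace = {"__name__": "scripted"}
    exec(source, namespace)
    cls = namespace["Report"]
    return {
        name
        for name, value in vars(cls).items()
        if callable(value) and not name.startswith("__")
    }


static_view = methods_seen_by_parser(SOURCE)
runtime_view = methods_after_running(SOURCE)
```

Here static_view contains only title, while runtime_view also contains  
to_html and to_pdf. A tool working from the parse tree alone never sees  
the two generated methods.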

Even if we did have a way to glean the program structure from a  
script, there would be no way to write it back out again as source  
code. All of the metaprogramming in the script would be undone,  
partially evaluated, as it were, and we'd be stuck with whatever  
structures were created on that particular invocation of the script.

So, it would appear that we can have either a powerful language, or  
powerful tools, but not both at the same time. And looking around,  
it's notable that there are no good IDEs for scripting languages, and  
none of the languages that have good IDEs lend themselves to  
metaprogramming.

There is, of course, one exception. Smalltalk.

With Smalltalk, we have the best of both worlds. A highly dynamic  
language where metaprogramming is incredibly easy, and at the same  
time, a very powerful IDE. We can do this because we sidestep the  
whole issue of declarative vs. imperative syntax by not having any  
syntax at all.

In Smalltalk, classes and methods are created by executing Smalltalk  
code, just like in scripting languages. That code creates objects  
which reflect the semantic elements of the program, just like in the  
IDEs for compiled languages. One might say that programs in compiled  
languages are primarily state, while programs in scripting languages  
are primarily behavior. Smalltalk programs are object-oriented; they  
have both state and behavior. The secret ingredient that makes this  
work is the image - Smalltalk programs don't have to be represented  
as text.

And that's why a Smalltalk-like scripting language wouldn't be  
worthwhile. It leaves out the very thing that makes Smalltalk work so  
well - the image. It would have to have syntax for creating classes -  
either imperatively or declaratively. We'd end up limiting either the  
language or the tools, or if we tried hard enough, both. There is  
certainly no shortage of languages that have tried to be "Smalltalk,  
but with source code in files."

I'd much rather see a Smalltalk that let me create small, headless  
images, tens or hundreds of kilobytes in size, with just the little  
bits of functionality I need for a particular task. If they had good  
libraries for file I/O, processing text on stdin/stdout and executing  
other commandline programs, they'd fill the "scripting language"  
niche very well. If they could be created and edited by a larger IDE  
image, they'd have the Smalltalk tools advantage as well.

I have high hopes for Spoon in this regard. Between shrinking, remote  
messaging and Flow, it's already got most of the ingredients. It just  
needs to be packaged with a stripped down VM, and integrated into the  
host operating system.

Colin