Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
I wrote a longer explanation so if you are interested, please go to:
https://github.com/yoshikiohshima/SqueakBootstrapper
and check it out.
Thank you!
-- Yoshiki
On 28 December 2010 11:19, Yoshiki Ohshima yoshiki@vpri.org wrote:
Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
I wrote a longer explanation so if you are interested, please go to:
https://github.com/yoshikiohshima/SqueakBootstrapper
and check it out.
Been there, done that :) Implementing an ST compiler in C is not very hard, since the syntax is extremely simple (compared to C). But the thing is that you now need to keep the image-side compiler and the C compiler in sync. And very soon you will figure out that, while it's extremely easy to extend the ST compiler in ST, it's very hard to do that in C (see things like Helvetia by Lukas Renggli, etc.).
What I like about an image-side compiler is that I can use the language's metaprogramming and late-binding capabilities for compiling the source. The C compiler has no such flexibility, and is good to have when you plan to freeze things for ages.
That's why I prefer to implement the bootstrap in Smalltalk. Because it really doesn't matter in which language you implement a bootstrap.
Sorry if my comment was discouraging. But I had to say it :)
Thank you!
-- Yoshiki
Hello,
On Tue, Dec 28, 2010 at 1:32 PM, Igor Stasenko siguctua@gmail.com wrote:
What i like about image-side compiler that i can use the metaprogramming and late-binding capabilities in language for compiling the source. While C compiler having no such flexibility, and good to have when you plan to freeze things for ages.
I think nobody here will argue about the image-side compiler and its capabilities, etc.
But Yoshiki is talking about the fact that you could "generate a growable *image file from all text files*, or make deep changes to the system without shooting yourself". And this is a really awesome and long-awaited step!
Best regards, Nikolay
On Tue, Dec 28, 2010 at 7:23 AM, Nikolay Suslov nsuslovi@gmail.com wrote:
Thinking, that nobody here will argue about image-side compiler and it's capabilities.. etc.
But, Yoshiki is talking about, that you could "generate a growable image file from all text files, or make deep changes to the system without shooting yourself". And this is really awesome and so long waited step!
Well, you do need a C compiler, presumably in the form of an executable binary. And if you're going to grant special dispensation to an "external tool" implemented in C, why not to an external tool implemented in Smalltalk?
That said, bravo, Yoshiki! This is a great project.
Colin
On Tue, Dec 28, 2010 at 3:35 PM, Colin Putney colin@wiresong.com wrote:
On Tue, Dec 28, 2010 at 7:23 AM, Nikolay Suslov nsuslovi@gmail.com wrote:
Thinking, that nobody here will argue about image-side compiler and it's capabilities.. etc.
But, Yoshiki is talking about, that you could "generate a growable image file from all text files, or make deep changes to the system without shooting yourself". And this is really awesome and so long waited step!
Well, you do need a C compiler, presumably in the form of an executable binary.
Yes, of course, and a C compiler is available anywhere! Then, for bootstrapping the image, you will just need the sources in text form.
And if you're going to grant special dispensation to an "external tool" implemented in C, why not to an external tool implemented in Smalltalk?
Sure, maybe eventually this "external tool" could be bootstrapped by Ian's COLA, also just from "all text files", needing a C compiler in the first stages only.
Best regards, Nikolay
That said, bravo, Yoshiki! This is a great project.
Colin
On 28 December 2010 14:47, Nikolay Suslov nsuslovi@gmail.com wrote:
On Tue, Dec 28, 2010 at 3:35 PM, Colin Putney colin@wiresong.com wrote:
On Tue, Dec 28, 2010 at 7:23 AM, Nikolay Suslov nsuslovi@gmail.com wrote:
Thinking, that nobody here will argue about image-side compiler and it's capabilities.. etc.
But, Yoshiki is talking about, that you could "generate a growable image file from all text files, or make deep changes to the system without shooting yourself". And this is really awesome and so long waited step!
Well, you do need a C compiler, presumably in the form of an executable binary.
Yes, of course, and C compiler is available anywhere!.. then for bootstrapping the image you will just need sources in a text form..
Machine code is available everywhere, C is not :)
And if you're going to grant special dispensation to an "external tool" implemented in C, why not to an external tool implemented in Smalltalk?
sure, may be eventually this "external tool" could be bootstrapped by Ian's COLA... also just from "all text files", needing in a C compiler on the first stages only.
Can anyone tell me when he last had to deal with hardware that has no preinstalled operating system/BIOS up and running, or where there is only a C compiler and no other language that can run on it?
P.S. I think I know the answer to the question of why the "computer revolution hasn't happened yet": because every time people invent something new, they implement it in C.
Hi Igor,
On 28 December 2010 15:19, Igor Stasenko siguctua@gmail.com wrote:
can anyone tell me, when last time he had to deal with hardware, which having no preinstalled operating system/BIOS up and running, or there only C compiler and no any other languages which can run on it?
we had that with NXTalk ... but we admittedly cheated a bit by using an available open-source minimal OS whose interface our C code addressed.
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
:-)
On the other hand, stacking ever more abstractions on top of each other eventually costs an amount of performance that will be noticed.
Best,
Michael
On 28 December 2010 15:54, Michael Haupt mhaupt@gmail.com wrote:
Hi Igor,
On 28 December 2010 15:19, Igor Stasenko siguctua@gmail.com wrote:
can anyone tell me, when last time he had to deal with hardware, which having no preinstalled operating system/BIOS up and running, or there only C compiler and no any other languages which can run on it?
we had that with NXTalk ... but we admittedly cheated a bit by using an available open-source minimal OS whose interface our C code addressed.
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
:-)
On the other hand, stacking ever more abstractions on top of each other eventually costs an amount of performance that will be noticed.
The key words here are 'on top of', which mean an evolutionary approach, not a revolutionary one. For revolutionary things, you would use the wording 'instead of'. Every time you build something on top of C, you inherit its good and bad traits, because you can't escape the semantic model imposed by the C language and its compiler(s).
Best,
Michael
Hi Igor,
On 28 December 2010 16:13, Igor Stasenko siguctua@gmail.com wrote:
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
... On the other hand, stacking ever more abstractions on top of each other eventually costs an amount of performance that will be noticed.
The key words here is 'on top of' which means an evolutionary approach, not revolutionary. Because for revolutionary things, you would use 'instead of' wording.
good distinction, I see.
Every time you building something on top of C, you inheriting its good and bad traits, because you can't escape the semantic model, imposed by C language and its compiler(s).
I've found C to be rather malleable; a possible way of providing one least complex abstraction over raw assembly. Not comfy, but malleable.
However, ultimately, you will have to talk to the metal. I completely agree that it is much nicer to talk to the metal from a language standing on a higher ground. The required compiler should not waste resources (of whatever kind). Building such a compiler surely is daunting. Perhaps that is one reason why the "C layer" is still popular as a target.
Or is it the metal that is shaped in the wrong way? There used to be Lisp machines ...
Best,
Michael
On 28 December 2010 21:16, Michael Haupt mhaupt@gmail.com wrote:
Hi Igor,
On 28 December 2010 16:13, Igor Stasenko siguctua@gmail.com wrote:
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
... On the other hand, stacking ever more abstractions on top of each other eventually costs an amount of performance that will be noticed.
The key words here is 'on top of' which means an evolutionary approach, not revolutionary. Because for revolutionary things, you would use 'instead of' wording.
good distinction, I see.
Every time you building something on top of C, you inheriting its good and bad traits, because you can't escape the semantic model, imposed by C language and its compiler(s).
I've found C to be rather malleable; a possible way of providing one least complex abstraction over raw assembly. Not comfy, but malleable.
You will probably be surprised, but I find raw assembly much more malleable than C, simply because it has far fewer constraints: only those that are in the hardware.
However, ultimately, you will have to talk to the metal. I completely agree that it is much nicer to talk to the metal from a language standing on a higher ground. The required compiler should not waste resources (of whatever kind). Building such a compiler surely is daunting. Perhaps that is one reason why the "C layer" is still popular as a target.
Perhaps. Still, investments in talking directly to the metal are well rewarded, like the ~3x speedup in Cog.
Or is it the metal that is shaped in the wrong way? There used to be Lisp machines ...
That's a good question. But we have what we have. :)
Best,
Michael
Hi Igor,
On 29 December 2010 00:11, Igor Stasenko siguctua@gmail.com wrote:
Every time you building something on top of C, you inheriting its good and bad traits, because you can't escape the semantic model, imposed by C language and its compiler(s).
I've found C to be rather malleable; a possible way of providing one least complex abstraction over raw assembly. Not comfy, but malleable.
you probably will be surprised , but i find raw assembly are much malleable than C. Simply because it having much less constraints - only those, which in hardware.
that's not surprising. I was responding to your remark that C was somehow confining people. Obviously, raw assembly is even more malleable, but hey, it's also even less comfy. :-)
Perhaps. Still investments to talking directly with metal are well rewarded.. like ~3x speedup in Cog.
No one would seriously doubt that.
Or is it the metal that is shaped in the wrong way? There used to be Lisp machines ...
That's a good question. But we have what we have. :)
And that is not a revolutionary statement. ;-)
Best,
Michael
Igor,
On Tue, Dec 28, 2010 at 5:19 PM, Igor Stasenko siguctua@gmail.com wrote:
machine code available everywhere, C are not :)
I mean C as one of the common languages that provide constructs that map very efficiently to machine instructions (machine code). And the majority of OSes have a C compiler by default, which guarantees that users needn't download any third-party binary software for compiling.
And C is needed only for bootstrapping! OK? So use any other language for that, if you want. And the revolution will happen just after bootstrapping; don't miss it by going into an "infinite loop" while fighting with the C language :)
Regards, Nikolay
Hi,
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
There can be more than one reason, e.g. ... because people are still thinking in terms of bootstrapping... a new genesis... something irrelevant in terms of open systems (ambience), like Smalltalk. From the point of view of systems where semantics can change, the genesis is only anecdotal. More important are the sustainability of the system itself (the persistence of the self... when contents change through time), its survival in case of accidents (after sensing damage), and the preservation of identity (being known as the SAME system after changes). Many syntaxes can coexist, and that is related to diversity of expression, not to ideals (a mother language/syntax to make the first spell).
The change of semantics through actions "outside" the system itself makes us continue writing programs. Oops! The Program class is still missing! :-P IMHO, changes in core semantics and the effects of making those changes in the system should be promoted, because they are important evidence for recognizing Smalltalk as an open system and not as just another OO language (the contrary has happened during the last years, with the insistence on the importance of the "code").
Yes, of course, and C compiler is available anywhere!.. then for bootstrapping the image you will just need sources in a text form..
An image, as the snapshot of a system, contains representations of objects, in text or binary; it is only what has been stored from a system in the past. It is as important as a snapshot of a woman. I prefer to invest my time gaining experience with the living woman. People who write good poems (code) are forced to write another tomorrow.
Ah!.... Another reason why the "computer revolution hasn't happened yet" may be that people still think in terms of computation, as systems are still made to compute something.
cheers, Ale.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Tuesday, December 28, 2010 11:19 AM Subject: Re: [squeak-dev] A Bootstrap Compiler
On 28 December 2010 16:58, Alejandro F. Reimondo aleReimondo@smalltalking.net wrote:
Hi,
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
Can be more of one reason, e.g. ... because people are still thinking in bootstraping... a new genesis... something irrelevant in terms of open systems (ambience), like Smalltalk. Under the point of view of systems where semantics can change, the genesis is only anegdotic. It is more important the sustainability of the system itself (the persistence of the self... when contents change through time), the survival in case of accidents (after sensing damage), and the preservation of identity (been known as the SAME systems after changes). Many syntax can coexists, and it is related with diversity of expression; not with ideals (a mother language/syntax to make the first spell).
The change of semantics though actions "outside" the system itself make us continue writing programs. oops! the Program class is still missing! :-P IMHO, changes in core semantics and the effects of doing the changes in the system should be promoted, because are important evidence to recognize smalltalk as an open system and not as another OO language (the contrary has happened during last years, insisting in the importance of the "code").
Yes, that is a long-awaited promotion, which has not happened yet :)
Yes, of course, and C compiler is available anywhere!.. then for bootstrapping the image you will just need sources in a text form..
Image as the snapshot of a system, contains representation of objects.. in text or binary; it is only what has been stored from a system in the past. It is as important as a snapshot of a woman. I prefer to invest my time gaining experience with the living woman. People that write good poems (code) are forced to write another, tomorrow.
I think that bootstrapping has value, as being able to (re)produce the system completely from human-readable text (i.e. source code).
But now, after your post, I think the problem is deeper, and the real reason why we pursue such an idea is that we don't trust computer(s) to reason about what the system is and what it is not. It is we who give the names to classes, it is we who define what is Kernel and what is optional, etc.
From the machine's perspective, names mean nothing. But for us humans, they are a bridge between our way of thinking and computer model(s). Another aspect is that we probably want to have control over every aspect of the system: since we have all the source code, we get the impression that, once bootstrapped, everything in it will run as we predicted and expected. Which of course is not true, given the failure of C++ or Java (or any other compile-all-from-sources) based environments to tame complexity.
Ah!.... Another reason for "computer revolution didn't happened yet", can be because people still think in terms of computation, as systems are still made to compute something.
Thanks for the good post, Alejandro. It made me look at things from a different angle, which means I learned something new (and I hope others did too).
P.S. I really think that yes, we need to have bootstrapping model(s), but of course not in C.
At Tue, 28 Dec 2010 12:58:22 -0300, Alejandro F. Reimondo wrote:
Hi,
P.S. i think i know the answer to question why "computer revolution didn't happened yet", because every time people inventing something new, they implementing it in C.
Can be more of one reason, e.g. ... because people are still thinking in bootstraping... a new genesis... something irrelevant in terms of open systems (ambience), like Smalltalk. Under the point of view of systems where semantics can change, the genesis is only anegdotic. It is more important the sustainability of the system itself (the persistence of the self... when contents change through time), the survival in case of accidents (after sensing damage), and the preservation of identity (been known as the SAME systems after changes). Many syntax can coexists, and it is related with diversity of expression; not with ideals (a mother language/syntax to make the first spell).
The change of semantics though actions "outside" the system itself make us continue writing programs. oops! the Program class is still missing! :-P IMHO, changes in core semantics and the effects of doing the changes in the system should be promoted, because are important evidence to recognize smalltalk as an open system and not as another OO language (the contrary has happened during last years, insisting in the importance of the "code").
Yes, of course, and C compiler is available anywhere!.. then for bootstrapping the image you will just need sources in a text form..
Image as the snapshot of a system, contains representation of objects.. in text or binary; it is only what has been stored from a system in the past. It is as important as a snapshot of a woman. I prefer to invest my time gaining experience with the living woman. People that write good poems (code) are forced to write another, tomorrow.
Ah!.... Another reason for "computer revolution didn't happened yet", can be because people still think in terms of computation, as systems are still made to compute something.
Good writing. I am (also) more concerned with finding a good meta-language. C is convenient thanks to its ubiquity and to having a parser generator, so it is better than assembly.
"The new system" is along the lines of "NotSqueak" in http://www.vpri.org/pdf/tr2010004_steps10.pdf, which will be quite different and will have its own evolution path. But the malleability is drawn from Smalltalk, and it will be more dynamic in its nature.
-- Yoshiki
On 28 December 2010 13:23, Nikolay Suslov nsuslovi@gmail.com wrote:
Hello,
On Tue, Dec 28, 2010 at 1:32 PM, Igor Stasenko siguctua@gmail.com wrote:
What i like about image-side compiler that i can use the metaprogramming and late-binding capabilities in language for compiling the source. While C compiler having no such flexibility, and good to have when you plan to freeze things for ages.
Thinking, that nobody here will argue about image-side compiler and it's capabilities.. etc.
But, Yoshiki is talking about, that you could "generate a growable image file from all text files, or make deep changes to the system without shooting yourself".
But nobody is forcing you to hack the existing objects of the environment you are running in. How many people have you seen who wanted to hack the C compiler to bootstrap their own C application?
This means that you can implement things to build a separate object memory graph, which can be held in a byte array and then simply saved to a file.
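To make the "separate object memory graph held in a byte array" idea concrete, here is a minimal sketch in Python. The object layout (a slot-count word, a class pointer, then the slots, all 32-bit little-endian) and the names `ImageBuilder`, `allocate`, `store` and `save` are assumptions made up for illustration; this is not the Squeak image format, nor how MicroSqueak builds images.

```python
import struct
from pathlib import Path

WORD = 4  # hypothetical 32-bit word size, not the real Squeak layout


class ImageBuilder:
    """Builds a toy object memory in a bytearray, separate from the host heap."""

    def __init__(self):
        self.memory = bytearray()

    def allocate(self, class_oop, slots):
        """Append an object: slot count, class pointer, then the slot values."""
        oop = len(self.memory)  # an object's "address" is its byte offset
        self.memory += struct.pack("<II", len(slots), class_oop)
        for value in slots:
            self.memory += struct.pack("<I", value)
        return oop

    def store(self, oop, index, value):
        """Patch slot `index` of an existing object, e.g. to close a cycle."""
        offset = oop + 2 * WORD + index * WORD
        struct.pack_into("<I", self.memory, offset, value)

    def save(self, path):
        """Write the whole object memory to a file in one go."""
        Path(path).write_bytes(self.memory)


def build_demo_image():
    builder = ImageBuilder()
    # The first object lands at offset 0 and names itself as its class.
    klass = builder.allocate(class_oop=0, slots=[0])
    instance = builder.allocate(class_oop=klass, slots=[0, 0])
    builder.store(instance, 0, instance)  # slot 0 points back at the instance
    return builder, klass, instance
```

Nothing in the running environment is modified while the graph is built; `builder.save("demo.image")` would then dump the bytes to a file (the file name is arbitrary).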
And this is really awesome and so long waited step!
Best regards, Nikolay
At Tue, 28 Dec 2010 11:32:52 +0100, Igor Stasenko wrote:
On 28 December 2010 11:19, Yoshiki Ohshima yoshiki@vpri.org wrote:
Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
I wrote a longer explanation so if you are interested, please go to:
https://github.com/yoshikiohshima/SqueakBootstrapper
and check it out.
Been there did that :)
Ah, now I remember you mentioned it before.
Implementing an ST compiler in C is not very hard, since the syntax is extremely simple (compared to C). But the thing is that you now need to keep the image-side compiler and the C compiler in sync. And very soon you will figure out that, while it's extremely easy to extend the ST compiler in ST, it's very hard to do that in C (see things like Helvetia by Lukas Renggli, etc.).
Well, my bootstrap compiler and image compiler are both written in variants of PEG parser generators. And presumably both should be the same source and only use different backends, as Nikolay suggested. So C is just a convenient target I could use, but not really essential.
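The "same source, different backend" point can be illustrated with a small sketch. The Python below is a hand-written parser, not a PEG parser generator, and it only handles whitespace-separated binary sends over integer literals, evaluated left to right as in Smalltalk. The tree shape, the toy bytecode names and the functions `parse`, `emit_bytecodes` and `emit_c` are all assumptions for illustration, not anything from the SqueakBootstrapper repository.

```python
def parse(source):
    """Parse e.g. '3 + 4 * 2' into a tree of left-to-right binary sends."""
    tokens = source.split()
    if not tokens:
        raise ValueError("empty expression")
    tree = ("literal", int(tokens[0]))
    rest = tokens[1:]
    if len(rest) % 2:
        raise ValueError("operator without an argument")
    for selector, operand in zip(rest[0::2], rest[1::2]):
        tree = ("send", tree, selector, ("literal", int(operand)))
    return tree


def emit_bytecodes(tree):
    """Backend 1: a list of toy stack-machine instructions."""
    if tree[0] == "literal":
        return [("pushConstant", tree[1])]
    _, receiver, selector, argument = tree
    return (emit_bytecodes(receiver)
            + emit_bytecodes(argument)
            + [("send", selector, 1)])


def emit_c(tree):
    """Backend 2: a fully parenthesised C-style expression string."""
    if tree[0] == "literal":
        return str(tree[1])
    _, receiver, selector, argument = tree
    return f"({emit_c(receiver)} {selector} {emit_c(argument)})"
```

Both backends walk the same tree, so only the final emission step differs between the two targets.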
What i like about image-side compiler that i can use the metaprogramming and late-binding capabilities in language for compiling the source. While C compiler having no such flexibility, and good to have when you plan to freeze things for ages.
That's why i prefer to implement bootstrap in smalltalk. Because it really doesn't matters , in which language you implementing a bootstrap..
Sorry, if my comment was discouraging. But i had to say it :)
Not at all. I should also stress that our main project is to make a new system, and this is just a side project on my learning curve, to understand things in terms of Squeak. I am probably not going to spend so much time on this variant, but would like to do a similar thing for a new system...
Thanks!
-- Yoshiki
Sorry, if my comment was discouraging. But i had to say it :)
Not at all. I should also stress that our main project is to make a new system, and this is just a side project on my learning curve, to understand things in terms of Squeak. I am probably not going to spend so much time on this variant, but would like to do a similar thing for a new system...
What is a "new" system? What change makes a system another system?
If the system is a closed system, any change (a bit) makes it another system, without commitment to the prior system (it is another set of rules). In a system that can change (like Smalltalk), the identity of the system is preserved, no matter where the change is made. What makes it the same system is the preservation of identity.
The change that makes a system another system is the (intentional) change of its name.
happy "new" year! Ale.
----- Original Message ----- From: "Yoshiki Ohshima" yoshiki@vpri.org To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Tuesday, December 28, 2010 1:48 PM Subject: Re: [squeak-dev] A Bootstrap Compiler
Am 28.12.2010 11:19, schrieb Yoshiki Ohshima:
Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
Very nice! Even nicer would be to represent the minimal image as a structured text file from which the actual image can be created with relatively simple C code. That way, the bootstrap would contain only human-readable constructs and no magic bytes.
Cheers, Hans-Martin
On 12/28/2010 12:43 PM, Hans-Martin Mosner wrote:
Am 28.12.2010 11:19, schrieb Yoshiki Ohshima:
Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
Very nice! Even nicer would be to represent the minimal image as a structured text file from which the actual image can be created with relatively simple C code. That way, the bootstrap would contain only human-readable constructs and no magic bytes.
Use the simulator, Luke! (err, wait, that's not quite right I think...) In any case, the simulator should be quite capable of producing the object representation from source code.
Cheers, - Andreas
At Tue, 28 Dec 2010 12:43:05 +0100, Hans-Martin Mosner wrote:
Am 28.12.2010 11:19, schrieb Yoshiki Ohshima:
Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
Very nice! Even nicer would be to represent the minimal image as a structured text file from which the actual image can be created with relatively simple C code. That way, the bootstrap would contain only human-readable constructs and no magic bytes.
Yes. In the README, I wrote: "capture the dynamic behavior of MicroSqueakImageBuilder and store the commands...". I was thinking of a sequence like:
nil <- object id 1
nil class <- Object ID 2.
...
and build up an image.
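As a rough illustration of replaying such a command sequence, here is a small Python sketch. The command syntax (`object <id> class <class-id> slots <count>` and `store <id> <slot-index> <value-id>`) is a hypothetical format invented for this example; the thread only hints at what the real log would look like, and `replay` is an assumed name.

```python
def replay(commands):
    """Replay a textual command log and return {object_id: record}."""
    objects = {}
    for line_number, line in enumerate(commands.splitlines(), start=1):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "object":
            # object <id> class <class-id> slots <count>
            object_id, class_id, count = int(words[1]), int(words[3]), int(words[5])
            objects[object_id] = {"class": class_id, "slots": [None] * count}
        elif words[0] == "store":
            # store <id> <slot-index> <value-id>
            object_id, index, value_id = map(int, words[1:4])
            objects[object_id]["slots"][index] = value_id
        else:
            raise ValueError(f"line {line_number}: unknown command {words[0]!r}")
    return objects


LOG = """\
# hypothetical log: object 1 plays the role of nil, object 2 of its class
object 1 class 2 slots 0
object 2 class 2 slots 1
store 2 0 1
"""
```

Calling `replay(LOG)` yields plain dictionaries keyed by object id; a real builder would write the objects into an image instead, but the log itself stays human-readable.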
-- Yoshiki
Hi Yoshiki,
On Tue, Dec 28, 2010 at 2:19 AM, Yoshiki Ohshima yoshiki@vpri.org wrote:
Hi,
We've been playing with John's MicroSqueak and it occurred to me that having a bytecode compiler that is implemented outside of Squeak opens some possibilities, such as generating a growable image file from all text files, or making deep changes to the system without shooting yourself.
I wrote a longer explanation so if you are interested, please go to:
https://github.com/yoshikiohshima/SqueakBootstrapper
and check it out.
I simply don't see the benefit of putting energy into other languages. I see the benefit of a textual bootstrap. But why is it worth-while implementing that in C instead of Smalltalk? If Smalltalk is more productive (which it is) then writing such a bootstrap in C is a waste of effort, reinvents several wheels at considerable expense and produces an artifact that is less flexible, less extensible, less useful than implementing the same functionality in Smalltalk.
On the other hand, as Andreas suggests, trying to implement something using the simulator looks to be really powerful. Recent;y I've been playing tangentally in this area. In recent days I've produced a new code generator for Cog that has some useful speedups (Compiler recompileAll ~ 9% faster, benchFib 2x). To test the code generator I needed to check stack depths at various points in JIT compilation and execution of the JITted code. I have a Smalltalk class StackDepthFinder that answers the stack depths for each bytecode of a method. By adding two classes VMObjectProxy and VMCompiledMethodProxy I could apply StackDepthFinder to methods in the simulator's heap and hence derive stack depths for any method in the simulators image. To test the JIT it was also convenient to be able to JIT methods in my work image, synthesised test cases etc, not just methods in the simulated image. Again a facade class allows the simulator to JIT any method in my work image. This worked well and was easy to implement. Extending in this direction seems straight-forward.
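For readers unfamiliar with the idea of computing a stack depth per bytecode, the following Python sketch shows the basic bookkeeping on a made-up instruction set. It is not the StackDepthFinder class mentioned above and does not use Squeak's real bytecodes: the opcode names, the `STACK_EFFECT` table and the `stack_depths` function are assumptions, and only straight-line code (no jumps) is handled.

```python
# Net stack effect of each hypothetical opcode (pushes minus pops).
STACK_EFFECT = {
    "pushSelf": 1,
    "pushTemp": 1,
    "pushConstant": 1,
    "dup": 1,
    "pop": -1,
    "storeTemp": 0,
    "returnTop": -1,
}


def stack_depths(bytecodes):
    """Return the stack depth before each instruction of straight-line code."""
    depths = []
    depth = 0
    for op, *args in bytecodes:
        depths.append(depth)
        if op == "send":
            # The receiver and num_args arguments are popped, one result is pushed.
            _selector, num_args = args
            depth += 1 - (num_args + 1)
        else:
            depth += STACK_EFFECT[op]
        if depth < 0:
            raise ValueError(f"stack underflow at {op}")
    return depths


# Toy encoding of the expression 3 + 4 followed by a return.
METHOD = [
    ("pushConstant", 3),
    ("pushConstant", 4),
    ("send", "+", 1),
    ("returnTop",),
]
```

For `METHOD`, `stack_depths` reports the depths 0, 1, 2 and 1 before the four instructions. A real implementation would also have to follow branches and check that the depths agree at join points.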
One starts up with a simulator and an empty heap and bootstraps objects into that heap, using whatever bytecode set and object format one chooses. One can test the image using the simulator which should be quite fast enough if the image is a small kernel. All the implementation is useful and adds to the simulator/VMMaker ecosystem. All the code is Squeak and can reuse substantial parts of the system. Seems like a win to me. I think I'll take this approach in implementing the new object format. It could be a new backend to MicroSqueak.
cheers
Eliot
While I am on the side of those who prefer to work in Smalltalk rather than in C, there are several projects which have used the opposite approach and I can understand why they were done that way.
Little Smalltalk from Tim Budd comes (depending on the version) as a set of C files that get compiled to two executables: the virtual machine and the image builder. The latter reads a text file in a format that is very easy to edit and generates an image that the virtual machine can use. The Smalltalk parser in C is the main piece of code in the image builder that isn't also part of the virtual machine.
http://www.littlesmalltalk.org/
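To make the "text file in, image out" split concrete, here is a minimal sketch in Python. It is not Little Smalltalk's file format or image layout: the `Name < Superclass` syntax, the binary layout and the function names `parse_definitions`, `build_image` and `load_image` are all hypothetical. It only shows the shape of the two programs: a builder that parses editable text and writes a binary image, and a loader (the virtual machine side) that needs no parser at all.

```python
import struct

NO_SUPERCLASS = 0xFFFFFFFF  # hypothetical marker for a root class


def parse_definitions(text):
    """Parse lines of the form 'Name < Superclass' into (name, superclass) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, superclass = line.partition("<")
        pairs.append((name.strip(), superclass.strip() or None))
    return pairs


def build_image(text):
    """Builder side: serialise class definitions into a flat binary image.

    Assumed layout: uint32 class count, then for each class a uint32
    superclass index, a uint8 name length, and the ASCII name.
    """
    pairs = parse_definitions(text)
    index = {name: i for i, (name, _) in enumerate(pairs)}
    out = bytearray(struct.pack("<I", len(pairs)))
    for name, superclass in pairs:
        super_index = NO_SUPERCLASS if superclass is None else index[superclass]
        encoded = name.encode("ascii")
        out += struct.pack("<IB", super_index, len(encoded)) + encoded
    return bytes(out)


def load_image(image):
    """VM side: rebuild (name, superclass) pairs from the binary image."""
    (count,) = struct.unpack_from("<I", image, 0)
    offset = 4
    names, supers = [], []
    for _ in range(count):
        super_index, length = struct.unpack_from("<IB", image, offset)
        offset += 5
        names.append(image[offset : offset + length].decode("ascii"))
        offset += length
        supers.append(super_index)
    return [
        (name, None if s == NO_SUPERCLASS else names[s])
        for name, s in zip(names, supers)
    ]


SOURCE = """\
# kernel classes
Object <
Behavior < Object
Class < Behavior
"""

image = build_image(SOURCE)
print(len(image), load_image(image))
```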
The Self VM is a huge C++ program which also includes the Self-to-bytecodes translator. When the VM starts up without an image it creates an "empty" world with a minimum set of objects and can grow from there by reading source files. There was a translator written in Self as one of the benchmarks, but since it wasn't actually used by the system it became outdated very quickly and is now only of historical interest.
Slate has gone through three rather different implementations, each with its own bootstrapping scheme. There are many good ideas in this system as well:
I am not familiar with the details of GNU Smalltalk, but since it doesn't need an image to run it almost certainly includes a Smalltalk compiler in C -
For those of us who want to do it all in Smalltalk, we have projects like Klein (Self-in-Self - http://kleinvm.sourceforge.net/) and Huemul Smalltalk (uses as many OS libraries as possible and the Squeak Exupery compiler for dealing with Smalltalk code - http://www.guillermomolina.com.ar/index.php/en/projects/huemul-smalltalk ) as examples.
-- Jecel
On 28 December 2010 19:22, Eliot Miranda eliot.miranda@gmail.com wrote:
I think I'll take this approach in implementing the new object format. It could be a new backend to MicroSqueak.
Likewise, in NativeBoost, to debug generated native code, I simply instruct the assembler to generate an int3 instruction at the point where I need it. Then, after accepting a method and running a doit that uses this method, I pop into a debugger (like gdb) and can see step by step what it does.
I don't remember being able to do that in any other language, because the code-compile-run cycle makes it simply impossible. Ah yes, there is the 'compile and continue' feature for C, but I never used it again after a couple of failures. Besides, compile & continue often takes as long as simply building everything from scratch, so by the time it is done compiling you may have forgotten what you were doing there :)
Eliot
I would love to see that during the school at INRIA. Could you have a session on that? Because we need to use a better infrastructure to build new images. BTW: I had the same question regarding the use of the C compiler. What we tried with the students here was to use the system itself, but we could not work full time on it and we also started from a larger kernel (and wanted to remove more - big mistake).
Stef
On Wed, Dec 29, 2010 at 1:29 AM, stephane ducasse < stephane.ducasse@gmail.com> wrote:
Eliot
I would love to see that during the school at INRIA. Could you have a session on that?
Sure. I'm hoping that I can set some assignments and have people try and build stuff. So this would be a good point to start from and head in the direction of bootstrapping an image.
At Tue, 28 Dec 2010 10:22:35 -0800, Eliot Miranda wrote:
I simply don't see the benefit of putting energy into other languages. I see the benefit of a textual bootstrap. But why is it worth-while implementing that in C instead of Smalltalk? If Smalltalk is more productive (which it is) then writing such a bootstrap in C is a waste of effort, reinvents several wheels at considerable expense and produces an artifact that is less flexible, less extensible, less useful than implementing the same functionality in Smalltalk.
As I wrote in another reply, it is more about the meta-language (in this case, a PEG or PEG-like generator). At the last S3 conference, Ian, Takashi and I showed that the same S-expression language can be targeted to x86 code and Adobe Bytecode. Similarly, a good bootstrapping strategy would be to write the major part of the compiler in something like PEG and provide different backends to produce different executables. For my Bootstrap Compiler, it is not C as much as Leg; C is convenient and has a workable parser generator, so it was an okay step, I thought. (Arguably, Leg does not support structural matching of trees; one of the phases in the reference implementation is written directly in C, even though the pattern in the code resembles the PEG implementation.)
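The "one front end, several backends" strategy can be illustrated with a small sketch. This is Python rather than Leg or OMeta, and nothing here is taken from the Bootstrap Compiler: the arithmetic grammar, the tuple-based tree and the two backends (`to_sexpr` and `to_stack_code`) are assumptions chosen to keep the example short. The parser is hand-written in the recursive-descent style that a PEG generator would produce from rules like `expr <- term (('+' / '-') term)*`.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(source):
    """Split the source into number and operator tokens."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Front end: builds a tree of ints and (op, left, right) tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):      # expr <- term (('+' / '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):      # term <- factor ('*' factor)*
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.factor())
        return node

    def factor(self):    # factor <- number / '(' expr ')'
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree


def to_sexpr(tree):
    """Backend 1: print the tree as an S-expression."""
    if isinstance(tree, int):
        return str(tree)
    op, left, right = tree
    return f"({op} {to_sexpr(left)} {to_sexpr(right)})"


def to_stack_code(tree):
    """Backend 2: emit instructions for a hypothetical stack machine."""
    if isinstance(tree, int):
        return [("push", tree)]
    op, left, right = tree
    return to_stack_code(left) + to_stack_code(right) + [("send", op)]


tree = parse("1 + 2 * (3 - 4)")
print(to_sexpr(tree))
print(to_stack_code(tree))
```

Adding a third target means writing one more function over the same tree; the grammar stays untouched.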
On the other hand, as Andreas suggests, trying to implement something using the simulator looks to be really powerful. Recently I've been playing tangentially in this area. In recent days I've produced a new code generator for Cog that has some useful speedups (Compiler recompileAll ~ 9% faster, benchFib 2x). To test the code generator I needed to check stack depths at various points in JIT compilation and execution of the JITted code. I have a Smalltalk class StackDepthFinder that answers the stack depths for each bytecode of a method. By adding two classes VMObjectProxy and VMCompiledMethodProxy I could apply StackDepthFinder to methods in the simulator's heap and hence derive stack depths for any method in the simulator's image. To test the JIT it was also convenient to be able to JIT methods in my work image, synthesised test cases etc, not just methods in the simulated image. Again a facade class allows the simulator to JIT any method in my work image. This worked well and was easy to implement. Extending in this direction seems straight-forward.
Ah, okay. Just as an analogy: in the way Slang works, the meta-language here happens to be Slang/Smalltalk, and different executables are produced with different backends. As you said, however, Slang is not a language implementation but a practical vehicle to get to a place; with a nicer meta-language, the picture could look prettier.
Thank you!
Hi all!
On 12/28/2010 07:22 PM, Eliot Miranda wrote:
I simply don't see the benefit of putting energy into other languages. I see the benefit of a textual bootstrap. But why is it worth-while implementing that in C instead of Smalltalk?
Well, one benefit would be to fit more easily into the ecosystems of Linux distros etc where they generally get nervous from having to start from an "unknown" binary.
In other words - if a system can be built solely from text files using "stock" tools like a C compiler (or any tool typically available in these ecosystems) then the acceptance is much higher.
A similar example was when I wrote a "module" (=build script) for Lunar Linux which is a source distro much like Gentoo - for the GHC compiler (Haskell). GHC needs an existing GHC to compile itself, so the script had to first download a binary GHC in order to compile the new one.
Not that I think this argument is "worth" it perhaps, but hey, any and all tools/efforts that give us flexibility are cool in my book. :)
regards, Göran
On 12/30/10, Göran Krampe goran@krampe.se wrote:
Hi all!
On 12/28/2010 07:22 PM, Eliot Miranda wrote:
I simply don't see the benefit of putting energy into other languages. I see the benefit of a textual bootstrap. But why is it worth-while implementing that in C instead of Smalltalk?
Well, one benefit would be to fit more easily into the ecosystems of Linux distros etc where they generally get nervous from having to start from an "unknown" binary.
In other words - if a system can be built solely from text files using "stock" tools like a C compiler (or any tool typically available in these ecosystems) then the acceptance is much higher.
+1 C may be considered as a kind of general assembly language available on most platforms.
However the C program might be generated from a Smalltalk implementation. Thus both points of view may be served.
--Hannes
Hi Yoshiki,
Thanks very much for SqueakBootstrapper. It's a most interesting idea!
I'd like to experiment further with the code you've shared, but I am having trouble getting your SqueakBootstrapper.image to start, and so I hope you don't mind if I ask a few questions, namely:
- Which virtual machine are you using it with?
- How different is the MObject hierarchy inside it from the MObject hierarchy in John's original (current) MicroSqueak? Could I start from John's MObject code and simply add your MCompiler code?
- Alternatively, would you be willing to provide fileouts of the MObject and MicroSqueak code from your dev image that others could file in to theirs?
Again, thanks very much for sharing your work.
Regards, Tony
Hi, Tony,
At Wed, 16 Feb 2011 19:15:39 -0500, Tony Garnock-Jones wrote:
Hi Yoshiki,
Thanks very much for SqueakBootstrapper. It's a most interesting idea!
Hehe, thanks. "Most interesting" is probably an overstatement ^^;
I'd like to experiment further with the code you've shared, but I am having trouble getting your SqueakBootstrapper.image to start, and so I hope you don't mind if I ask a few questions, namely:
- Which virtual machine are you using it with?
Any pre-Cog VM should work. On Windows, I happened to use 3.11.8.
- How different is the MObject hierarchy inside it from the MObject hierarchy in John's original (current) MicroSqueak? Could I start from John's MObject code and simply add your MCompiler code?
John's version, for example, didn't have SymbolTable. The host image is a weird amalgam of a 3.8-based image with some trunk ideas such as method properties, so AdditionalMethodState has its counterpart in MObject. It is quite possible to start from John's MObject code and add the MCompiler-related code. There will be a lot of unimplemented methods, so you need to add these as well. You can check them with "MicroSqueak unimplemented", and it should work (mostly).
But there are other differences. Some of the collection classes were adapted from the trunk version, and then the Bad Idea Bears whispered to me to make "Dictionary" identity-based and to add a class called "EqualityDictionary" for equality-based dictionaries. I still like the idea that the default behavior of a Dictionary is identity-based, and with MicroSqueak I can make that kind of change without worrying about what other Squeaks do... These changes do not affect the Compiler much, but I changed it in certain places because of them.
I actually made a similar but slightly different thing: a MicroSqueak-derived image and some files that let you write an OMeta2 grammar in a text file and process things from the command line. In that one, the classes and methods are closer to John's MicroSqueak. (But to support the compiler, the other classes needed some more methods.)
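The difference between the two lookup policies is easy to see in a small example. The following is a Python sketch, not MicroSqueak code: `IdentityDictionary`, `EqualityDictionary` and `Point` are names made up for the illustration, with Python's built-in `dict` standing in for the equality-based variant. Two keys that are equal but are distinct objects occupy two slots in the identity-based dictionary and one slot in the equality-based one.

```python
class IdentityDictionary:
    """Looks keys up by object identity (Python's `is`), not by equality."""

    def __init__(self):
        # id(key) -> (key, value); holding the key keeps its id stable.
        self._entries = {}

    def __setitem__(self, key, value):
        self._entries[id(key)] = (key, value)

    def __getitem__(self, key):
        try:
            return self._entries[id(key)][1]
        except KeyError:
            raise KeyError(key) from None

    def __contains__(self, key):
        return id(key) in self._entries

    def __len__(self):
        return len(self._entries)


class EqualityDictionary(dict):
    """Equality-based lookup; Python's built-in dict already behaves this way."""


class Point:
    """A value-like key: two points with the same coordinates are equal."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


a, b = Point(1, 2), Point(1, 2)   # equal, but two distinct objects

by_identity = IdentityDictionary()
by_identity[a] = "first"
by_identity[b] = "second"

by_equality = EqualityDictionary()
by_equality[a] = "first"
by_equality[b] = "second"         # overwrites the entry stored under a

print(len(by_identity), len(by_equality))
```

Code that relies on equal keys colliding, such as a table keyed by strings or literals, is the kind of place where swapping the default would need attention.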
- Alternatively, would you be willing to provide fileouts of the MObject and MicroSqueak code from your dev image that others could file in to theirs?
The image should run. But the image is still BlockContext-based, and if you want to use it with the latest trunk image, for example, you need to fix it.
Again, thanks very much for sharing your work.
You're welcome!
-- Yoshiki
squeak-dev@lists.squeakfoundation.org