On Dec 29, 2007 9:23 PM, Joshua Gargus <schwa@fastmail.us> wrote:
If security is the goal, this seems not to be the first place to spend scarce developer time.  

What are the vectors by which an attacker can cause such malicious bytecodes to be executed?  The first three that come to mind are:
- direct access to method dictionaries and/or unrestricted compiler access
- providing malicious input to a system-provided binary code loader
- exploiting bugs in the compiler

If the first attack vector is available, crashes due to malicious bytecodes are the least of your problems; arbitrary code execution is a bigger concern.  Glancing at the SecureSqueak page, it seems like you probably have a plan for this.  Have you already solved this problem?  If not, there's no point in bulletproofing the VM against ill-formed bytecodes.

As I mentioned in response to Mathieu, it seems to me that the second attack vector can mostly be dealt with by load-time inspection.  I'm not intimately familiar with Squeak's bytecodes, but I'd be surprised if there were more than a few where run-time checks are actually required.

The third case assumes that the compiler is restricted in some way (e.g. the attacker cannot simply "crash" the system by compiling a method containing "Smalltalk snapshot: false andQuit: true"); instead they have to find a way to write code such that the compiler accidentally generates invalid bytecodes.  To provide an extra layer of security, we can always subject the newly-compiled method to the same inspection as we do above when loading binary code.
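
As a rough illustration of what load-time inspection could look like, here is a minimal sketch in Python. The instruction set is invented for the example and is NOT Squeak's real bytecode set; verify_method and the opcode names are hypothetical. It only shows the kind of checks that can be done once, before the method ever runs: known opcodes, complete operands, in-range literal and temp indices, and jump targets that land on an instruction boundary.

```python
# Toy instruction set, invented for illustration; NOT Squeak's real bytecodes.
PUSH_LITERAL, PUSH_TEMP, JUMP, RETURN = 0, 1, 2, 3
OPERAND_COUNT = {PUSH_LITERAL: 1, PUSH_TEMP: 1, JUMP: 1, RETURN: 0}


def verify_method(code, num_literals, num_temps):
    """Return a list of problems found in `code`; an empty list means it passed.

    `code` is a sequence of small integers: an opcode followed by its operands.
    JUMP takes an absolute offset into `code` (an assumption of this toy format).
    """
    problems = []
    starts = set()  # offsets at which an instruction begins
    jumps = []      # (offset of the jump, absolute target)
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPERAND_COUNT:
            problems.append(f"unknown opcode {op} at {pc}")
            return problems  # cannot keep decoding past an unknown opcode
        if pc + OPERAND_COUNT[op] >= len(code):
            problems.append(f"truncated operand at {pc}")
            return problems
        starts.add(pc)
        if op == PUSH_LITERAL and not 0 <= code[pc + 1] < num_literals:
            problems.append(f"literal index {code[pc + 1]} out of range at {pc}")
        elif op == PUSH_TEMP and not 0 <= code[pc + 1] < num_temps:
            problems.append(f"temp index {code[pc + 1]} out of range at {pc}")
        elif op == JUMP:
            jumps.append((pc, code[pc + 1]))
        pc += 1 + OPERAND_COUNT[op]
    # Jump targets can only be checked once every instruction start is known.
    for at, target in jumps:
        if target not in starts:
            problems.append(f"jump at {at} targets {target}, not an instruction start")
    return problems
```

A loader would refuse to install any method for which verify_method returns a non-empty list, and the same call could be made on the compiler's output before it is installed. Checks that depend on run-time values (for instance, what kind of object ends up on the stack) are not covered by a static pass like this one.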


Thanks for the input, Josh. I'll be starting this thread up again when I'm actually ready to submit changes to the VM or fork it.

My developer time isn't scarce. I have about another 50 years left in me :-).

To provide more information about what I'm doing: I'm loading code remotely (and transparently, using a distributed object architecture) as bytecodes. The literals in CompiledMethods are rebound when the code is loaded. The code itself is stored in Namespaces, so named literals can only refer to the small set of objects that the code has access to.
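
Roughly, the rebinding step works like the following sketch, written in Python only for brevity. The names rebind_literals and UnboundLiteral are made up for this example and are not taken from the actual loader; a plain dict stands in for a Namespace.

```python
class UnboundLiteral(Exception):
    """Raised when loaded code names something its namespace does not expose."""


def rebind_literals(literal_names, namespace):
    """Resolve each named literal against `namespace` only.

    `namespace` is a plain dict standing in for a Namespace (an assumption of
    this sketch). There is deliberately no fallback to a global dictionary, so
    a name outside the namespace cannot be reached by the loaded code.
    """
    bound = []
    for name in literal_names:
        if name not in namespace:
            raise UnboundLiteral(name)
        bound.append(namespace[name])
    return bound


# Hypothetical sandbox: the loaded code may see only these two objects.
sandbox = {"Transcript": "a transcript object", "Point": "the Point class"}
```

With this sandbox, rebinding ["Point", "Transcript"] succeeds, while rebinding ["Compiler"] raises UnboundLiteral, because the loader never looks anywhere except the namespace it was given.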

Remotely loaded code wouldn't usually have access to MethodDictionary-s, CompiledMethods, or the Compiler. My intention is that code is loaded into a sort of browser, much like you could load a Project into Squeak now, meaning that the code would come from a public source and could be malicious.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/