John
I have been playing around with bioinformatics and Squeak. So far I have a class hierarchy for DNA, protein, and coding sequences with lots of useful methods, mostly things like correction formulae, codon usage, base frequencies, etc. In addition, I have a class that parses the GenBank flat file format and can produce sequences from a menu of the documented features. I have used this a bit for my own research, but am now planning to teach a computational genetics course to our biology undergrads using Squeak, so the pressure is on to turn all this into something useful and fun. Right now I can drop sequence morphs onto a playfield and produce things like dot matrix plots and phylogenetic trees, but visually it ain't much. Speed hasn't been a problem because I'm not trying to do anything very ambitious. I ported this stuff from my Lisp code, where I used calls to C routines (like Clustal) to do the hard calculations. Perhaps something similar would solve the speed problem in Squeak.
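For readers who haven't built one, a dot matrix plot compares two sequences position by position and marks every cell where they agree. The sketch below is written in Python rather than Squeak and is not John's code (which isn't shown here); the function names `dot_matrix` and `render` are hypothetical, and the sliding `window` / `threshold` filter is one common way, assumed here, to suppress noise from isolated single-base matches.

```python
def dot_matrix(seq_a, seq_b, window=1, threshold=1):
    """Return (i, j) pairs where the window starting at seq_a[i] and the
    window starting at seq_b[j] share at least `threshold` identical bases."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count position-by-position identities inside the window.
            matches = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if matches >= threshold:
                hits.append((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the hits as text: rows follow seq_a, columns follow seq_b."""
    hit_set = set(hits)
    lines = ["  " + seq_b]
    for i, base in enumerate(seq_a):
        row = "".join("*" if (i, j) in hit_set else "." for j in range(len(seq_b)))
        lines.append(base + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "GATTACA", "GATTACA"
    print(render(a, b, dot_matrix(a, b)))
```

Comparing a sequence with itself gives a solid main diagonal; raising `window` and `threshold` together (say 3 and 3) leaves only runs of matching bases, which is what makes repeats and shared regions stand out in a real plot.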
If any of this sounds useful, I could package it up in its current state.
John Gillespie jhgillespie@ucdavis.edu
"John Tobler" squeaker@diganet.com wrote:
The last thing I personally need right now is to tangle with a major non-paying project, so I have decided to start on the design and development of a Biosqueak initiative. It rubs me the wrong way to see Bioeveryotherlanguage.org and not to see our beloved Squeak represented. I would appreciate any and all comments, signs of interest, etc. I will probably get rolling with implementing some of the simpler bioinformatics routines, following models already available in Biopython, BioPerl, and BioRuby. Anyone else who is interested in applying Squeak to bioinformatics problems is most welcome to join in.
I am guessing that we will face some formidable obstacles. As Heiko Schaefer pointed out in a recent post, "... little emphasis has been given so far on numerical work with squeak." Will someone please correct me if this assessment is unfair? It looks like some related work is underway by the Numerics group at the Camp Smalltalk connected with the ESUG conference in Essen. Hopefully, something approaching the capability of Numeric Python (NumPy) will magically appear just before we need it for Biosqueak. I am also sure that bioinformatics processing will fully test Squeak's mettle on text searching and pattern recognition. Where do we find support for regular expressions and the like? I anticipate that trying to solve such real-world problems as sequence searching, sequence alignment, and protein structure prediction will point out areas where we can improve Squeak's reach and performance. There should be a challenge or two here to keep hardy pioneers and somewhat unstable test pilots interested.
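To make the regular-expression question concrete, here is roughly what regex-based motif searching looks like in a language that already has it. This is a Python sketch using the standard `re` module, not a proposal for a Squeak API; `motif_to_regex` and `find_motif` are hypothetical names. It assumes motifs are written with the standard IUPAC nucleotide ambiguity codes (e.g. `R` = A or G, `N` = any base) and wraps the pattern in a lookahead so that overlapping matches are all reported, which plain non-overlapping matching would miss.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate a (possibly degenerate) IUPAC motif into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return 0-based start positions of every match, including overlaps.

    The zero-width lookahead (?=...) lets finditer report a match at each
    position instead of skipping past the end of the previous match."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_motif("GAATTCGAATTC", "GAATTC"))  # EcoRI site, two hits
    print(find_motif("AAAA", "AA"))              # overlapping hits
```

A Squeak equivalent would need the same two pieces: a pattern matcher that supports character classes, and some way to restart the search one position after each match start rather than after each match end.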
Anyway, I intend to get started. This is just a "heads up" that Biosqueak is out there somewhere on the vast horizon.
More later,
John Tobler squeaker@diganet.com johntobler@earthlink.net