[squeak-dev] [ANN] BioSqueak 0.4

Germán Arduino garduino at gmail.com
Fri Feb 1 16:35:14 UTC 2013


Wow, very very interesting Hernán.



2013/2/1 Hernán Morales Durand <hernan.morales at gmail.com>:
>
> Hello Hannes,
> Thanks for the feedback! Some answers then between the lines:
>
> On 01/02/2013 11:35, H. Hirzel wrote:
>
>> Hello Hernán
>>
>> This is interesting.
>> http://biosmalltalk.blogspot.com/
>>
>> I understand that you have constructed an internal domain specific
>> language (a DSL, a query language) for dealing with genetic data in
>> Smalltalk:
>>
>> search := BioNCBIWWWBlastClient new nucleotide query:
>> 'CCCTCAAACAT...TTTGAGGAG';
>>     hitListSize: 150;
>>     filterLowComplexity;
>>     expectValue: 10;
>>     wordSize: 11;
>>     blastn;
>>     blastPlainService;
>>     alignmentViewFlatQueryAnchored;
>>     formatTypeXML;
>>     fetch.
>> search outputToFile: 'blast-query-result.xml' contents: search result.
>>
>> Is there a description of this DSL?
>
>
> It is not a DSL in the traditional sense, i.e., one built with ANTLR, Lex or
> Yacc, but an embedded "DSL" which inherits the syntax and execution
> semantics of Smalltalk.
> To clarify: I've not built a DSL specification for the QBlast API, although
> I'm willing to develop DSLs for bioinformatics APIs in a Smalltalk language
> workbench (anyone?).
>
> Currently the messages for performing alignments at the NCBI are based on
> the API specification,
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node9.html . The unary
> sends are the result of a plan to reduce parametrization and to replicate or
> customize Blast settings through a UI. This is because geneticists
> experiment with changing Blast parameters over time and I want my system not
> to be tied to textual parameters.
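A minimal sketch of the "one message per setting" idea described above, written in Python rather than Smalltalk so it can be run outside an image. The class `BlastQuery` and its method names are hypothetical and are not part of BioSmalltalk; the upper-case keys follow the parameter names of the NCBI URL API as best recalled and should be checked against the specification linked above.

```python
from urllib.parse import urlencode


class BlastQuery:
    """Hypothetical builder: each setting is a method, not a text parameter."""

    def __init__(self, sequence):
        # CMD and QUERY key names are assumptions taken from the URL API.
        self.params = {"CMD": "Put", "QUERY": sequence}

    def _set(self, key, value):
        self.params[key] = value
        return self  # returning self allows the calls to be chained

    def blastn(self):
        return self._set("PROGRAM", "blastn")

    def filter_low_complexity(self):
        return self._set("FILTER", "L")

    def hit_list_size(self, count):
        return self._set("HITLIST_SIZE", count)

    def format_type_xml(self):
        return self._set("FORMAT_TYPE", "XML")

    def encoded(self):
        """Return the settings as a URL-encoded query string."""
        return urlencode(self.params)


query = (
    BlastQuery("ACGT")
    .blastn()
    .filter_low_complexity()
    .hit_list_size(150)
    .format_type_xml()
)
print(query.encoded())
```

Because every setting is an ordinary method, a UI can list, store and replay the calls without parsing any parameter text, which appears to be the motivation given above.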
>
>
>> The data is kept in XML files and
>> all is read into the image to be queried. It seems that you don't have
>> a problem with the image size?
>
> Yes, I had problems with image size and performance, a lot indeed. Actually,
> when working with the XML DOM on alignments of 5000 or more hits, Squeak (and
> Pharo of course) started to show slowness. So I cannot keep all XML nodes in
> memory. To overcome this problem I've tried the SAX (push) parser and the
> XMLPullParser (which is a StAX-style parser). Then my idea was to reduce the
> tree by specifying only the XML nodes which I'm interested in. After reducing
> the nodes, I wrote custom XML tree classes with a specific API to query
> blast XML results, taken from the DTD specification. AFAIK this is known as
> an XML digester, which is somewhat "evolved" in Java
> (http://commons.apache.org/digester/xmlrules.html). I have built a dynamic
> query builder in Morphic for querying the XML, providing the possibility of
> persisting and updating the filters. Unfortunately for Squeak users I'm using
> the Polymorph API, which I think is not available in Squeak.
>
> We worked using the XML push/pull parsers for reading genomes and they
> worked acceptably. But it is impossible to keep nodes for 3 GBytes of XML at
> least for now in Squeak/Pharo.
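The "keep only the nodes of interest" approach described above can be sketched with a streaming parser. This is an illustration in Python using the standard library's `xml.etree.ElementTree.iterparse`, not the Smalltalk XMLPullParser code from BioSmalltalk. The element names `Hit`, `Hit_id` and `Hit_len` come from the NCBI BLAST XML output format; the function `iter_hits`, the `WANTED` tuple and the sample document are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

WANTED = ("Hit_id", "Hit_len")  # assumed subset of child elements to keep


def iter_hits(source, wanted=WANTED):
    """Yield one small dict per <Hit>, discarding each subtree after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Hit":
            yield {tag: elem.findtext(tag) for tag in wanted}
            elem.clear()  # drop the children of this hit to free memory


# Tiny made-up document that imitates the nesting of a BLAST XML result.
SAMPLE = b"""<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>gi|1</Hit_id><Hit_len>120</Hit_len><Hit_hsps><Hsp/></Hit_hsps></Hit>
<Hit><Hit_id>gi|2</Hit_id><Hit_len>98</Hit_len><Hit_hsps><Hsp/></Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""

hits = list(iter_hits(io.BytesIO(SAMPLE)))
print(hits)
```

Only the small dictionaries survive; the parsed subtrees are cleared as soon as each hit has been read. The emptied `Hit` elements still remain attached to their parent, so for multi-gigabyte inputs they would also need to be removed from it.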
>
> More, and more critical, problems arise when trying to work in Smalltalk
> with microarray data (big data), which is not document-oriented. I had to
> switch to "solutions" like SQL, or HDF5 using PyTables, with a well-designed
> schema for our input. The advantages are that they support indexing and
> reading data in blocks, and there are tools like ViTables or HDFView to
> navigate the data. Until someone provides some bits in this field, there is
> little opportunity for using Smalltalk.
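The two advantages named above, indexing and reading data in blocks, can be shown with the SQL option using Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the table `intensity`, its columns, the synthetic values and the helper `read_in_blocks` are all invented for illustration and do not reflect the schema actually used for the microarray input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE intensity (probe_id INTEGER, sample_id INTEGER, value REAL)"
)
# Synthetic data: 1000 hypothetical probes measured in 4 samples.
rows = [(p, s, float(p * 10 + s)) for p in range(1000) for s in range(4)]
conn.executemany("INSERT INTO intensity VALUES (?, ?, ?)", rows)
# The index lets lookups by probe avoid scanning the whole table.
conn.execute("CREATE INDEX idx_probe ON intensity (probe_id)")


def read_in_blocks(connection, block_size=500):
    """Yield lists of rows, at most block_size rows at a time."""
    cursor = connection.execute(
        "SELECT probe_id, sample_id, value FROM intensity "
        "ORDER BY probe_id, sample_id"
    )
    while True:
        block = cursor.fetchmany(block_size)
        if not block:
            break
        yield block


block_sizes = [len(block) for block in read_in_blocks(conn)]
one_probe = conn.execute(
    "SELECT value FROM intensity WHERE probe_id = ? ORDER BY sample_id", (42,)
).fetchall()
print(block_sizes, one_probe)
```

At no point does the whole table have to be materialised as objects in memory, which is the property that was hard to obtain when keeping everything in the image.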
>
>
>> I would welcome a short writeup with a general introduction to what
>> you are doing in http://biosmalltalk.blogspot.com/.
>>
>> Or pointers to papers (Castilian is fine)
>>
>
> We have submitted a paper recently and we are waiting for the review
> results. We are also preparing another paper on a phylogenetics decision
> support system which includes text-mining and a rule engine. I will try to
> write an entry next week with screenshots.
>
> Best regards,
>
> Hernán
>
>
>> Kind regards
>>
>> Hannes Hirzel
>>
>> On 2/1/13, Hernán Morales Durand <hernan.morales at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> A few days ago I created a port of BioSmalltalk for Squeak too.
>>> BioSmalltalk is a library for doing Bioinformatics with Smalltalk. This
>>> port is labelled "BioSqueak" and I expect to release a version for
>>> Windows sometime soon. You can find it at:
>>>
>>> http://code.google.com/p/biosmalltalk/downloads/list
>>>
>>> I'm very interested in feedback.
>>> Thanks for reading.
>>>
>>> Hernán
>>>
>>> --
>>> Hernán Morales
>>> Institute of Veterinary Genetics (IGEVET)
>>> http://igevet.fcv.unlp.edu.ar
>>> National Scientific and Technical Research Council (CONICET).
>>> La Plata (1900), Buenos Aires, Argentina.
>>> Telephone: +54 (0221) 421-1799.
>>> Internal: 422
>>> Fax: 425-7980 or 421-1799.
>>>
>>>
>>
>>
>
>

