[squeak-dev] Re: [Pharo-project] Project ideas for GSoC, 7 days left!

Hernán Morales Durand hernan.morales at gmail.com
Wed Mar 7 17:24:47 UTC 2012


Name: Big data CSV parser plugin
Level: Intermediate
Possible mentor: ?
Possible second mentor: ?

Description

With the advent of inexpensive DNA microarray technology, big data is now
available to many small and medium laboratories that perform statistical
analysis based on microarray experiments. Most of the time, the data
produced by genotyping services is delivered in CSV format, a
cross-platform de facto "standard" that is easily readable and still used
in hundreds of business applications. Smalltalk has several CSV parsers,
but their performance is far from competitive with libraries implemented
in other languages. The goal of this project is to measure execution time
and build a plugin that provides fast, competitive access to CSV data.
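
As a rough illustration of the "measure execution time" step, here is a
minimal sketch of a baseline timing harness written in Python with its
standard csv module. It is only an assumption of how a reference measurement
could look, not part of the proposal: the helper names (make_csv, time_parse)
and the synthetic genotype-like data are made up for the example. The same
input could then be fed to the Smalltalk parsers and to the plugin for
comparison.

```python
import csv
import io
import time


def make_csv(rows, cols):
    """Build an in-memory CSV of synthetic genotype-like calls (made-up data)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # Header row: one sample column followed by `cols` marker columns.
    writer.writerow(["sample"] + [f"snp{i}" for i in range(cols)])
    for r in range(rows):
        writer.writerow([f"s{r}"] + ["AB"] * cols)
    return buf.getvalue()


def time_parse(text):
    """Parse every row once; return (row_count, field_count, elapsed_seconds)."""
    start = time.perf_counter()
    rows = 0
    fields = 0
    for record in csv.reader(io.StringIO(text)):
        rows += 1
        fields += len(record)
    return rows, fields, time.perf_counter() - start


if __name__ == "__main__":
    data = make_csv(rows=10_000, cols=50)
    rows, fields, elapsed = time_parse(data)
    print(f"{rows} rows, {fields} fields in {elapsed:.3f} s")
```

Counting rows and fields, besides timing, gives a cheap check that two
parsers being compared actually read the same data.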

Technical Details
Several open source projects currently implement C functions to access CSV
data. The challenge of this project is to learn tools such as VMMaker and
the Interpreter Plugin classes in order to develop an internal or external
Squeak/Pharo plugin.

Benefits to the Student
The student will learn about interfacing highly efficient libraries to
Smalltalk.

Benefits to the Community
The Smalltalk community will gain a winning library for an extremely common
task such as dealing with CSV files.


Name: HDF5 support (http://www.hdfgroup.org/HDF5/)
Level: Intermediate
Possible mentor: ?
Possible second mentor: ?

Description

Hierarchical Data Format 5 (HDF5) is a relatively new (1998) format capable
of storing large and complex amounts of data. It is used in Gravitational
and Plasma Physics, Earth Science research, Weather Services, Software
Engineering, Biomedical Informatics, etc. As new data acquisition hardware
produces bigger datasets (for example, sequencing data), the need to query
and access metadata and partial or full datasets efficiently (parallel I/O)
becomes more important. In this format, data are stored in a hierarchy
similar to the UNIX file system, and the data model supports a rich variety
of data types and data space organizations. APIs and wrappers currently
exist for Java, .NET, Python, C and FORTRAN.

The goal of this project is to build a wrapper that enables access to HDF5
data from Smalltalk. This binding could open Smalltalk to many scientific
domains and users to whom pure object technology is currently unknown.

Technical Details
The student will need to learn details of the HDF format, such as data sets
and composite data types.

Benefits to the Student
The student would learn about efficient data systems, implement an API, and
experiment with large scientific data in Smalltalk.

Benefits to the Community
The Smalltalk community will attract more users by keeping in touch with
big data analytics and by providing access to an efficient data format
currently used in research and business.

2012/3/5 Janko Mivšek <janko.mivsek at eranova.si>

> Hernán and Karl, can you or someone else develop your ideas a bit
> further, by answering the questions:
>
>  description
>  technical details
>  benefit for student
>  benefit for community
>
> while a potential mentor can be chosen later
>
> Thanks!
> Janko
>
> On 02. 03. 2012 21:38, karl ramberg wrote:
> > Make web browser plugin of Squeak work better on all platforms.
> > Get Etoys image to run on CogVM.
> >
> > Karl
> >
> >
> > On Fri, Mar 2, 2012 at 9:36 PM, karl ramberg <karlramberg at gmail.com> wrote:
> >> Port OpenQwaq video to Etoys
> >>
> >> Karl
> >>
> >>
> >> On Fri, Mar 2, 2012 at 9:17 PM, Hernán Morales Durand
> >> <hernan.morales at gmail.com> wrote:
> >>> To dream is easy:
> >>>
> >>> -HDF5 (http://www.hdfgroup.org/HDF5/) support like PyTables or h5py for
> >>> Python
> >>> -Information Retrieval/Full Text Search package like Lucene
> >>> (http://lucene.apache.org/)
> >>> -A binding to R or SAS or SPSS
> >>> -Better or more semantic web support (see
> >>> http://en.wikipedia.org/wiki/Web_Services_Resource_Framework#Implementations)
> >>> -An omnibrowser for OWL ontologies
> >>> -CSS template system
> >>> -Support of a Distributed Hash Table protocol like Pastry
> >>> (http://www.freepastry.org/)
> >>> -Workflow system with designer and plug-in architecture (see
> >>> http://www.taverna.org.uk/)
> >>> -Big data CSV parser plugin
> >>> -Plugin for fast approximate search in strings
> >>>
> >>> cheers,
> >>>
> >>>
> >>>
> >>> 2012/3/2 Janko Mivšek <janko.mivsek at eranova.si>
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> 7 days to the deadline for ideas for this year's GSoC! Please think about
> >>>> what would be a nice project for students to work on and that way join
> >>>> our community. And yes, students are welcome to propose projects too!
> >>>>
> >>>> Let us "recycle" past project ideas too! Please review them and propose
> >>>> those that are still valuable, or change a proposal to suit better. Here
> >>>> they are: http://gsoc2010.esug.org/ideas .
> >>>>
> >>>> Carla and I will post GSoC-related material only on the Pharo, Squeak and
> >>>> VWNC mailing lists; for the other lists, would someone please forward
> >>>> those mails there. It is just too hard to post on 10+ mailing lists, you
> >>>> know...
> >>>>
> >>>> Past GSoC mentors, please join the debate on the special Smalltalk GSoC
> >>>> mentors list: http://groups.google.com/group/smalltalk-gsoc-mentors. If
> >>>> you want to be a mentor this year, you are welcome to join that list too.
> >>>>
> >>>> Best regards
> >>>> Janko
> >>>>
> >>>> S, Janko Mivšek wrote:
> >>>>> Dear Smalltalkers,
> >>>>>
> >>>>> Let us apply again this year for the Google Summer of Code, which as you
> >>>>> know is Google's stipend program for students, encouraging them to
> >>>>> work on open-source projects [1].
> >>>>>
> >>>>> Ok, our first step as a community is to collect ideas for possible
> >>>>> projects and to apply to the GSoC as an organization. The deadline is
> >>>>> next Friday, so please hurry with your ideas. Just send them as a reply
> >>>>> to this email or to the admins directly and we will put them on our
> >>>>> website [2].
> >>>>>
> >>>>> Your project idea should be written as answers to these questions:
> >>>>>
> >>>>>   Name
> >>>>>   Level (Beginner, Intermediate, Advanced)
> >>>>>   Possible mentor
> >>>>>   Possible second mentor
> >>>>>
> >>>>>   Description
> >>>>>   Technical Details
> >>>>>   Benefits to the Student
> >>>>>   Benefits to the Community
> >>>>>
> >>>>>
> >>>>> See how such ideas looked in the past: http://gsoc2010.esug.org/ideas .
> >>>>>
> >>>>> Waiting for your ideas
> >>>>> Carla and Janko, your GSoC Admin team
> >>>>>
> >>>>> [1] http://www.google-melange.com/gsoc/homepage/google/gsoc2012
> >>>>> [2] http://gsoc2012.esug.org
> >>>>>
> >>> --
> >>> Hernán Morales
> >>> Information Technology Manager,
> >>> Institute of Veterinary Genetics.
> >>> National Scientific and Technical Research Council (CONICET).
> >>> La Plata (1900), Buenos Aires, Argentina.
> >>> Telephone: +54 (0221) 421-1799.
> >>> Internal: 422
> >>> Fax: 425-7980 or 421-1799.
> >>>
> >>>
> >>>
> >>>
> >
> >
>
> --
> Janko Mivšek
> Smalltalk GSoC Admin Team
> http://gsoc2012.esug.org
>
>