[Newbies] Re: conceptual design help

Fri Apr 29 15:59:53 UTC 2016

Hi Joseph,

I'm making some data visualizations and, despite not having advice on 
conceptual design, I share part of the practical problem of having to 
work with CSV values in a Smalltalk environment, sometimes with a lot 
of records (my recent project works with 270k of them). The 
visualization I did is documented broadly at [1], but essentially I 
created a "PublishedMedInfo class >> loadDataFromCSV: aFile 
usingDelimiter: aCharacter" method that fills out my domain objects 
with data that came from an Excel (and then CSV) file.

[1] http://mutabit.com/offray/blog/en/entry/sdv-infomed
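
As a rough illustration of that kind of loader, here is a minimal sketch 
in Python rather than Smalltalk. It is not the actual PublishedMedInfo 
code: the Transaction class, its field names (date, payee, amount) and 
the sample data are all assumptions made for the example, borrowed from 
the transactions discussed in this thread.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    # Hypothetical domain object; the field names are assumptions.
    date: str
    payee: str
    amount: Decimal


def load_data_from_csv(stream, delimiter=","):
    """Read delimited text with a header row; return one object per data row."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    return [
        Transaction(row["date"], row["payee"], Decimal(row["amount"]))
        for row in reader
    ]


# Made-up sample data using ';' as the delimiter.
sample = io.StringIO(
    "date;payee;amount\n"
    "2016-04-01;Coffee Shop;2.50\n"
    "2016-04-01;Coffee Shop;2.50\n"
)
transactions = load_data_from_csv(sample, delimiter=";")
```

Passing the delimiter as a parameter keeps the loader independent of 
whatever separator a given export happens to use.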

For my recent project [2] I'm using a SQLite bridge between Pharo and 
the data imported from CSV. In that way I'm delegating storage and 
querying (including duplicates) to a small but powerful database 
back-end, while using objects to model "higher" concerns of my domain. 
I know there are some worries about the object-database impedance 
mismatch, but working with data and its visualization/reporting lets 
you build bridges, leaving the former to the database and the latter to 
objects, while using the strengths of each in its own place.

[2] https://twitter.com/offrayLC/status/725314838696701957
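
To make the "delegate duplicates to the database" idea concrete, here is 
a minimal sketch using Python's built-in sqlite3 module instead of the 
Pharo bridge. The table name, column names and rows are assumptions for 
illustration only. The query reports repeated rows without deleting 
them, since a repeated transaction can be legitimate.

```python
import sqlite3

# Made-up rows; the second one is a legitimate repeat (a refill the same day).
rows = [
    ("2016-04-01", "Coffee Shop", "2.50"),
    ("2016-04-01", "Coffee Shop", "2.50"),
    ("2016-04-02", "Bookstore", "18.00"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE transactions (date TEXT, payee TEXT, amount TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Group identical rows and keep only the groups that occur more than once.
duplicates = conn.execute(
    """
    SELECT date, payee, amount, COUNT(*) AS copies
    FROM transactions
    GROUP BY date, payee, amount
    HAVING COUNT(*) > 1
    ORDER BY date
    """
).fetchall()
conn.close()
```

Deciding what to do with each reported group (keep it, merge it, ask the 
user) can then stay on the object side.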

So my practical advice is to explore this kind of combination early in 
your design. Maybe a quick hands-on mockup could let you know if it 
works for you. In my case it has, and I'm now introducing it earlier in 
my projects.

Cheers,

Offray

PS: It has been a long time since I last wrote, but I have been reading 
constantly. Nice to be "back" :-)

On 29/04/16 09:28, Joseph Alotta wrote:
> Thanks for all the help.
>
> I like the idea of having the code sense the format of the data and 
> acting accordingly.
>
> For separators, I could count the number of each kind of separator in 
> the file and compare it to the number of lines.  Say 3 or more 
> separators per line.
>
> Then I can parse by columns and look for the dominant data type.  For 
> a column that is 60% matching a date type, I can assume it is a date 
> column and the mismatches are headers.
>
> The amount should be numeric.
>
> The payee should be mostly letters, etc.
>
> One issue I have is knowing what to call the object that does this. 
> It would not be a Transaction, because this is a function of many 
> Transactions.
>
> FileLoader?  FileAnalyzer?
>
> Also, at this point I should be looking for missing dates and duplicates.
>
> Duplicates are troublesome, since every time I download the file, it 
> starts from the beginning of the year again.  I keep downloading them 
> because I think they will only keep data for 6 months or so.
>
> Also duplicate transactions are valid.  Suppose I go into a coffee 
> shop and buy a cup of coffee, then go back the same day, same store 
> for a refill.
>
> Your thoughts?
>
> Sincerely,
>
> Joe.
