[squeak-dev] Mine-able ideas?

Wed Jan 2 22:32:07 UTC 2013

On 2 January 2013 22:17, Colin Putney <colin at wiresong.com> wrote:
>
>
>
> On Wed, Jan 2, 2013 at 4:18 PM, Frank Shearar <frank.shearar at gmail.com>
> wrote:
>>
>> http://blog.datomic.com/2012/10/codeq.html
>>
>> Executive summary:
>> * Git gives version control over files
>> * Clojure code typically has lots of functions or other chunks of code
>> in one file
>> * This means you can't ask for the version of a single unit of code
>> * Static analyses over the files as they vary through time, dumped
>> into a database, yield interesting stuff
>>
>> What they're calling "codeqs" ("code quantum") filetree folks would
>> call a file, because filetree already splits everything (I think?)
>> into bits, and versions everything at the "codeq" level by virtue of
>> storing each bit in its own file: class definition, comment, method
>> definition, etc.
>>
>> So we have most of this stuff already - I couldn't live
>> without my in-image method versions - but I'm wondering if anyone else
>> can spot anything worth copying?
>
>
> Nah. They're basically figuring out how to extract the semantic changes from
> git, since git just treats the source code as opaque text. That gets them to
> what Monticello has now. I guess there's a bit of "imagine what you could do
> then!" that's unspecified.

That was pretty much what I was thinking. And filetree preserves this
fine-grained "code quantum"-sized version control.

The only advantage I still see of lots-of-stuff-inna-file is that you
can very quickly hop around a bunch of code. Our tools just don't work
that way. They _could_. It's just that no one has ever hurt enough to
build a tool that displays code in this fashion. Displaying it is easy
enough: what's not so easy is to make that
big blob of text efficiently editable such that you still keep track
of the, for example, individual methods. (I'll leave aside the lack of
syntax around method definition. That's not a big problem.) For
instance: parse the entire file, find the method definitions, update
the image by compiling them. (Handwave around the imperative hacks one
could do.)
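
As a rough illustration of the "parse the entire file, find the method
definitions" step, here is a minimal sketch in Python. It is not how any
existing Squeak tool works. It assumes a hypothetical convention in which
each method starts with an unindented header line such as "Point >> x" or
"Point class >> origin" and the body lines are indented; the names
split_methods and changed_methods are invented for the example. The actual
compile step is left out: the sketch only reports which methods would need
recompiling after an edit.

```python
import re

# Hypothetical header convention (an assumption, not an existing format):
# an unindented line "ClassName >> selector" or "ClassName class >> selector".
HEADER = re.compile(r"^([A-Z]\w*(?: class)?) >> (\S+)\s*$")


def split_methods(blob):
    """Split a big blob of text into {(class_name, selector): source}."""
    methods = {}
    key, body = None, []
    for line in blob.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header closes the method collected so far.
            if key is not None:
                methods[key] = "\n".join(body).strip()
            key, body = (match.group(1), match.group(2)), []
        elif key is not None:
            body.append(line)
    if key is not None:
        methods[key] = "\n".join(body).strip()
    return methods


def changed_methods(old_blob, new_blob):
    """Return (methods to compile, methods removed) between two versions."""
    old, new = split_methods(old_blob), split_methods(new_blob)
    # New or edited methods are the ones whose source differs from before.
    to_compile = {k: src for k, src in new.items() if old.get(k) != src}
    removed = sorted(k for k in old if k not in new)
    return to_compile, removed
```

Keying each chunk by class and selector is what keeps the individual
methods trackable: after an edit to the blob, only the entries returned by
changed_methods would have to be handed to the compiler, and the rest of
the image stays untouched.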

> Which is not to say that it's a bad idea. I'd love to create a huge database
> of, say, the update stream going back to the beginning, or the entire
> contents of squeaksource. But... then what?
>
> Things that spring to mind immediately:
>
> - universal senders and implementors
> - metrics like message sends per method or methods per class
> - detection of package dependencies

This would be a massive win. I took a bash a while ago at extending
DependencyBrowser to work over one's package-cache to do this. I
didn't get terribly far, probably largely due to my being pretty ignorant
about just about everything I needed to know. I have the
half-completed work lying around. Maybe I should publish it somewhere!

frank

> - analysis of how long-lived packages change over time
> - analysis of contribution and collaboration between coders
>
> and so on.
>
> But, what good is it? Might be interesting, maybe there's some research
> papers to be written, but would it do us any good as a community? Would
> there be useful tools that came out of it? Would it be worth the effort?
> Hard to say.
>
> Colin
>
>
>