[squeak-dev] Re: etarsoinl:
marcel.taeumel
Marcel.Taeumel at hpi.de
Wed Jun 22 09:17:22 UTC 2016
Tobias Pape wrote
> Hi all
>
> I was curious about the relative distribution of characters in Squeak
> Code.
> I sampled the source code[1] and drew a histogram (Attached)
> Here are my results:
>
> - The most frequent (printable) characters are in order
>
> etarsoinl:
>
> and more detailed, the 90 most frequent characters:
>
>
> etarsoinl:cdfumhpg.ybwSv"=1CT'x][0F)(k2ANPI|M^B4O7D6R3598#EL-,zWVjU;H+q/>*<G@KX${}YQZJ\~?!
>
> - This is quite close to actual English:
>
> etaonishrlducmwyfgpbvkjxqz
>
> - Observations:
> - The most frequent punctuation is : and . follows quite long after.
> - Cascading is comparatively rare. We have more blocks and
> equality/identity comparisons than ;
> - Blocks are more common than parentheses and literal arrays
> - You cannot spell ifTrue or ifFalse with the 20 most common characters
> - ifTrue: is far more common than ifFalse:
> - The most frequent uppercase Character is S. I have no conjecture here,
> tho.
>
> - Comparison:
> - Here's C, sampling the Linux kernel:
>
>
> et_risancodlupfm,);(*0hvgb-E=x>ITRSACkNL.P1O/wD2My"{}UF&3GB4q86HV5:<X#[]+zK7W9Y|%\!jQZ'
>
> - under_score_case vs. camelCase is rather obvious.
> - (not displayed, but tab and newline are among the 6 most frequent
> characters!)
> - Punctuation starts much earlier.
> - The beginning differs a lot, the ending not so much.
> - 0 is far more important than 1
> - : is unimportant
>
> - Here's Ruby, sampling Rails:
>
>
> etsaonridl_cupmh.f:,"gb')(=y#vw/kq>ATx0<1R[]@S{}CE|2?-zjDMIPN+BO\F3L5!HU%&4*98GW6;YV7J`X
>
> - underscore shows, but not so much as in C.
> - The : is (like in Smalltalk) more important
> - Uppercase is less common than in both C and Smalltalk.
>
>
> Have fun!
>
> Best regards
> -Tobias
>
>
>
>
> [1]:
> " Uses the new HistogramMorph "
> | characterFrequency |
> CurrentReadOnlySourceFiles cacheDuring: [
> characterFrequency := ((CompiledMethod allInstances select:
> [:method | (method allLiterals detectSum:
> [:lit | lit isCollection ifFalse: [0] ifTrue: [lit size]]) < 1500])
> gather: [:method | method getSource
> reject: [:c |c isSeparator]]) asBag].
>
> (HistogramMorph on: characterFrequency)
> labelBlock: [:c | c codePoint > 32 ifTrue:[c asString] ifFalse: [c
> printString]];
> openInWorld.
>
> ((characterFrequency sortedCounts collect: [:ea | ea value]) first: 90)
> join.
>
>
>
>
>
> etarsoinl.png (9K)
> <http://forum.world.st/attachment/4902372/0/etarsoinl.png>
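The quoted snippet only covers the Squeak image; the post does not show how the Linux kernel and Rails samples were taken. A minimal Python sketch of the same idea for files on disk might look like the following. The function names `character_ranking` and `read_sources` are hypothetical, the directory and glob pattern are placeholders, and `str.isspace` is only an approximation of Squeak's `isSeparator`. The filter on methods with large literals from the original snippet is not reproduced here.

```python
from collections import Counter
from pathlib import Path


def character_ranking(texts, top=90):
    """Return the `top` most frequent non-whitespace characters as one string.

    `texts` is any iterable of strings (for example, file contents).
    """
    counts = Counter()
    for text in texts:
        # Skip whitespace, roughly like `reject: [:c | c isSeparator]`.
        counts.update(c for c in text if not c.isspace())
    # Sort by descending count; break ties by code point so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(c for c, _ in ranked[:top])


def read_sources(root, pattern):
    """Yield the text of every file under `root` matching `pattern`.

    Undecodable bytes are replaced rather than raising an error.
    """
    for path in Path(root).rglob(pattern):
        if path.is_file():
            yield path.read_text(encoding="utf-8", errors="replace")
```

With a source tree checked out locally, something like `character_ranking(read_sources("linux", "*.c"))` would produce a ranking string in the same form as the ones quoted above, though the exact result depends on which files are sampled.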
:)
"Do you think the author might be interested in rewriting his work to cut it
down? If you cut out all the 'O's, you might lose six pages there."
http://www.dailymotion.com/video/x4n10h_mr-mann-bookshop_fun
Best,
Marcel
--
View this message in context: http://forum.world.st/etarsoinl-tp4902372p4902397.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.