[squeak-dev] Re: etarsoinl:

marcel.taeumel Marcel.Taeumel at hpi.de
Wed Jun 22 09:17:22 UTC 2016


Tobias Pape wrote
> Hi all
> 
> I was curious about the relative distribution of characters in Squeak
> Code.
> I sampled the source code[1] and drew a histogram (Attached)
> Here are my results:
> 
> - The most frequent (printable) characters are in order
> 
> 	etarsoinl:
> 
>   and more detailed, the 90 most frequent characters:
> 
> 
> etarsoinl:cdfumhpg.ybwSv"=1CT'x][0F)(k2ANPI|M^B4O7D6R3598#EL-,zWVjU;H+q/>*<G@KX${}YQZJ\~?!
> 
> - This is quite close to actual English:
> 
> 	etaonishrlducmwyfgpbvkjxqz
> 
> - Observations:
>   - The most frequent punctuation is : and . follows a long way behind.
>   - Cascading is comparatively rare. We have more blocks and
> equality/identity comparisons than ;
>   - Blocks are more common than parentheses and literal arrays
>   - You cannot spell ifTrue or ifFalse with the 20 most common characters
>   - ifTrue: is far more common than ifFalse:
>   - The most frequent uppercase Character is S. I have no conjecture here,
> tho.
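
Two of these observations can be read directly off the ranking. The following is a minimal workspace sketch, not part of the original analysis: it assumes the 90-character ranking quoted above with a literal @ between G and K, and the variable names (ranking, top20, missing, ranks) are made up for illustration.

```smalltalk
"Sketch only: read two of the observations off the quoted top-90 ranking."
| ranking top20 missing ranks |
ranking := 'etarsoinl:cdfumhpg.ybwSv"=1CT''x][0F)(k2ANPI|M^B4O7D6R3598#EL-,zWVjU;H+q/>*<G@KX${}YQZJ\~?!'.
top20 := (ranking first: 20) asSet.

"Letters of each word that are not among the 20 most frequent characters."
missing := #('ifTrue' 'ifFalse') collect: [:word |
	word -> (word reject: [:c | top20 includes: c])].

"Rank of the block bracket, equals sign, parenthesis and cascade semicolon."
ranks := '[=(;' asArray collect: [:c | c -> (ranking indexOf: c)].

{ missing . ranks }
```

Printing the last expression should show that only the capitals T and F fall outside the top 20, and that [ (rank 33), = (rank 26) and ( (rank 37) all come well before ; (rank 68).
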
> 	
> - Comparison:
>   - Here's C, sampling the Linux kernel:
>     
> 
> et_risancodlupfm,);(*0hvgb-E=x>ITRSACkNL.P1O/wD2My"{}UF&3GB4q86HV5:<X#[]+zK7W9Y|%\!jQZ'
> 
>     - under_score_case vs. camelCase is rather obvious.
>     - (not displayed, but tab and newline are among the 6 most frequent
> characters!)
>     - Punctuation starts much earlier.
>     - The beginning differs a lot, the ending not so much.
>     - 0 is far more important than 1
>     - : is unimportant
> 
>   - Here's Ruby, sampling Rails:
> 
> 
> etsaonridl_cupmh.f:,"gb')(=y#vw/kq>ATx0<1R[]@S{}CE|2?-zjDMIPN+BO\F3L5!HU%&4*98GW6;YV7J`X
> 
>     - underscore shows, but not so much as in C.
>     - The : is (like in Smalltalk) more important
>     - Uppercase is less common than in either C or Smalltalk.
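
The remarks about : and _ can be made concrete by looking up their ranks in the three quoted rankings. This is a small sketch under stated assumptions: only the leading part of each ranking is copied (enough to reach the characters of interest), and the names prefixes and ranks are illustrative.

```smalltalk
"Sketch only: ranks of the colon and the underscore in the quoted rankings.
 indexOf: answers 0 when the character does not occur in the string."
| prefixes ranks |
prefixes := {
	'Smalltalk' -> 'etarsoinl:'.
	'C' -> 'et_risancodlupfm,);(*0hvgb-E=x>ITRSACkNL.P1O/wD2My"{}UF&3GB4q86HV5:'.
	'Ruby' -> 'etsaonridl_cupmh.f:' }.

ranks := prefixes collect: [:each |
	each key -> { each value indexOf: $: . each value indexOf: $_ }].
ranks
```

With the rankings as quoted, : comes out at rank 10 in Smalltalk, 67 in C and 19 in Ruby, while _ is at rank 3 in C and 11 in Ruby. The 0 for Smalltalk reflects that _ is absent from the copied prefix; it does not appear anywhere in the quoted top 90 either.
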
> 
> 
> Have fun!
> 
> Best regards
> 	-Tobias
> 
> 
> 
> 
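> [1]: 
> " Uses the new HistogramMorph "
> | characterFrequency |
> " Keep the source files cached while every method's source is read. "
> CurrentReadOnlySourceFiles cacheDuring: [
> 	" Select methods whose collection literals total fewer than 1500
> 	  elements, gather their sources without separators, and count
> 	  each character in a Bag. "
> 	characterFrequency := ((CompiledMethod allInstances select: 
> 		[:method | (method allLiterals detectSum: 
> 			[:lit | lit isCollection ifFalse: [0] ifTrue: [lit size]]) < 1500])
> 		gather: [:method | method getSource
> 			reject: [:c | c isSeparator]]) asBag].
> 
> " Label characters above code point 32 as themselves, others by printString. "
> (HistogramMorph on: characterFrequency)
> 	labelBlock: [:c | c codePoint > 32 ifTrue: [c asString] ifFalse: [c printString]];
> 	openInWorld.
> 
> " The 90 most frequent characters, joined into one string. "
> ((characterFrequency sortedCounts collect: [:ea | ea value]) first: 90) join.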
> 
> 
> 
> 
> 
> etarsoinl.png (9K)
> <http://forum.world.st/attachment/4902372/0/etarsoinl.png>

:)

"Do you think the author might be interested in rewriting his work to cut it
down? If you cut out all the 'O's, you might lose six pages there."
http://www.dailymotion.com/video/x4n10h_mr-mann-bookshop_fun

Best,
Marcel



--
View this message in context: http://forum.world.st/etarsoinl-tp4902372p4902397.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.

