Token-based source compression

Andrew C. Greenberg werdna at gate.net
Mon Aug 16 00:32:53 UTC 1999


>I like the idea of compressing source files by tokenizing the most often
>used words, as mentioned on this list early on.  Therefore, I compiled the
>following list of the most used tokens.

Since you are escaping, you can escape many more than 20 tokens. 
Moreover, I think it would be best to select candidates for escaping 
based not on frequency of use alone, but on the frequency of use 
TIMES the length; that is, with an eye to maximizing the total 
savings.  A short identifier of four characters used twice as often 
as a long identifier of twenty characters is not as good a candidate 
as the long identifier: per use of the long one, the short one 
accounts for 2 x 4 = 8 characters against 20.
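
To make the selection rule concrete, here is a minimal sketch in Python 
(not Squeak code, and not anything posted to this list).  The function 
name rank_escape_candidates, the identifier regex, and the escape_cost 
parameter are all assumptions made for illustration; with escape_cost 
left at 0 the score is exactly frequency TIMES length as described above.

```python
import re
from collections import Counter


def rank_escape_candidates(source, limit=20, escape_cost=0):
    """Rank identifiers by estimated total savings rather than raw frequency.

    escape_cost is a hypothetical refinement: the number of characters the
    escape sequence itself occupies. At 0 the score is frequency * length.
    """
    # Assumed identifier syntax: a letter or underscore, then letters/digits/underscores.
    counts = Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))

    scored = [
        (count * (len(token) - escape_cost), token, count)
        for token, count in counts.items()
        if len(token) > escape_cost  # skip tokens the escape would not shorten
    ]

    # Highest total savings first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]


if __name__ == "__main__":
    sample = "self self abcdefghijklmnopqrst"
    for savings, token, count in rank_escape_candidates(sample):
        print(savings, token, count)
```

On that sample the twenty-character identifier, used once, scores 20 and 
ranks ahead of "self", which is used twice but scores only 8.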




