Token-based source compression
Andrew C. Greenberg
werdna at gate.net
Mon Aug 16 00:32:53 UTC 1999
>I like the idea of compressing source files by tokenizing the most often
>used words, as mentioned on this list earlier. Therefore, I compiled the
>following list of the most used tokens.
Since you are escaping, you can escape many more than 20 tokens.
Moreover, I think it would be best to select candidates for escaping
based not on frequency of use alone, but on the frequency of use
TIMES the length; that is, with an eye to maximizing the total
savings. A short identifier of four characters used twice as often
as a long identifier of twenty characters is not as good a candidate
as the long identifier: for every 20 characters the long one accounts
for, the short one accounts for only 2 x 4 = 8.
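
To make the ranking concrete, here is a minimal sketch in Python (not
Squeak code). The function name rank_escape_candidates and the simple
identifier regex are assumptions made for illustration; the score is
just occurrences times length, as described above, and ignores the
size of the escape code itself.

```python
import re
from collections import Counter


def rank_escape_candidates(source):
    """Rank identifiers by occurrences * length, highest score first.

    Hypothetical helper: the score approximates the total number of
    characters each identifier accounts for in the source text.
    """
    # Crude tokenizer (an assumption): runs of letters, digits, underscores.
    counts = Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))
    scores = {token: n * len(token) for token, n in counts.items()}
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # "self" (4 chars) appears twice; the 20-char selector appears once.
    sample = "self self printStringLimitedTo"
    for token, score in rank_escape_candidates(sample):
        print(score, token)
```

With this sample, the 20-character selector scores 20 and "self"
scores 8, so the long identifier ranks first even though it is used
half as often.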