[squeak-dev] Why doesn't ECompletion / OCompletion use a Trie?

Juan Vuletich juan at jvuletich.org
Wed Jun 15 11:40:05 UTC 2011


Hi Levente,

Thanks for your answer. I thought I had missed something, as I think a 
trie would do a much better job. Your answer tells me I had already got 
it, and I still believe a trie is better.
(comments inline)

Levente Uzonyi wrote:
> On Tue, 14 Jun 2011, Juan Vuletich wrote:
>
>> Hi Folks,
>>
>> I've been playing a bit with ECompletion / OCompletion on Cuis 
>> (thanks, Levente!). What surprises me is the use of a single level 
>> Dictionary with 2 character prefixes as the data structure to hold 
>> the string dictionary. Does anybody know why a Trie ( 
>> http://en.wikipedia.org/wiki/Trie ) wasn't used?
>
> OCompletion suggests the recently used methods, therefore the length 
> of the list for each prefix is limited to 40 by default. So looking up 
> selectors beginning with a two letter prefix is fast enough (I 
> increased it to 400 in my image and it's still fast enough). 40 is not 
> always sufficient, sometimes the method you're looking for is not in 
> the list of the three recently used methods, but it works for most cases.

Having the list limited to 40 might be ok when you have a 2 or 3 letter 
prefix, but the majority of the methods will never be suggested, even if 
you keep typing a longer prefix. At some moment, the list will be empty, 
and maybe you still need to finish typing a rather long (seldom used, 
hard to remember) selector. When trying OCompletion, I found this very 
annoying. For instance, in Pharo 1.2, the selectors table holds less 
than 6,000 selectors, while the system has over 32,000 distinct 
selectors. This means that over 80% of the selectors will never be 
suggested.

Showing the most popular selectors first is quite nice. But having a 
hard, hidden boundary on what will be suggested and what not is not that 
nice.

> With a trie you'd have to sort the words matching the prefix by access 
> time for every lookup which may be less efficient, especially for 
> small prefixes. A trie would also have to hold all possible selectors, 
> while the lists limited to 40 avoid this.

The trie will get faster as the prefixes are longer, beating easily the 
table, that will have to filter a much larger set of words. Besides, the 
sorting of the matching results of the trie will not be too bad, if I do 
filtering and sorting at the same time, keeping (for example) just the 
40 latests words. (Note to Juan: do it!).

And as I said before, holding all possible selectors is much better than 
holding (maybe) 20% of them.

Besides, the trie is better for natural language dictionaries, where the 
sorting by date feature is most likely not good and the set of words can 
get much larger. My current implementation can also match words 
regardless of case and accents, and suggest a preferred spelling (for 
example, with correct accents).

Not to mention that code is much simpler, smaller and general...

> Btw, I'm thinking about integrating code completion into Squeak 
> itself. The idea is to add hooks to the system for code completion and 
> provide an implementation for common things, like menus, key bindings, 
> etc. There would also be a simple symbol table based completion tool 
> integrated by default, like Cmd+Q, but with menus, like in Pharo. Then 
> stripped down versions of E/OCompletion could be loaded as a 
> replacement. I'm also thinking about implementing a mix of ECompletion 
> and OCompletion which would show the recently used selectors first 
> like OCompletion, but filters/sorts the suggestions based on type 
> information like ECompletion.

Sounds great. Maybe my implementation can be one of the alternatives.

Cheers,
Juan Vuletich

> Levente




More information about the Squeak-dev mailing list