[squeak-dev] Ask objects to group themselves by similar meanings of words....

gettimothy gettimothy at zoho.com
Fri Apr 3 20:08:19 UTC 2020


Hi Christoph

Thanks for the reply.

The "embeddings" approach is the "interesting idea" I was looking for. I was sort of hoping Squeak already had something like that. (:


What I am doing is teaching myself Latin (See. Write. Hear. Say.) via a Seaside app.

The imported objects will take care of the "See" part.
I can probably get them to import, or link to, how they sound via Wiktionary.

I would like the objects I imported to organize themselves in different ways. Alphabetical order is the obvious one, and the one the Wikipedia pages use.

I think having these things "cluster around meaning/concept" and "cluster around sound" would be a useful pedagogical approach.


For example:
Spherical things: ball, sun, planet, potato
Colors
Animals
Big things
Small things...



When I import the medical roots (https://en.wikipedia.org/wiki/List_of_medical_roots,_suffixes_and_prefixes) and you are, say, studying the skeletal system in anatomy, the roots would cluster around bones.


Thanks for your reply.


---- On Fri, 03 Apr 2020 15:41:57 -0400 Thiede, Christoph <mailto:Christoph.Thiede at student.hpi.uni-potsdam.de> wrote ----


So if I understand you correctly, your question is not actually related to Squeak/Smalltalk at all, but rather to the general problem of comparing English words by their semantics? Off-topic, but still an interesting topic :)



I can only give you a few rough keywords, maybe one of them can help you, and maybe you were already ten steps ahead of me :-)



If you only care about similarity by letters, the simplest solution might be something like calculating the longest common prefix of two strings and comparing its length with a threshold. (That term is googlable :)) However, this won't help you with pairs such as "acentric - acrocentric" unless you use some kind of fuzzy matching.
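The two ideas above can be compared with a minimal Python sketch. It is only an illustration, not Squeak code: `common_prefix_length` and `fuzzy_ratio` are hypothetical helper names, and `difflib.SequenceMatcher` from the Python standard library stands in for "some kind of fuzzy matching" (other measures, such as edit distance, would work as well).

```python
from difflib import SequenceMatcher
from os.path import commonprefix


def common_prefix_length(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    # commonprefix compares character by character, so it works on plain words.
    return len(commonprefix([a, b]))


def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity score in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


pairs = [("acentric", "acrocentric"), ("abacus", "abax"), ("abacus", "centric")]
for a, b in pairs:
    # The prefix of "acentric"/"acrocentric" is only "ac", while the fuzzy
    # score is high because both words share the block "centric".
    print(a, b, common_prefix_length(a, b), round(fuzzy_ratio(a, b), 2))
```

Either score can then be compared against a threshold; which threshold works is something to find by experiment.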



If you actually care about the semantic similarity, one approach could be a gigantic dictionary of synonyms. I'm sure there are some relevant databases on the web.

The problem with synonyms is that they can compare words only in a binary way: two words either are synonyms or they are not. But are "centrifugal" and "centripetal" actually synonyms? It totally depends on the perspective. Maybe you won't be happy with this approach.

A more sophisticated approach is word embeddings. The rough idea is to map each word to a large vector in which each component quantifies how related the word is to a specific topic. There's a lot of research around this field ...
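To make the rough idea concrete, here is a small Python sketch under loud assumptions: the vectors are made up by hand, the three "topic" axes (roundness, animal-ness, size) are hypothetical, and `most_similar` is an illustrative helper. Real embeddings are learned from text and have hundreds of components without such tidy labels.

```python
import math

# Hand-made toy vectors: (roundness, animal-ness, size). Purely illustrative.
EMBEDDINGS = {
    "ball": (0.9, 0.0, 0.2),
    "sun": (0.9, 0.0, 1.0),
    "planet": (0.8, 0.0, 0.9),
    "dog": (0.1, 0.9, 0.3),
    "whale": (0.1, 0.9, 1.0),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def most_similar(word, k=2):
    """Return the k other words whose vectors point most nearly the same way."""
    target = EMBEDDINGS[word]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


print(most_similar("sun"))
print(most_similar("dog", k=1))
```

Unlike a synonym dictionary, the score is graded, so words can be ranked or clustered by closeness instead of being declared the same or different.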



PS: What are you trying to do with these results, eventually? :-)



Best,

Christoph








From: Squeak-dev <mailto:squeak-dev-bounces at lists.squeakfoundation.org> on behalf of gettimothy via Squeak-dev <mailto:squeak-dev at lists.squeakfoundation.org>
 Sent: Friday, April 3, 2020 21:00:57
 To: squeak-dev
 Subject: [squeak-dev] Ask objects to group themselves by similar meanings of words....


Hi folks



I have extracted the various Greek and Latin roots from https://en.wikipedia.org/wiki/List_of_Greek_and_Latin_roots_in_English/A%E2%80%93G into Squeak objects.

Each object corresponds to one row in the various tables at the link.



For example, I have one object for:



<tr>

<td><b>abac-</b><sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td>

<td>slab</td>

<td>Greek</td>

<td><span lang="grc"><a href="https://en.wiktionary.org/wiki/%E1%BC%84%CE%B2%CE%B1%CE%BE#Ancient_Greek" class="extiw" title="wikt:ἄβαξ">ἄβαξ, ἄβακος</a></span>
 (<span title="Ancient Greek transliteration" lang="grc-Latn"><i>ábax, ábakos</i></span>), <span lang="grc"><a href="https://en.wiktionary.org/wiki/%E1%BC%80%CE%B2%CE%B1%CE%BA%CE%AF%CF%83%CE%BA%CE%BF%CF%82#Ancient_Greek"
 class="extiw" title="wikt:ἀβακίσκος">ἀβακίσκος</a></span> (<span title="Ancient Greek transliteration" lang="grc-Latn"><i>abakískos</i></span>)</td>

<td>abaciscus, <a href="/wiki/Abacus" title="Abacus">abacus</a>, <a href="/wiki/Abax" class="mw-redirect" title="Abax">abax</a>

</td></tr>







The cells are put into accessors corresponding to the headers of the table:



Root, Meaning, Origin, Etymology, English examples.



MyObject
      root -> abac
      meaning -> slab
      language -> greek
      etymology -> blah
      examples -> more-blah
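
For readers who want to try the same row-to-object mapping outside Squeak, here is a rough Python sketch. It is not the code used for the import described above: `RootEntry`, `RowCellParser`, and `parse_row` are hypothetical names, the sample row is a simplified version of the HTML shown earlier, and the sketch assumes every row has exactly five cells.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class RootEntry:
    root: str
    meaning: str
    origin: str
    etymology: str
    examples: str


class RowCellParser(HTMLParser):
    """Collects the visible text of each <td> cell in one table row."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._skip_depth = 0  # > 0 while inside a <sup> footnote marker

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")
        elif tag == "sup":
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "sup" and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text inside a cell, but drop footnote markers such as "[2]".
        if self._in_cell and not self._skip_depth:
            self.cells[-1] += data


def parse_row(row_html: str) -> RootEntry:
    parser = RowCellParser()
    parser.feed(row_html)
    parser.close()
    # Collapse runs of whitespace left over from the HTML layout.
    cells = [" ".join(cell.split()) for cell in parser.cells]
    return RootEntry(*cells)  # assumes exactly five cells per row


row = """<tr>
<td><b>abac-</b><sup class="reference"><a href="#cite_note-2">[2]</a></sup></td>
<td>slab</td>
<td>Greek</td>
<td><i>ábax, ábakos</i></td>
<td>abaciscus, <a href="/wiki/Abacus">abacus</a>, <a href="/wiki/Abax">abax</a>
</td></tr>"""
entry = parse_row(row)
print(entry.root, entry.meaning, entry.origin, entry.examples)
```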





Focusing on the "English examples", I am interested in something like:



LatinRoots select:[:each | each english_examples  "have same or similar meanings"]




If anybody has pointers to projects that have grappled with that problem, I would appreciate a link.



Answers like "Your question is completely nonsensical" are OK, too (:



Thanks for your time.