[squeak-dev] Ask objects to group themselves by similar meanings of words....

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Fri Apr 3 19:41:57 UTC 2020


So if I understand you correctly, your question is not really about Squeak/Smalltalk at all but about the general problem of comparing English words by their meaning? Off-topic, but still an interesting topic :)


I can only give you a few rough keywords. Maybe one of them helps, or maybe you are already ten steps ahead of me :-)


If you only care about similarity by letters, the simplest solution might be to compute the longest common prefix of two strings and compare its length with a threshold. (That term is googlable :)) However, this won't help you with pairs such as "acentric" and "acrocentric", which share only the prefix "ac", unless you use some kind of fuzzy matching.
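As a rough illustration of the letter-based idea, here is a minimal Python sketch (not Squeak). The function names and the threshold of 0.7 are my own assumptions, and difflib's ratio is just one possible kind of fuzzy matching:

```python
from difflib import SequenceMatcher
from os.path import commonprefix


def prefix_similarity(a: str, b: str) -> float:
    """Length of the longest common prefix, relative to the shorter word."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    # commonprefix compares character by character, so it works on plain words.
    return len(commonprefix([a, b])) / shortest


def fuzzy_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; tolerates letters inserted in the middle."""
    return SequenceMatcher(None, a, b).ratio()


def similar_by_letters(a: str, b: str, threshold: float = 0.7) -> bool:
    """True if either measure reaches the threshold.

    The default threshold is an arbitrary value chosen for illustration.
    """
    return max(prefix_similarity(a, b), fuzzy_similarity(a, b)) >= threshold
```

With this sketch, "abacus" and "abax" already pass on the prefix alone, while "acentric" and "acrocentric" fail the prefix test and are only caught by the fuzzy ratio.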


If you actually care about semantic similarity, one approach could be a gigantic dictionary of synonyms. I'm sure there are some relevant databases on the web.

The problem with synonyms is that they only give you a yes/no answer: two words either are listed as synonyms or they are not. But are "centrifugal" and "centripetal" actually synonyms? It totally depends on the perspective. Maybe you won't be happy with this approach.
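To make the synonym-dictionary idea and its limitation concrete, here is a small Python sketch. The thesaurus entries are invented for illustration only; a real one would be loaded from some synonym database:

```python
# Toy thesaurus, made up for this example. It is not taken from any real source.
THESAURUS = {
    "slab": {"plate", "tablet"},
    "centrifugal": {"outward"},
}


def are_synonyms(a: str, b: str, thesaurus=THESAURUS) -> bool:
    """True if the words are equal or either one lists the other as a synonym."""
    a, b = a.lower(), b.lower()
    return (
        a == b
        or b in thesaurus.get(a, set())
        or a in thesaurus.get(b, set())
    )
```

The lookup can only answer True or False. For a pair like "centrifugal" and "centripetal" it says False, although the two words are clearly related.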

A more sophisticated approach is word embeddings. The rough idea is to map each word to a large vector in which each component quantifies how related the word is to a specific topic. Two words are then considered similar if their vectors point in a similar direction. There's a lot of research in this field ...
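A minimal Python sketch of how such vectors could be compared, using cosine similarity. The three-component vectors and their "topics" are invented by hand for illustration; real embeddings are learned from large text corpora and have hundreds of components:

```python
import math

# Hypothetical vectors. The components stand for made-up topics
# ("center", "motion", "counting") and are not real embedding values.
EMBEDDINGS = {
    "centrifugal": (0.9, 0.8, 0.0),
    "centripetal": (0.9, 0.7, 0.1),
    "abacus": (0.0, 0.1, 0.9),
}


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(word: str, embeddings=EMBEDDINGS):
    """Return all other words with their similarity to `word`, best first."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Unlike the synonym lookup, this gives a graded score, so "centrifugal" and "centripetal" can come out as closely related without having to be declared synonyms.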


PS: What are you trying to do with these results, eventually? :-)


Best,

Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org>
Sent: Friday, 3 April 2020 21:00:57
To: squeak-dev
Subject: [squeak-dev] Ask objects to group themselves by similar meanings of words....

Hi folks

I have extracted the various Greek and Latin roots from https://en.wikipedia.org/wiki/List_of_Greek_and_Latin_roots_in_English/A%E2%80%93G to Squeak objects.
Each object corresponds to one row in the tables at that link.

For example, I have one object for:

<tr>
<td><b>abac-</b><sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td>
<td>slab</td>
<td>Greek</td>
<td><span lang="grc"><a href="https://en.wiktionary.org/wiki/%E1%BC%84%CE%B2%CE%B1%CE%BE#Ancient_Greek" class="extiw" title="wikt:ἄβαξ">ἄβαξ, ἄβακος</a></span> (<span title="Ancient Greek transliteration" lang="grc-Latn"><i>ábax, ábakos</i></span>), <span lang="grc"><a href="https://en.wiktionary.org/wiki/%E1%BC%80%CE%B2%CE%B1%CE%BA%CE%AF%CF%83%CE%BA%CE%BF%CF%82#Ancient_Greek" class="extiw" title="wikt:ἀβακίσκος">ἀβακίσκος</a></span> (<span title="Ancient Greek transliteration" lang="grc-Latn"><i>abakískos</i></span>)</td>
<td>abaciscus, <a href="/wiki/Abacus" title="Abacus">abacus</a>, <a href="/wiki/Abax" class="mw-redirect" title="Abax">abax</a>
</td></tr>


The cells are put into accessors corresponding to the headers of the table:

Root, Meaning, Origin, Etymology, English examples.

MyObject
      root -> abac
      meaning -> slab
      language -> greek
      etymology -> blah
      examples -> more-blah


Focusing on "english examples" I am interested in

LatinRoots select:[:each | each english_examples  "have same or similar meanings"]

If anybody has pointers to projects that have grappled with that problem I would appreciate a link.

answers like "Your question is completely nonsensical" are ok, too (:

thanks for your time.




