this post was submitted on 11 Jan 2025

Data is Beautiful


Cross posted from: Latin@lemm.ee

lingua latina pater linguarum dimidium est (Latin is the father of half of all languages) 😎

I hope it's okay for me to crosspost here.

Hackworth@lemmy.world 5 points 3 days ago (2 children)

I wonder if something like the semantic tokenization method would benefit from etymological data like this, particularly for a multilingual LLM.

gandalf_der_12te@discuss.tchncs.de 3 points 3 days ago* (last edited 3 days ago)

i know that my NN internally uses a semantic tokenization method.

i literally often seek the word roots when talking to somebody. it helps me focus.

fxomt@lemm.ee 2 points 3 days ago

Interesting paper, thanks for sharing