this post was submitted on 03 Oct 2023

Hacker News

This community serves to share top posts on Hacker News with the wider fediverse.

There is a discussion on Hacker News, but feel free to comment here as well.

lvxferre@lemmy.ml, 1 year ago (last edited 1 year ago)

I might be wrong, but from what I've noticed*, LLMs handle translation natively, without relying on external tools.

The text in question was printed, not calligraphic, and it's rather recent (the Substack author says it dates from the 18th century). It was likely handled through OCR: the typeface is quite similar to a modern italic one, with a few caveats (long ſ, italic ampersand, the odd shape of the tilde). I don't know whether ChatGPT4 handles this natively, but note that the shapes of most letters are by no means archaic.

In this specific case it doesn't matter much whether the model was trained on text following the ABL ("Brazilian") or ACL ("European") standard, since the text predates both anyway, and the two modern standards are considerably more similar to each other than either is to the spelling used back then (e.g. observaçam→observação, huma→uma, he→é). What might be relevant, however, is the register the model was trained on, given that formal written Portuguese is highly conservative; to be honest, though, I have no idea.

*Note: this is based on a very informal test I did with Bard, feeding it a few prompts in Venetian. To my surprise, it was actually able to parse them, even though most translation tools don't support the language.