submitted 3 weeks ago* (last edited 3 weeks ago) by ordellrb@lemmy.world to c/linuxmemes@lemmy.world
[-] R00bot@lemmy.blahaj.zone 21 points 3 weeks ago

I can't imagine it'd be that hard to write some code that does that using an existing AI model.

[-] not_amm@lemmy.ml 9 points 3 weeks ago

I found a small command that runs KDE Spectacle (screenshot software) together with Tesseract, so I can OCR a screenshot whenever I want. I only had to install Tesseract and one main language pack. You could easily do the same with an API and/or a local AI.
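For anyone who wants to try it, here is a rough sketch of that kind of pipeline in Python. It is not the exact command from the comment above, just one way it could look, assuming `spectacle` and `tesseract` are both on your PATH and the English language data is installed. The function names and the `shot.png` filename are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def spectacle_cmd(output_path):
    """Build the Spectacle command for a region screenshot saved to a file."""
    # -b: run without the GUI, -n: no notification,
    # -r: let the user select a region, -o: output file
    return ["spectacle", "-b", "-n", "-r", "-o", str(output_path)]


def tesseract_cmd(image_path, lang="eng"):
    """Build the Tesseract command that prints recognized text to stdout."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_screenshot(lang="eng"):
    """Take a region screenshot and return the OCR'd text (hypothetical helper)."""
    for tool in ("spectacle", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "shot.png"  # illustrative filename
        subprocess.run(spectacle_cmd(image), check=True)
        result = subprocess.run(
            tesseract_cmd(image, lang),
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()
```

Calling `print(ocr_screenshot())` should let you drag a region and then print whatever text Tesseract finds in it. Swap `lang` for whichever language data you installed.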

[-] JackGreenEarth@lemm.ee 5 points 3 weeks ago

You're probably right.

[-] MacNCheezus 3 points 3 weeks ago

Llava and Bakllava are two Ollama models that can not only extract text but also describe what's happening on screen.

Using tesseract-ocr, as the other guy suggested, is probably simpler and less resource intensive though.
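If you do want to go the Ollama route, a minimal sketch could look like the following. It assumes an Ollama server is running locally on its default port (11434) and that the `llava` model has already been pulled. The helper names and the default prompt are made up for illustration.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="llava"):
    """Build the JSON body for a single, non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_path,
                   prompt="Describe this screenshot and transcribe any text.",
                   model="llava"):
    """Send an image to the local Ollama server and return the model's reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Something like `print(describe_image("shot.png"))` would then return the model's description. Passing `model="bakllava"` should work the same way if that model is installed instead.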

this post was submitted on 05 Jun 2024
812 points (98.8% liked)
