submitted 3 days ago* (last edited 3 days ago) by TrudeauCastroson@hexbear.net to c/opensource@lemmy.ml

I'm looking for something that I can scan hand-written notes into and have OCR'd. Maybe one that I can even train on my handwriting. Ideally I end up with a searchable PDF of my notes.

People use OneNote for this, but I'm not really comfortable with letting Microsoft see my handwriting.

[-] mindlight@lemm.ee 2 points 3 days ago

To train an AI to recognize handwriting, you need a huge dataset of handwriting examples: millions of samples of handwritten text, each paired with information about what the written text says.
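
As a rough illustration of what "sample + what it says" looks like on disk, here is a minimal sketch that pairs each scanned image with a transcription file of the same name. The `.png` / `.gt.txt` naming and the `find_pairs` helper are assumptions made for this example, not a requirement of any particular engine.

```python
from pathlib import Path


def find_pairs(folder):
    """Pair each image with its transcription.

    Returns (pairs, missing): pairs is a list of (image_path, text) tuples,
    missing is a list of images that have no transcription file yet.
    The '<name>.png' + '<name>.gt.txt' layout is an assumed convention.
    """
    pairs, missing = [], []
    for image in sorted(Path(folder).glob("*.png")):
        truth = image.with_name(image.stem + ".gt.txt")
        if truth.is_file():
            # Strip the trailing newline so the label is just the written text.
            pairs.append((image, truth.read_text(encoding="utf-8").strip()))
        else:
            missing.append(image)
    return pairs, missing
```

Running something like this over a folder of your own scans shows quickly how many labelled samples you actually have, which is usually far fewer than a training run would want.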

This is why the best engines only exist as cloud services. The OCR engines you can install locally that are acceptable, but far from perfect, are commercial. Parascript FormXtra is one of the better commercial ones.

The only OCR engine that's free and really good is Tesseract OCR, but it doesn't handle handwritten text.
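
For typed or very neatly printed pages, a searchable PDF can be produced with the Tesseract command line by passing `pdf` as the output config. Below is a minimal sketch that wraps that call from Python; the helper names and the file names `scan.png` / `notes` are hypothetical, and it assumes the `tesseract` executable and the `eng` language data are installed. Expect poor results on handwriting.

```python
import shutil
import subprocess


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the CLI call: tesseract <image> <output base> -l <lang> pdf."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    # Tesseract appends the .pdf extension to the output base itself.
    return f"{output_base}.pdf"
```

Calling `make_searchable_pdf("scan.png", "notes")` would write `notes.pdf`, with the recognized text embedded as an invisible layer over the page image.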

[-] interdimensionalmeme@lemmy.ml 2 points 3 days ago

Can you fine-tune Tesseract on a local handwriting dataset? Or insert it into the context, like a pre-prompt?

[-] mindlight@lemm.ee 3 points 3 days ago

It wasn't possible a year ago when I played around with Tesseract. Things might have changed in the last couple of months, though.

this post was submitted on 28 Jun 2024
28 points (100.0% liked)