Open source handwriting OCR?
(hexbear.net)
To train an AI to recognize handwriting you need a huge dataset of handwriting examples. That means millions of samples of handwritten text, plus information about what the written text says in every example.
This is why the best engines only exist as a service in the cloud. The OCR engines you can install locally that are acceptable, but far from perfect, are commercial. Parascript FormXtra is one of the better commercial ones.
The only OCR engine that's free and really good is Tesseract OCR, but it doesn't handle handwritten text.
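To make the "sample + what it says" pairing concrete, here is a minimal sketch of how such a dataset could be checked on disk. It assumes one image per text line with the transcription in a sibling file named `<name>.gt.txt`, which is the layout I believe the tesstrain project uses; the function name `collect_pairs` and the folder layout are hypothetical, not part of any library.

```python
from pathlib import Path

# Assumed image formats for line images; adjust to whatever your scans use.
IMAGE_SUFFIXES = {".png", ".tif"}


def collect_pairs(data_dir):
    """Split images into those with a usable transcription and those without.

    Returns (pairs, missing):
      pairs   - list of (image_path, transcription_text)
      missing - list of image paths with no or empty transcription file
    """
    pairs, missing = [], []
    for image in sorted(Path(data_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Assumed naming convention: page1_line3.png -> page1_line3.gt.txt
        gt_file = image.with_name(image.stem + ".gt.txt")
        text = gt_file.read_text(encoding="utf-8").strip() if gt_file.exists() else ""
        if text:
            pairs.append((image, text))
        else:
            missing.append(image)
    return pairs, missing
```

Calling `collect_pairs("my-handwriting-data")` on your own folder shows how many labelled samples you actually have, which is usually the sobering part: every image in `missing` still needs someone to type out what it says.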
Can you fine-tune Tesseract on a local handwriting dataset? Or insert it in context, like a pre-prompt?
It wasn't possible a year ago when I experimented with Tesseract. Things might have changed during the last couple of months, though.
I found the following. It might be possible and affordable:
https://konfuzio.com/en/tesseract/
https://github.com/Matleo/Tesseract_fine_tuning_training
https://groups.google.com/g/tesseract-ocr/c/ZLOZpW1fD6I/m/B1Ponc0VBAAJ
https://arcruz0.github.io/posts/finetuning-tess/
None of that made Tesseract excel at capturing handwritten text...