this post was submitted on 06 Dec 2023

Sysadmin

A community dedicated to the profession of IT Systems Administration

No generic Lemmy issue posts please! Posts about Lemmy belong in one of these communities:
!lemmy@lemmy.ml
!lemmyworld@lemmy.world
!lemmy_support@lemmy.ml
!support@lemmy.world

founded 1 year ago
MODERATORS
 

Hello everyone.

I haven't had any need for OCR software in probably 15 years, but I have a client who has 7 document boxes' worth of forms filled out by hand that they need digitized. They're scanning them into PDFs this week, but they want to recover FirstName, LastName, Phone, Email, and a handwritten feedback box, and load all of those into a database.

ChatGPT recommended ABBYY, but it looks like it might be overkill for a one time need like this.

I told them that a couple of teenagers doing data entry might be more accurate and cheaper. IDK if that's really true, though. I'm not at all an expert on OCR software.

Does anyone have any suggestions?

top 8 comments
[–] ikidd@lemmy.world 5 points 11 months ago (1 children)

Thanks for this suggestion, we're going to test it and see how it performs.

[–] DABDA@lemmy.world 4 points 11 months ago (1 children)

Thanks for this suggestion. We're going to test it and see how it performs.

[–] naevaTheRat@lemmy.dbzer0.com 4 points 11 months ago* (last edited 11 months ago)

Depending on the quality of the scan, a quick Python script using Tesseract might be enough. There are probably examples online.

The handwriting will be full of errors, but spell check and an editing pass should be fine? I imagine you'll have to do that anyway.
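
For anyone wanting a starting point, here is a rough sketch of what such a script could look like. It assumes the `pytesseract` and `pdf2image` packages (plus the Tesseract and Poppler binaries) are installed. The regexes, the US-style phone format, the `responses` table, and all function names are illustrative assumptions, not something from the original forms. Names and the handwritten feedback box would still need the manual editing pass mentioned above.

```python
import re
import sqlite3

# Assumed patterns: a loose email match and a US-style 10-digit phone number.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def ocr_pdf(pdf_path, dpi=300):
    """Return one OCR'd text string per page of a scanned PDF."""
    # Imported lazily so the helpers below still work without OCR installed.
    from pdf2image import convert_from_path  # needs Poppler on the system
    import pytesseract  # needs the Tesseract binary

    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(page, lang="eng") for page in pages]


def extract_contact(text):
    """Pull the first email and phone number out of raw OCR text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


def save_contacts(conn, rows):
    """Insert extracted rows into a SQLite table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS responses (email TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO responses (email, phone) VALUES (:email, :phone)", rows
    )
    conn.commit()
```

Usage would be roughly `rows = [extract_contact(t) for t in ocr_pdf("scan.pdf")]` followed by `save_contacts(sqlite3.connect("forms.db"), rows)`, where both file names are placeholders. Rows with a missing email or phone are the ones worth sending to a human for review.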

[–] warmaster@lemmy.world 2 points 11 months ago* (last edited 11 months ago)

Paperwork for a standalone desktop PC

https://www.openpaper.work/en/

Paperless-NGX for a painless server setup

https://docs.paperless-ngx.com/

Mayan for enterprise server setup

https://www.mayan-edms.com/

[–] redcalcium@lemmy.institute 2 points 11 months ago* (last edited 11 months ago)

Maybe try Marker, which uses AI stuff as part of their OCR process.

Stirling also supports OCR using OCRmyPDF.

You can also use GPT-4V (e.g. via this library) to perform the OCR. It'll cost money though.

[–] e_t_@kbin.pithyphrase.net 1 points 11 months ago

My company has gotten good results from Amazon Textract.
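
In case it helps, a minimal sketch of how form fields might be pulled out of a Textract response. It assumes `boto3` is installed with AWS credentials configured, and that each scanned page is sent as image bytes. The function names are made up for illustration, and the parsing follows the documented `KEY_VALUE_SET` block layout of `analyze_document` with the `FORMS` feature.

```python
def analyze_form_image(image_bytes):
    """Send one scanned page (image bytes) to Textract's form analysis."""
    import boto3  # lazy import; requires configured AWS credentials

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["FORMS"],
    )


def _block_text(block, blocks_by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(response):
    """Map each detected form key to the text of its value."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key = _block_text(block, blocks_by_id)
        value_parts = []
        # A KEY block points at its VALUE block(s) through a VALUE relationship.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(
                        _block_text(blocks_by_id[value_id], blocks_by_id)
                    )
        fields[key] = " ".join(part for part in value_parts if part)
    return fields
```

Calling `form_fields(analyze_form_image(page_bytes))` should give a dict keyed by the printed labels on the form, which can then be mapped onto database columns. Textract bills per page, so it is worth testing on a small sample first.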