this post was submitted on 03 Oct 2023
Technology
On the contrary, tons of books have been digitized from hard copies through a combination of OCR and manual editing. (E.g.: Project Gutenberg.) The same basic process works for both printed books and pages displayed on an e-reader. It's quite tedious but not exactly difficult. Anyone with a smartphone can submit usable scans, though some simple DIY equipment speeds up the process and improves the quality, and OCR is getting better all the time.
In the worst case the book can simply be retyped. People used to copy books by hand after all, using nothing more sophisticated than pen/quill and paper/parchment/papyrus. Unlike in those days the manual effort is only needed once per title, not per copy.
I'm aware of the digitization projects, but not many people have automatic OCR machines in their basement, and manually flipping the page on your scanner is a little impractical for anything more than a 10-page pamphlet 😅
There's no PRACTICAL way to digitize hard copies for the average pirate willing to break copyright law.
The average person would just download it. Only one person needs the equipment to digitize it. And that equipment isn't as specialized as you seem to think. For printed (mass-produced) books you can just cut the pages from the spine and feed them in batches through an automatic document feeder, which comes standard with many consumer-grade scanners. Automated page-turning on an e-reader can be done with a software plugin in some cases, or externally with something like a SwitchBot. Capturing copy-restricted video is frankly much more involved, and that hasn't stopped anyone so far.
I mean, I'd really have to disagree, but that's fine.
The effort involved in deconstructing a book, batching it through a document scanner, and compiling it with OCR into an EBOOK-compatible format is not trivial. Most consumer-quality OCR software isn't even that great at recognizing words, new lines, symbols, and hyphenated or line-broken words, let alone chapters, indexes, footnotes, etc. It's just not worth the effort for what it produces in the end, and there are millions more print titles than there are movie and show titles.
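To give a feel for the hyphenation problem, here's a minimal sketch of the kind of cleanup pass raw OCR text usually needs. The function name `rejoin_hyphenated` is made up for illustration and isn't from any particular OCR package; it only assumes the OCR output is plain text with a hyphen at the end of a broken line.

```python
import re


def rejoin_hyphenated(text: str) -> str:
    """Join words that were split across a line break with a trailing hyphen.

    Hypothetical helper for illustration only. It matches a lowercase letter,
    a hyphen, a newline, and another lowercase letter, then drops the hyphen
    and the newline.
    """
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


sample = "The digiti-\nzation of books is tedi-\nous but possible."
print(rejoin_hyphenated(sample))
```

Even this is lossy: a genuine compound like "well-known" that happens to break at its hyphen comes out as "wellknown", so someone still has to proofread the result.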
On the other hand, with A/V there's almost always a way to pass playback through a virtual media capture device. Worst-case you have to wait the real run-time in order to capture it, but at the end you at least have a near-original quality file.
If tomorrow all EBOOKs got locked down without a means to strip DRM, I don't think anyone outside of historical archivists would start spending their time manually cataloguing copyrighted hard copy books to distribute freely. Best-case, only the most in-demand books would justify that amount of effort, and certainly not enough books to sustain a digital library worth frequenting.
Historically speaking, people have gone to the trouble of manually digitizing hard copy books to distribute freely. There were digital copies of print books available online (if you knew where to look) before e-books were officially available for sale in any form. That includes mass-market novels as well as items of interest to historians. Ergo, your scepticism seems entirely unjustified.
OCR is far from perfect (though editing OCR output is generally faster than retyping), but even without it we have the storage and bandwidth these days to distribute full books as stacks of images if needed, without converting them to text. The same way people distribute scans of comics/manga.
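For what it's worth, the "stack of images" route needs almost no tooling. The comic archive format (.cbz) is essentially a zip file of page images that readers display in filename order. A rough sketch, assuming the scans are already saved as zero-padded files like `0001.png` in one folder; `pack_scans` and the folder layout are made up for illustration:

```python
import zipfile
from pathlib import Path

# Assumed set of page image types; adjust to whatever the scanner produces.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pack_scans(scan_dir: str, out_path: str) -> list[str]:
    """Bundle the page images in scan_dir into one zip archive.

    Pages are sorted by filename, so names should be zero-padded
    (0001.png, 0002.png, ...). Returns the page names in archive order.
    """
    pages = sorted(
        p for p in Path(scan_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Store without recompressing: PNG and JPEG data is already compressed.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Calling something like `pack_scans("scans", "book.cbz")` would give a single file that most comic readers can open, no OCR involved.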