Mildly Interesting


This is for strictly mildly interesting material. If it's too interesting, it doesn't belong. If it's not interesting, it doesn't belong.

This is obviously an objective criterion, so the mods are always right. Or maybe mildly right? Ahh... what do we know?

Just post some stuff and don't spam.

founded 2 years ago

MODERATORS

marcar@lemmy.world

WhatsHerBucket@lemmy.world

Jimbabwe@lemmy.ml

816

At the Internet Archive, this is how we digitize a book—one page at a time, by hand. (files.catbox.moe)

submitted 1 year ago by AnActOfCreation@programming.dev to c/mildlyinteresting@lemmy.world

95 comments


[–] Wistful@discuss.tchncs.de 30 points 1 year ago (3 children)

Wow, that seems painfully slow and tedious. Why isn't it automated? I think I saw a robot do something like 20 pages a second in a YouTube video some years ago.

[–] aeronmelon@lemmy.world 47 points 1 year ago (1 children)

Do you remember the results of those speed scans? Crooked pages, parts of the document cut off, blurry scans, etc.

It was a lazy method that resulted in a lot of junk data.

[–] Wistful@discuss.tchncs.de 21 points 1 year ago (1 children)

I think this is what I saw. Not quite 20 pages/s, hahah, and also a different method.

[–] PipedLinkBot@feddit.rocks 4 points 1 year ago

Here is an alternative Piped link(s):

this

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I'm open-source; check me out at GitHub.

[–] Dave@lemmy.nz 15 points 1 year ago (1 children)

Google have digitised a lot of books using some more advanced tech, though they started out with something a little like this.

[–] cashews_best_nut@lemmy.world 4 points 1 year ago (1 children)

What happened to that in the end? I heard they wanted to digitize the world's books, and then it just petered out at some point and I heard nothing more about it. Did they continue, or was it handed off to the Internet Archive?

[–] Dave@lemmy.nz 3 points 1 year ago

My understanding is that the project led into Google Books. Google fought many legal cases and ultimately won, but their enthusiasm for scanning more books seems to have waned. Google basically convinced judges that letting people see only a few pages fell under fair use, but that meant you didn't get a giant library, because you couldn't read the whole book.

There's an article about it here: https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books

Also see https://www.hathitrust.org/about/ which is mentioned in the article.

[–] prenatal_confusion@lemmy.one 8 points 1 year ago* (last edited 1 year ago)

That would be interesting to see!

This is probably the method that gives you the best quality (deskewing, lighting) without cutting the back of the book and feeding it into a scanner. (AFAIK)

I saw a book scanner similar to this one that used a vacuum to turn pages, but otherwise it worked on the same principle.