1
11

I was considering building a 30+ TB NAS to simplify and streamline my current setup, but since it's a relatively low priority for me, I'm wondering: is it worth holding off for a year or two?

I am unsure whether prices have more or less plateaued, in which case the difference won't be all that substantial. Maybe I should just wait for Black Friday.

For context, two 16 TB HDDs currently seem to cost about $320, which works out to roughly $10/TB.


Here are some related links:

  • This article by Our World in Data contains a chart showing how the price per GB has decreased over time.

  • This article by Tom's Hardware discusses how SSD prices bottomed out in July 2023 before climbing back up, and predicts further increases in 2024.

2
13
Renewed drives (slrpnk.net)
submitted 2 weeks ago by greengnu@slrpnk.net to c/datahoarder@lemmy.ml

Are they worth considering or only worth it at certain price points?

3
151
submitted 2 weeks ago by xnx@slrpnk.net to c/datahoarder@lemmy.ml

cross-posted from: https://slrpnk.net/post/10273849

Vimm's Lair is getting removal notices from Nintendo and others. We need someone to help make a ROM pack archive. Can you help?

Vimm's Lair is starting to remove many ROMs that Nintendo and others have requested be taken down, so many original ROMs, hacks, and translations will soon be lost forever. Can any of you help make archive torrents of the ROMs from Vimm's Lair and CDRomance? They have hacks and translations that don't exist anywhere else and will probably be removed soon, with iOS emulation and retro handhelds bringing so much attention to ROMs and these sites.
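
For anyone who wants to help, building the actual torrent is the easy part. A minimal sketch, assuming mktorrent is installed and you already have a local mirror of the files; the tracker URL, comment, and paths are placeholders, not real infrastructure:

  # Hypothetical example: build a torrent from a local mirror of the ROM sets.
  # The announce URL and paths below are placeholders only.
  mktorrent \
    -a udp://tracker.example.org:1337/announce \
    -c "Vimm's Lair + CDRomance hacks and translations mirror" \
    -o romhack-archive.torrent \
    ./romhack-archive/

The harder part is mirroring the files in the first place and keeping the torrent seeded.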

4
105

I've been working on this subtitle archive project for some time. It is a Postgres database along with a CLI and API application allowing you to easily extract the subs you want. It is primarily intended for encoders or people with large libraries, but anyone can use it!

PGSub is composed of three dumps:

  • opensubtitles.org.Actually.Open.Edition.2022.07.25
  • Subscene V2 (prior to shutdown)
  • Gnome's Hut of Subs (as of 2024-04)

As such, it is a good resource for films and series up to around 2022.

Some stats (copied from README):

  • Out of 9,503,730 files originally obtained from dumps, 9,500,355 (99.96%) were inserted into the database.
  • Out of the 9,500,355 inserted, 8,389,369 (88.31%) are matched with a film or series.
  • There are 154,737 unique films or series represented, though note the lines get a bit hazy when considering TV movies, specials, and so forth. 133,780 are films, 20,957 are series.
  • 93 languages are represented, with a special '00' language indicating a .mks file with multiple languages present.
  • 55% of matched items have a FPS value present.

Once imported, the recommended way to access it is via the CLI application. The CLI and API can be compiled on Windows and Linux (and maybe Mac), and there are also pre-built binaries available.

The database dump is distributed via torrent (if it doesn't work for you, let me know), which you can find in the repo. It is ~243 GiB compressed, and uses a little under 300 GiB of table space once imported.
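
For anyone who hasn't restored a large Postgres dump before, the import is roughly along these lines. This is only a sketch, assuming a plain SQL dump compressed with zstd; the file and database names are made up, so check the repo README for the actual format and commands:

  # Hedged sketch: create a database and stream the decompressed dump into it.
  # "pgsub_dump.sql.zst" and "pgsub" are assumed names, not the project's own.
  createdb pgsub
  zstd -dc pgsub_dump.sql.zst | psql -d pgsub
  # Expect a little under 300 GiB of table space once this finishes.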

For a limited time I will devote some resources to bug-fixing the applications, or perhaps adding some small QoL improvements. But, of course, you can always fork them or make your own if they don't suit you.

5
49
submitted 3 weeks ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
6
22

I'm looking at my library and I'm wondering if I should process some of it to reduce the size of some files.

There are some movies in 720p that are 1.6~1.9 GB each, and then there are some at the same resolution that are 2.5 GB.
I even have some in 1080p which are just 2 GB.
I only have two movies in 4K; one is 3.4 GB and the other is 36.2 GB (I can't really tell the difference in detail since I don't have a 4K display).

And then there's an anime I have twice at the same resolution: one set of files is around 669~671 MB each, the other set 191 MB each (in this case the quality difference is noticeable while playing them, as opposed to the other files, where I compared extracted frames).

What would you do? What's your target size for movies and series? What bitrate do you go for, in which codec?

Not sure if it's kind of blasphemy in here to talk about compromising quality for size, hehe, but I don't know where else to ask this. I was planning on using these settings in ffmpeg; what do you think?
I tried it on an anime at 1080p, going from 670 MB to 570 MB, and I wasn't able to tell the difference in quality when extracting a frame from the input and the output.
ffmpeg -y -threads 4 -init_hw_device cuda=cu:0 -filter_hw_device cu -hwaccel cuda -i './01.mp4' -c:v h264_nvenc -preset:v p7 -profile:v main -level:v 4.0 -vf "hwupload_cuda,scale_cuda=format=yuv420p" -rc:v vbr -cq:v 26 -rc-lookahead:v 32 -b:v 0 './01_out.mp4'
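
For the frame comparison mentioned above, something like this works (timestamps and filenames are just examples); grab the same timestamp from the source and the re-encode and flip between the two images:

  # Extract one frame at the 10-minute mark from the input and the output.
  ffmpeg -ss 00:10:00 -i './01.mp4' -frames:v 1 frame_input.png
  ffmpeg -ss 00:10:00 -i './01_out.mp4' -frames:v 1 frame_output.png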

7
11
submitted 1 month ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
8
43

I was so confident that WhatsApp had been backing itself up to Google ever since I got my new Pixel, but it just wasn't. Then yesterday I factory reset my phone to fix something else and I lost it all. Years' worth of chats from so many times in my past just aren't there: all my texts with my mom and my family, group chats with old friends... I can't even look at the app anymore, and I'll never use WhatsApp as much as I used to. I just don't feel right with this change. There's no way to get those chats back, and it doesn't feel like there's any point backing up WhatsApp now! I really wanna cry, this is so unfair!! And all I had to do was check WhatsApp before I did a factory reset.. the TINIEST THING I could have done to prevent this, and I didn't fucking do it!!!!!!!

How do I get past this?

9
24
submitted 1 month ago* (last edited 1 month ago) by ylai@lemmy.ml to c/datahoarder@lemmy.ml
10
29
submitted 1 month ago* (last edited 1 month ago) by cm0002@lemmy.world to c/datahoarder@lemmy.ml

With Google Workspace cracking down on storage (I've been using them for unlimited storage for years now), I was lucky to get a limit of 300 TB, but now I have to actually watch what gets stored lol

A good portion is, uh, "Linux ISOs", but the rest is very seldom accessed files (in many cases last access was years ago) that I think would be perfect for tape archival. Things like byte-to-byte drive images and old backups. I figure these would be a good candidate for tape and estimate this portion at about 100 TB or more.

But I've never done tape before, so I'm looking for some purchasing advice and such. I've seen from some of my research that I should target picking up an LTO8 drive, since LTO8 media stays compatible with LTO9 drives for when those come down in price.

And then it spiraled from there, with discussions of library tape drives that are cheaper but need modifications, and all sorts of other things.
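
For reference, once a standalone LTO drive is hooked up, basic Linux usage is usually just mt-st plus tar. A minimal sketch, assuming the drive appears as the non-rewinding device /dev/nst0 (LTFS is a common alternative if you want the tape to behave like a filesystem):

  # Hedged sketch only; device name and paths are assumptions.
  mt -f /dev/nst0 status                    # confirm the drive sees a tape
  mt -f /dev/nst0 rewind                    # start at the beginning of the tape
  tar -cvf /dev/nst0 /archive/drive-images/ # write one tar archive to tape
  mt -f /dev/nst0 offline                   # rewind and eject when done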

11
7
submitted 1 month ago* (last edited 1 month ago) by dullbananas@lemmy.ca to c/datahoarder@lemmy.ml

Run this JavaScript code with the document open in the browser: https://codeberg.org/dullbananas/google-docs-revisions-downloader/src/branch/main/googleDocsRevisionDownloader.js

Usually this is possible by pasting it into the Console tab in developer tools. If running JavaScript is not an option, then use this method: https://lemmy.ca/post/21276143

You might need to manually remove the characters before the first { in the downloaded file.
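
If you'd rather script that last cleanup step, here's a small sketch that keeps only the JSON part of the file; the filenames are assumptions:

  # Drop any prefix lines and any characters before the first '{'.
  awk '!found { i = index($0, "{"); if (i) { print substr($0, i); found = 1 }; next } { print }' \
    downloaded.json > revisions.json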

12
6
submitted 1 month ago* (last edited 1 month ago) by dullbananas@lemmy.ca to c/datahoarder@lemmy.ml
  1. Copy the document ID. For example, if the URL is https://docs.google.com/document/d/16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw/edit, then the ID is 16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw.
  2. Open this URL: https://docs.google.com/document/u/1/d/poop/revisions/load?id=poop&start=1&end=1 (replace poop with the ID from the previous step). You should see a json file.
  3. Add 0 to the end of the number after end= and refresh. Repeat until you see an error page instead of a json file.
  4. Find the highest number that makes a json file instead of an error page appear. This is a binary search: repeatedly try a number halfway between the highest number known to return a json file and the lowest number known to return an error page (a scripted sketch follows this list).
  5. Download the json file. You might need to remove the characters before the first {.
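
A rough sketch of scripting step 4, assuming you export your logged-in Google cookies to cookies.txt (Netscape format) and that the error page returns a non-200 status code; if it returns 200, swap the status check for a test on the response body:

  # Hypothetical helper: binary-search the highest end= value that still
  # returns a json file. DOC_ID and the starting bounds are examples.
  DOC_ID="16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw"
  lo=1         # highest value known to return a json file
  hi=1000000   # lowest value known to return an error page (from step 3)
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    url="https://docs.google.com/document/u/1/d/${DOC_ID}/revisions/load?id=${DOC_ID}&start=1&end=${mid}"
    status=$(curl -s -o /dev/null -w '%{http_code}' -b cookies.txt "$url")
    if [ "$status" = "200" ]; then
      lo=$mid   # still a json file, search higher
    else
      hi=$mid   # error page, search lower
    fi
  done
  echo "Highest end= value that returns a json file: $lo"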

I found the URL format for step 2 here:

https://features.jsomers.net/how-i-reverse-engineered-google-docs/

I am working on an easier way. Edit: here it is: https://lemmy.ca/post/21281709

13
71
submitted 1 month ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
14
36
submitted 1 month ago by lars@lemmy.sdf.org to c/datahoarder@lemmy.ml

cross-posted from: https://programming.dev/post/13631943

Firefox Power User Keeps 7,400+ Browser Tabs Open for 2 Years

15
21

cross-posted from: https://leminal.space/post/6179210

I have a collection of ~110 4K Blu-ray movies that I've ripped, and I want to take the time to compress and store them for use on a future Jellyfin server.

I know the very basics about ffmpeg and general codec information, but I have a very specific set of goals in mind that I'm hoping someone can point me in the right direction on:

  1. Smaller file size (obviously)
  2. Image quality good enough that I cannot spot the difference, even on a high-end TV or projector
  3. Preserved audio
  4. Preserved HDR metadata

In a perfect world I would love to convert the proprietary HDR and the Dolby Atmos audio into open standards, but the list above is a good compromise.

Assuming that I have the hardware necessary to do the initial encoding, and my server will be powerful enough for transcoding in that format, any tips or pointers?
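
Not a definitive recipe, but one starting point is a 10-bit HEVC (libx265) encode with everything else passed through untouched. This is a sketch with made-up filenames; it assumes a reasonably recent ffmpeg build that carries the HDR10 colour metadata through to the encoder, so check the result with ffprobe (bt2020/smpte2084 plus the mastering-display side data) before deleting any original rip, and note that Dolby Vision layers may not survive it:

  # Re-encode video to 10-bit HEVC at a quality target; copy all other streams
  # (audio including TrueHD/Atmos, subtitles) bit-for-bit.
  # Lower -crf means higher quality and bigger files; "slow" trades time for efficiency.
  ffmpeg -i input-4k-remux.mkv \
    -map 0 -c copy \
    -c:v libx265 -preset slow -crf 18 -pix_fmt yuv420p10le \
    output-4k-hevc.mkv

Hardware encoders (NVENC/QSV) are much faster but generally need a higher bitrate to match that quality, which works against goal 1.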

16
20

So it's been a few years since I've bought hard drives for my little home server, and I wanted to get a bead on the target dollars-per-TB in the post-COVID world. Thanks!

17
14
submitted 2 months ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
18
-1
submitted 2 months ago by velox_vulnus@lemmy.ml to c/datahoarder@lemmy.ml

I'm looking for EPUB versions of all three books.

19
12
submitted 2 months ago* (last edited 2 months ago) by crony@lemmy.cronyakatsuki.xyz to c/datahoarder@lemmy.ml

Hello, I'm wondering what you all use and recommend for efficient book, comic, manga, and light novel file management: tagging, directory structures, and automated tools for all of that.

My collection mostly comes from Humble Bundle book bundles. For getting tags into comics I use ComicTagger, and as for file structure, it was mostly just me putting something together to separate the books.

I want to hear your input, because most of you are a lot more efficient or have a lot more experience saving large amounts of data, and I want to make my process as painless and future-proof as possible as my collection starts to grow.

Edit: I use Linux, so software like ComicRack, which I've heard a lot about, isn't really accessible to me. The files also need to be accessible to my Kavita server.

20
26
submitted 2 months ago by Clinico@lemmy.eco.br to c/datahoarder@lemmy.ml

How to store digital files for posterity? (hundreds of years)

I have some family videos and audio recordings, and I want to physically save them for posterity so that they last for 200 years or more. That would allow great-grandchildren and great-great-grandchildren to have access.

From the research I did, I found that the longest-lasting way to physically store digital content is on gold CD-R discs, but those may only last 100 years. From what I researched, the average lifespan of HDDs and SSDs is no more than 10 years.

I came to the conclusion that the only way to ensure that the files really pass from generation to generation is to record them on CDs and distribute them to the family, asking them to make copies from time to time.
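
Whatever medium ends up being used, one low-tech thing that helps with the "make copies from time to time" plan is a checksum manifest stored alongside the files, so each generation can verify its copy before making the next one. A sketch, with example paths:

  # Build a manifest of every file's SHA-256 hash inside the archive folder...
  cd family-archive/
  find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256

  # ...and later, on any copy, check that nothing has rotted or gone missing.
  sha256sum -c MANIFEST.sha256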

It's crazy to think that if there were suddenly a mass extinction of the human species, intelligent beings arriving on Earth in 1,000 years would probably not be able to access our digital content, while cave paintings would probably remain in the same place.

What is your opinion?

21
44
submitted 2 months ago* (last edited 2 months ago) by xnx@slrpnk.net to c/datahoarder@lemmy.ml

I formatted my PC recently and I'm reinstalling some stuff. I forgot to back up settings like my yt-dlp config, so I started searching for a good config to download the best mp4 quality, found some interesting setups, and figured I'd make a thread for people to share what they use.

Here's the best setup I found so far, which downloads a 1080p mp4 with a safe filename and includes metadata, English subtitles, and chapters if available: yt-dlp -f 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' -S vcodec:h264 --windows-filenames --restrict-filenames --write-auto-subs --sub-lang "en.*" --embed-subs --add-metadata --add-chapters --no-playlist -N 4 -ci --verbose --remux-video "mp4/mkv" URL

Ideally it would also mark the SponsorBlock sections and download to a specified folder; a sketch with those options added is below.
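
yt-dlp has built-in options for both of those. A sketch of the same command with SponsorBlock chapter marking and a download folder added (the folder path is just an example):

  yt-dlp -f 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' \
    -S vcodec:h264 --windows-filenames --restrict-filenames \
    --write-auto-subs --sub-lang "en.*" --embed-subs \
    --add-metadata --add-chapters --no-playlist -N 4 -ci --verbose \
    --remux-video "mp4/mkv" \
    --sponsorblock-mark all \
    -P ~/Videos/yt-dlp \
    URL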

22
48
submitted 2 months ago by tim@wants.coffee to c/datahoarder@lemmy.ml

@datahoarder The Internet Archive Just Backed Up an Entire Caribbean Island
https://www.wired.com/story/internet-archive-backed-up-aruba-caribbean-island/

23
11
submitted 2 months ago by syaochan@feddit.it to c/datahoarder@lemmy.ml

Hi, could anyone point me at an ELI5 about SAS hardware? I'd like to assemble a NAS using an old HP Z200. I want SAS because I'd also get a tape drive for backups, and I cannot find SATA tape drives. For example, is a Dell PERC H310 PCIe card good for me? Can I avoid hardware RAID?

24
37
submitted 2 months ago by Baku@aussie.zone to c/datahoarder@lemmy.ml

While clicking through some random Lemmy instances, I found one that's due to be shut down in about a week — https://dmv.social. I'm trying to archive what I can onto the Wayback Machine, but I'm not sure what the most efficient way to go about it is.

At the moment, what I've been doing is going through each community and archiving each sort type (except the ones under a month, since the instance was locked a month ago) with capture outlinks enabled. But is there a more efficient way to do it? I know of the Internet Archive's save-from-spreadsheet tool, which would probably work well, but I don't know how I'd go about crawling all the links into a sitemap or CSV or something similar. I don't have the know-how to set up a web crawler/spider.
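
One low-effort option in the meantime: if you can get the post and community URLs into a plain text file (one per line), the Wayback Machine's save endpoint can be driven from a loop. A sketch, with the filename and delay as assumptions; anonymous saves are rate-limited and some requests will fail, so the authenticated Save Page Now API is more reliable for bulk work:

  # Submit each URL in urls.txt to the Wayback Machine, pausing between requests.
  while IFS= read -r url; do
    curl -s -o /dev/null -w "%{http_code} $url\n" "https://web.archive.org/save/$url"
    sleep 15
  done < urls.txt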

Any suggestions?

25
12

It seems the SSD sometimes heats up and the content disappears from the device, mostly when it's on my router, sometimes on my laptop.
Do you know what I should configure to put the drive to sleep, or something similar, to reduce the heat?

I'm starting up my datahoarder journey now that I replaced my internal nvme SSD.

It's just a 500 GB drive which I attached to my D-Link router running OpenWrt. I configured it with Samba and everything worked fine when I finished the setup. I just have some media files on there, which I read from Jellyfin.

After a few days the content disappears. It's not a connection problem with the shared drive, since I can SSH into the router and the files aren't shown there either.
I need to physically remove the drive and connect it again.
When I do this I notice it's somewhat hot. Not scalding, just hot.

I also tried connecting it directly to my laptop running Ubuntu. There the drive sometimes stays cool and the data shows up without issue even after days.
But sometimes it also heats up and the data disappears (even when the data was not being used, i.e. I hadn't configured Jellyfin to read from the drive).

I'm not sure how to let the SSD sleep for periods of time, or throttle it so it can cool off.
Any suggestions?
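
Before assuming heat is the cause, it may be worth checking whether the kernel is dropping the USB device and what temperature the drive actually reports. A quick diagnostic sketch; /dev/sda is an assumption, and package availability on OpenWrt may vary:

  # Look for resets/disconnects right after the files disappear, before replugging.
  dmesg | grep -iE 'usb|reset|sd[a-z]'

  # Check the drive's reported temperature and error log (needs smartmontools;
  # on OpenWrt: opkg install smartmontools).
  smartctl -a /dev/sda | grep -iE 'temperature|error'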


datahoarder

6272 readers

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago