1
11

I was considering building a 30+ TB NAS to simplify and streamline my current setup, but since it's a relatively low priority for me, I'm wondering: is it worth holding off for a year or two?

I am unsure whether prices have more or less plateaued, in which case the difference won't be all that substantial. Maybe I should just wait for Black Friday.

For context, two 16 TB HDDs currently seem to cost about $320, which works out to roughly $10/TB.


Here are some related links:

  • This article by Our World in Data contains a chart showing how the price per GB has decreased over time.

  • This article by Tom's Hardware discusses how SSD prices bottomed out in July 2023 before climbing back up, and predicts further increases in 2024.

2
13
Renewed drives (slrpnk.net)
submitted 2 weeks ago by greengnu@slrpnk.net to c/datahoarder@lemmy.ml

Are they worth considering or only worth it at certain price points?

3
151
submitted 2 weeks ago by xnx@slrpnk.net to c/datahoarder@lemmy.ml

cross-posted from: https://slrpnk.net/post/10273849

Vimm's Lair is getting removal notices from Nintendo and others. We need someone to help make a ROM pack archive. Can you help?

Vimm's Lair is starting to remove many ROMs that Nintendo and others have requested be taken down, so many original ROMs, hacks, and translations will soon be lost forever. Can any of you help make archive torrents of the ROMs from Vimm's Lair and CDRomance? They have hacks and translations that don't exist anywhere else and will probably be removed soon, with iOS emulation and retro handhelds bringing so much attention to ROMs and these sites.
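
For anyone who wants to help, building the actual torrent is the easy part. A minimal sketch, assuming mktorrent is installed and you already have a local mirror of the files; the tracker URL, comment, and paths are placeholders, not real infrastructure:

  # Hypothetical example: build a torrent from a local mirror of the ROM sets.
  # The announce URL and paths below are placeholders only.
  mktorrent \
    -a udp://tracker.example.org:1337/announce \
    -c "Vimm's Lair + CDRomance hacks and translations mirror" \
    -o romhack-archive.torrent \
    ./romhack-archive/

The harder part is mirroring the files in the first place and keeping the torrent seeded.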

4
105

I've been working on this subtitle archive project for some time. It is a Postgres database along with a CLI and API application allowing you to easily extract the subs you want. It is primarily intended for encoders or people with large libraries, but anyone can use it!

PGSub is composed of three dumps:

  • opensubtitles.org.Actually.Open.Edition.2022.07.25
  • Subscene V2 (prior to shutdown)
  • Gnome's Hut of Subs (as of 2024-04)

As such, it is a good resource for films and series up to around 2022.

Some stats (copied from README):

  • Out of 9,503,730 files originally obtained from dumps, 9,500,355 (99.96%) were inserted into the database.
  • Out of the 9,500,355 inserted, 8,389,369 (88.31%) are matched with a film or series.
  • There are 154,737 unique films or series represented, though note the lines get a bit hazy when considering TV movies, specials, and so forth. 133,780 are films, 20,957 are series.
  • 93 languages are represented, with a special '00' language indicating a .mks file with multiple languages present.
  • 55% of matched items have a FPS value present.

Once imported, the recommended way to access it is via the CLI application. The CLI and API can be compiled on Windows and Linux (and maybe Mac), and there are also pre-built binaries available.

The database dump is distributed via torrent (if it doesn't work for you, let me know), which you can find in the repo. It is ~243 GiB compressed, and uses a little under 300 GiB of table space once imported.
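
For anyone who hasn't restored a large Postgres dump before, the import is roughly along these lines. This is only a sketch, assuming a plain SQL dump compressed with zstd; the file and database names are made up, so check the repo README for the actual format and commands:

  # Hedged sketch: create a database and stream the decompressed dump into it.
  # "pgsub_dump.sql.zst" and "pgsub" are assumed names, not the project's own.
  createdb pgsub
  zstd -dc pgsub_dump.sql.zst | psql -d pgsub
  # Expect a little under 300 GiB of table space once this finishes.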

For a limited time I will devote some resources to bug-fixing the applications, or perhaps adding some small QoL improvements. But, of course, you can always fork them or make your own if they don't suit you.

5
49
submitted 3 weeks ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
6
22

I'm looking at my library and I'm wondering if I should process some of it to reduce the size of some files.

There are some movies in 720p that are 1.6~1.9 GB each, and then there are some at the same resolution that are 2.5 GB.
I even have some in 1080p which are just 2 GB.
I only have two movies in 4K; one is 3.4 GB and the other is 36.2 GB (I can't really tell the difference in detail since I don't have a 4K display).

And then there's an anime I have twice at the same resolution: one set of files is around 669~671 MB each, the other set 191 MB each (in this case the quality difference is noticeable while playing them, as opposed to the other files, where I compared extracted frames).

What would you do? What's your target size for movies and series? What bitrate do you go for, in which codec?

Not sure if it's kind of blasphemy in here to talk about compromising quality for size, hehe, but I don't know where else to ask this. I was planning on using these settings in ffmpeg; what do you think?
I tried it on an anime at 1080p, going from 670 MB to 570 MB, and I wasn't able to tell the difference in quality when extracting a frame from the input and the output.
ffmpeg -y -threads 4 -init_hw_device cuda=cu:0 -filter_hw_device cu -hwaccel cuda -i './01.mp4' -c:v h264_nvenc -preset:v p7 -profile:v main -level:v 4.0 -vf "hwupload_cuda,scale_cuda=format=yuv420p" -rc:v vbr -cq:v 26 -rc-lookahead:v 32 -b:v 0 './01_out.mp4'
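
For the frame comparison mentioned above, something like this works (timestamps and filenames are just examples); grab the same timestamp from the source and the re-encode and flip between the two images:

  # Extract one frame at the 10-minute mark from the input and the output.
  ffmpeg -ss 00:10:00 -i './01.mp4' -frames:v 1 frame_input.png
  ffmpeg -ss 00:10:00 -i './01_out.mp4' -frames:v 1 frame_output.png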

7
11
submitted 1 month ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
8
43

I was so confident that WhatsApp had been backing itself up to Google ever since I got my new Pixel, but it just wasn't. Then yesterday I factory reset my phone to fix something else and I lost it all. Years' worth of chats from so many times in my past just aren't there: all my texts with my mom and my family, group chats with old friends... I can't even look at the app anymore, and I'll never use WhatsApp as much as I used to. I just don't feel right with this change. There's no way to get those chats back, and it doesn't feel like there's any point backing up WhatsApp now! I really wanna cry, this is so unfair!! And all I had to do was check WhatsApp before I did a factory reset.. the TINIEST THING I could have done to prevent this, and I didn't fucking do it!!!!!!!

How do I get past this?

9
24
submitted 1 month ago* (last edited 1 month ago) by ylai@lemmy.ml to c/datahoarder@lemmy.ml
10
29
submitted 1 month ago* (last edited 1 month ago) by cm0002@lemmy.world to c/datahoarder@lemmy.ml

With Google Workspace cracking down on storage (I've been using them for unlimited storage for years now), I was lucky to get a limit of 300 TB, but now I have to actually watch what gets stored lol

A good portion is, uh, "Linux ISOs", but the rest is very seldom accessed files (in many cases last access was years ago) that I think would be perfect for tape archival. Things like byte-to-byte drive images and old backups. I figure these would be a good candidate for tape and estimate this portion at about 100 TB or more.

But I've never done tape before, so I'm looking for some purchasing advice and such. I've seen from some of my research that I should target picking up an LTO8 drive, since LTO8 media stays compatible with LTO9 drives for when those come down in price.

And then it spiraled from there, with discussions of library tape drives that are cheaper but need modifications, and all sorts of other things.
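
For reference, once a standalone LTO drive is hooked up, basic Linux usage is usually just mt-st plus tar. A minimal sketch, assuming the drive appears as the non-rewinding device /dev/nst0 (LTFS is a common alternative if you want the tape to behave like a filesystem):

  # Hedged sketch only; device name and paths are assumptions.
  mt -f /dev/nst0 status                    # confirm the drive sees a tape
  mt -f /dev/nst0 rewind                    # start at the beginning of the tape
  tar -cvf /dev/nst0 /archive/drive-images/ # write one tar archive to tape
  mt -f /dev/nst0 offline                   # rewind and eject when done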

11
7
submitted 1 month ago* (last edited 1 month ago) by dullbananas@lemmy.ca to c/datahoarder@lemmy.ml

Run this JavaScript code with the document open in the browser: https://codeberg.org/dullbananas/google-docs-revisions-downloader/src/branch/main/googleDocsRevisionDownloader.js

Usually this is possible by pasting it into the Console tab in developer tools. If running JavaScript is not an option, then use this method: https://lemmy.ca/post/21276143

You might need to manually remove the characters before the first { in the downloaded file.
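
If you'd rather script that last cleanup step, here's a small sketch that keeps only the JSON part of the file; the filenames are assumptions:

  # Drop any prefix lines and any characters before the first '{'.
  awk '!found { i = index($0, "{"); if (i) { print substr($0, i); found = 1 }; next } { print }' \
    downloaded.json > revisions.json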

12
6
submitted 1 month ago* (last edited 1 month ago) by dullbananas@lemmy.ca to c/datahoarder@lemmy.ml
  1. Copy the document ID. For example, if the URL is https://docs.google.com/document/d/16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw/edit, then the ID is 16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw.
  2. Open this URL: https://docs.google.com/document/u/1/d/poop/revisions/load?id=poop&start=1&end=1 (replace poop with the ID from the previous step). You should see a json file.
  3. Add 0 to the end of the number after end= and refresh. Repeat until you see an error page instead of a json file.
  4. Find the highest number that makes a json file instead of an error page appear. This is a binary search: repeatedly try a number halfway between the highest number known to return a json file and the lowest number known to return an error page (a scripted sketch follows this list).
  5. Download the json file. You might need to remove the characters before the first {.
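
A rough sketch of scripting step 4, assuming you export your logged-in Google cookies to cookies.txt (Netscape format) and that the error page returns a non-200 status code; if it returns 200, swap the status check for a test on the response body:

  # Hypothetical helper: binary-search the highest end= value that still
  # returns a json file. DOC_ID and the starting bounds are examples.
  DOC_ID="16Asz8elLzwppfEhuBWg6-Ckw-Xtfgmh6JixYrKZa8Uw"
  lo=1         # highest value known to return a json file
  hi=1000000   # lowest value known to return an error page (from step 3)
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    url="https://docs.google.com/document/u/1/d/${DOC_ID}/revisions/load?id=${DOC_ID}&start=1&end=${mid}"
    status=$(curl -s -o /dev/null -w '%{http_code}' -b cookies.txt "$url")
    if [ "$status" = "200" ]; then
      lo=$mid   # still a json file, search higher
    else
      hi=$mid   # error page, search lower
    fi
  done
  echo "Highest end= value that returns a json file: $lo"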

I found the URL format for step 2 here:

https://features.jsomers.net/how-i-reverse-engineered-google-docs/

I am working on an easier way. Edit: here it is: https://lemmy.ca/post/21281709

13
71
submitted 1 month ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
14
36
submitted 1 month ago by lars@lemmy.sdf.org to c/datahoarder@lemmy.ml

cross-posted from: https://programming.dev/post/13631943

Firefox Power User Keeps 7,400+ Browser Tabs Open for 2 Years

15
21

cross-posted from: https://leminal.space/post/6179210

I have a collection of ~110 4K Blu-ray movies that I've ripped, and I want to take the time to compress and store them for use on a future Jellyfin server.

I know the very basics about ffmpeg and general codec information, but I have a very specific set of goals in mind that I'm hoping someone can point me in the right direction on:

  1. Smaller file size (obviously)
  2. Image quality good enough that I cannot spot the difference, even on a high-end TV or projector
  3. Preserved audio
  4. Preserved HDR metadata

In a perfect world I would love to convert the proprietary HDR and the Dolby Atmos audio into open standards, but the list above is a good compromise.

Assuming that I have the hardware necessary to do the initial encoding, and my server will be powerful enough for transcoding in that format, any tips or pointers?
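
Not a definitive recipe, but one starting point is a 10-bit HEVC (libx265) encode with everything else passed through untouched. This is a sketch with made-up filenames; it assumes a reasonably recent ffmpeg build that carries the HDR10 colour metadata through to the encoder, so check the result with ffprobe (bt2020/smpte2084 plus the mastering-display side data) before deleting any original rip, and note that Dolby Vision layers may not survive it:

  # Re-encode video to 10-bit HEVC at a quality target; copy all other streams
  # (audio including TrueHD/Atmos, subtitles) bit-for-bit.
  # Lower -crf means higher quality and bigger files; "slow" trades time for efficiency.
  ffmpeg -i input-4k-remux.mkv \
    -map 0 -c copy \
    -c:v libx265 -preset slow -crf 18 -pix_fmt yuv420p10le \
    output-4k-hevc.mkv

Hardware encoders (NVENC/QSV) are much faster but generally need a higher bitrate to match that quality, which works against goal 1.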

16
20

So it's been a few years since I've bought hard drives for my little home server, and I wanted to get a bead on the target dollars-per-TB in the post-COVID world. Thanks!

17
14
submitted 2 months ago by ylai@lemmy.ml to c/datahoarder@lemmy.ml
18
-1
submitted 2 months ago by velox_vulnus@lemmy.ml to c/datahoarder@lemmy.ml

I'm looking for EPUB versions of all three books.

19
12
submitted 2 months ago* (last edited 2 months ago) by crony@lemmy.cronyakatsuki.xyz to c/datahoarder@lemmy.ml

Hello, I'm wondering what you all use and recommend for efficient book, comic, manga, and light novel file management: tagging, directory structures, and automated tools for all of that.

My collection mostly comes from Humble Bundle book bundles. For getting tags into comics I use ComicTagger, and as for file structure, it was mostly just me putting something together to separate the books.

I want to hear your input, because most of you are a lot more efficient or have a lot more experience saving large amounts of data, and I want to make my process as painless and future-proof as possible as my collection starts to grow.

Edit: I use Linux, so software like ComicRack, which I've heard a lot about, isn't really accessible to me. The files also need to be accessible to my Kavita server.

20
26
submitted 2 months ago by Clinico@lemmy.eco.br to c/datahoarder@lemmy.ml

How to store digital files for posterity? (hundreds of years)

I have some family videos and audio recordings, and I want to physically save them for posterity so that they last for 200 years or more. That would allow great-grandchildren and great-great-grandchildren to have access.

From the research I did, I found that the longest-lasting way to physically store digital content is on gold CD-R discs, but those may only last 100 years. From what I researched, the average lifespan of HDDs and SSDs is no more than 10 years.

I came to the conclusion that the only way to ensure that the files really pass from generation to generation is to record them on CDs and distribute them to the family, asking them to make copies from time to time.
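
Whatever medium ends up being used, one low-tech thing that helps with the "make copies from time to time" plan is a checksum manifest stored alongside the files, so each generation can verify its copy before making the next one. A sketch, with example paths:

  # Build a manifest of every file's SHA-256 hash inside the archive folder...
  cd family-archive/
  find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256

  # ...and later, on any copy, check that nothing has rotted or gone missing.
  sha256sum -c MANIFEST.sha256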

It's crazy to think that if there were suddenly a mass extinction of the human species, intelligent beings arriving on Earth in 1,000 years would probably not be able to access our digital content, while cave paintings would probably remain in the same place.

What is your opinion?

21
44
submitted 2 months ago* (last edited 2 months ago) by xnx@slrpnk.net to c/datahoarder@lemmy.ml

I formatted my PC recently and I'm reinstalling some stuff. I forgot to back up settings like my yt-dlp config, so I started searching for a good config to download the best mp4 quality, found some interesting setups, and figured I'd make a thread for people to share what they use.

Here's the best setup I found so far, which downloads a 1080p mp4 with a safe filename and includes metadata, English subtitles, and chapters if available: yt-dlp -f 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' -S vcodec:h264 --windows-filenames --restrict-filenames --write-auto-subs --sub-lang "en.*" --embed-subs --add-metadata --add-chapters --no-playlist -N 4 -ci --verbose --remux-video "mp4/mkv" URL

Ideally it would also mark the SponsorBlock sections and download to a specified folder; a sketch with those options added is below.
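
yt-dlp has built-in options for both of those. A sketch of the same command with SponsorBlock chapter marking and a download folder added (the folder path is just an example):

  yt-dlp -f 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' \
    -S vcodec:h264 --windows-filenames --restrict-filenames \
    --write-auto-subs --sub-lang "en.*" --embed-subs \
    --add-metadata --add-chapters --no-playlist -N 4 -ci --verbose \
    --remux-video "mp4/mkv" \
    --sponsorblock-mark all \
    -P ~/Videos/yt-dlp \
    URL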

22
48
submitted 2 months ago by tim@wants.coffee to c/datahoarder@lemmy.ml

@datahoarder The Internet Archive Just Backed Up an Entire Caribbean Island
https://www.wired.com/story/internet-archive-backed-up-aruba-caribbean-island/

23
11
submitted 2 months ago by syaochan@feddit.it to c/datahoarder@lemmy.ml

Hi, could anyone point me at an ELI5 about SAS hardware? I'd like to assemble a NAS using an old HP Z200. I want SAS because I'd also get a tape drive for backups, and I cannot find SATA tape drives. For example, is a Dell PERC H310 PCIe card good for me? Can I avoid hardware RAID?

24
37
submitted 2 months ago by Baku@aussie.zone to c/datahoarder@lemmy.ml

While clicking through some random Lemmy instances, I found one that's due to be shut down in about a week — https://dmv.social. I'm trying to archive what I can onto the Wayback Machine, but I'm not sure what the most efficient way to go about it is.

At the moment, what I've been doing is going through each community and archiving each sort type (except the ones under a month, since the instance was locked a month ago) with capture outlinks enabled. But is there a more efficient way to do it? I know of the Internet Archive's save-from-spreadsheet tool, which would probably work well, but I don't know how I'd go about crawling all the links into a sitemap or CSV or something similar. I don't have the know-how to set up a web crawler/spider.
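
One low-effort option in the meantime: if you can get the post and community URLs into a plain text file (one per line), the Wayback Machine's save endpoint can be driven from a loop. A sketch, with the filename and delay as assumptions; anonymous saves are rate-limited and some requests will fail, so the authenticated Save Page Now API is more reliable for bulk work:

  # Submit each URL in urls.txt to the Wayback Machine, pausing between requests.
  while IFS= read -r url; do
    curl -s -o /dev/null -w "%{http_code} $url\n" "https://web.archive.org/save/$url"
    sleep 15
  done < urls.txt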

Any suggestions?

25
12

It seems the SSD sometimes heats up and the content disappears from the device, mostly when it's on my router, sometimes on my laptop.
Do you know what I should configure to put the drive to sleep, or something similar, to reduce the heat?

I'm starting up my datahoarder journey now that I replaced my internal nvme SSD.

It's just a 500 GB drive which I attached to my D-Link router running OpenWrt. I configured it with Samba and everything worked fine when I finished the setup. I just have some media files on there, which I read from Jellyfin.

After a few days the content disappears. It's not a connection problem with the shared drive, since I can SSH into the router and the files aren't shown there either.
I need to physically remove the drive and connect it again.
When I do this I notice it's somewhat hot. Not scalding, just hot.

I also tried connecting it directly to my laptop running Ubuntu. There the drive sometimes stays cool and the data shows up without issue even after days.
But sometimes it also heats up and the data disappears (even when the data was not being used, i.e. I hadn't configured Jellyfin to read from the drive).

I'm not sure how to let the SSD sleep for periods of time, or throttle it so it can cool off.
Any suggestions?
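
Before assuming heat is the cause, it may be worth checking whether the kernel is dropping the USB device and what temperature the drive actually reports. A quick diagnostic sketch; /dev/sda is an assumption, and package availability on OpenWrt may vary:

  # Look for resets/disconnects right after the files disappear, before replugging.
  dmesg | grep -iE 'usb|reset|sd[a-z]'

  # Check the drive's reported temperature and error log (needs smartmontools;
  # on OpenWrt: opkg install smartmontools).
  smartctl -a /dev/sda | grep -iE 'temperature|error'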


datahoarder

6272 readers

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago