this post was submitted on 27 Nov 2023
Homelab
ML dataset.
I have no plan for software for now.
Yes, it does need to be an enterprise system, and it is on-site.
This must be self-hosted.
As for the budget, it is provided by my uni.
Cool! ML datasets traditionally tend to compress and dedupe very well, so depending on the budget I would probably look at an appliance with a compatible software stack that does this extremely well, then offload to an object store as you scale out.
What you are looking for is a scalable appliance, and I would start by building out a requirements document covering the basic questions: speed, capacity, data delta (growth over time), redundancy, and uptime.
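To make the capacity and data delta questions concrete, here is a rough Python sketch of the arithmetic. Everything in it is an assumption for illustration: the function names, the 500 TB starting size, the 50% yearly growth, the 8+2 erasure-coding layout, and the 80% fill target are placeholders, not numbers from this thread. Swap in whatever your own requirements document ends up saying.

```python
def projected_capacity_tb(initial_tb, annual_growth, years):
    """Logical data size after `years` of compound growth.

    annual_growth is a fraction, e.g. 0.5 for 50% growth per year.
    """
    return initial_tb * (1 + annual_growth) ** years


def raw_capacity_tb(logical_tb, data_shards, parity_shards, fill_target=0.8):
    """Raw disk capacity needed to hold `logical_tb` of data.

    Assumes an erasure-coded layout (data_shards + parity_shards) and
    that the pool should stay below `fill_target` utilization.
    """
    overhead = (data_shards + parity_shards) / data_shards
    return logical_tb * overhead / fill_target


# Hypothetical inputs: 500 TB today, growing 50% per year, planned 2 years out.
logical = projected_capacity_tb(500, 0.5, 2)
raw = raw_capacity_tb(logical, data_shards=8, parity_shards=2)
print(f"logical: {logical:.0f} TB, raw: {raw:.0f} TB")
```

The point is only that growth rate and redundancy overhead multiply together, so a dataset that looks like half a petabyte today can need well over a petabyte of raw disk within the planning window.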
Once you delve deep into these questions, you’ll be asking the right questions about how and what in relation to data flow. The answers then build out the baseline requirements for the technology stack you need.
When I’m scoping solutions, the destination hardware is always the last question answered; if you make it the first question, the solution is doomed from the start.
There is no “cheap” way to get petabyte-level storage. What you would spend on HDDs without dedupe and compression would cover the cost of an appliance that does dedupe and compression. So a mixture of the two is probably the best approach, if the data growth can be pre-conditioned by a dedupe appliance before being offloaded to object storage.
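As a back-of-the-envelope illustration of that trade-off, the sketch below counts how many drives the same logical dataset needs at different data-reduction ratios. The function name, the 20 TB drive size, and the 2:1 and 3:1 ratios are hypothetical; the real ratio has to be measured on a sample of your own data, and redundancy overhead is left out to keep it short.

```python
import math


def drives_needed(logical_tb, drive_tb, reduction_ratio=1.0):
    """Number of drives needed to store `logical_tb` of data.

    reduction_ratio is the combined dedupe + compression ratio
    (2.0 means the data shrinks to half its logical size).
    Redundancy/parity overhead is deliberately ignored here.
    """
    physical_tb = logical_tb / reduction_ratio
    return math.ceil(physical_tb / drive_tb)


# Hypothetical: 1 PB (1000 TB) of logical data on 20 TB drives.
for ratio in (1.0, 2.0, 3.0):
    print(f"{ratio}:1 reduction -> {drives_needed(1000, 20, ratio)} drives")
```

Multiplying the difference in drive count by your actual per-drive price gives a rough ceiling on what a reduction appliance is worth paying for.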
Do you have any pointers?
By "cheap" I mean compared to a solution like an NVIDIA DDN node.
And what do you mean by "Once you delve deep into these questions"? That assumes I have some general knowledge of storage stacks that can scale to petabyte-level storage. I think it's safe to assume that TrueNAS SCALE won't scale to a petabyte or more.