They've had distills before this; a more accurate title would be "Newest DeepSeek R1 distill runs on a single GPU, like all the previous ones".
Also, it's not accurate to say that a Qwen3 distill is the same as the DeepSeek R1 running in the datacenter - the full model is still about 85x larger than the Qwen3 distill.
This is just inaccurate. It runs in 16GB of VRAM... because, you know, 8B parameters x 2 bytes (needed to store each parameter) = 16x10^9 bytes = 16GB.
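If you want to sanity-check the arithmetic yourself, here's a rough weights-only sketch of it (the ~671B figure for the full R1 is from DeepSeek's published model size, not from the article; quantized sizes are approximate and this ignores KV cache and other overhead):

```python
# Rough VRAM needed just to hold the weights (ignores KV cache, activations, overhead).
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    # params_billions * 1e9 params * bytes-per-param / 1e9 bytes-per-GB
    return params_billions * bytes_per_param

models = [("Qwen3 8B distill", 8), ("full DeepSeek R1", 671)]
dtypes = [("fp16/bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]

for name, params in models:
    for dtype, nbytes in dtypes:
        print(f"{name} @ {dtype}: ~{weight_vram_gb(params, nbytes):.0f} GB")
```

The 8B distill at fp16/bf16 comes out to ~16GB, while the full R1 at the same precision is well over a terabyte of weights.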
It's also just not the same thing at all. The distillations are not even remotely close to the 600B+ parameters of the parent model. You can't run DeepSeek on your GPU at home. It's the equivalent of buying your kid a Power Wheels car. They're not driving a car. They can't drive a car at home.
Edit - I do run the distillations at home, though, and they're fine for what I use them for. I'm not OpenAI-pilled. I just hate when people say that you can run the same software as a massive datacenter privately.
It's just sensationalism from a journalist who can't even be bothered to multiply two numbers.
It's gotta be a "distillate", right? Not a "distill". Verbing weirds language.
Most people say "distilled model", though "distillate" sounds right as well; the process is called distillation. I've just fried my brain on the local LLM subreddit trying to get the transformers library working, which is probably why I phrased it like that.
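For what it's worth, the basic transformers recipe for one of these 8B distills looks roughly like the sketch below; the model ID (DeepSeek-R1-0528-Qwen3-8B), bf16 dtype, and device_map settings are just an illustration of the approach, not something anyone in this thread confirmed running:

```python
# Minimal sketch: load an ~8B distill in bf16 (~16 GB of weights) and generate.
# Assumes torch, transformers, and accelerate are installed and a 16+ GB GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 2 bytes per parameter, same math as above
    device_map="auto",            # let accelerate place the weights on the GPU
)

messages = [{"role": "user", "content": "Explain model distillation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```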
I see