this post was submitted on 03 Dec 2023

Free Open-Source Artificial Intelligence

[–] rufus@discuss.tchncs.de 1 points 10 months ago* (last edited 10 months ago)

Or just don't bother with the GPU at all and get a much cheaper computer or cloud instance without one. If you're going to pipe everything through RAM anyway, do it on the CPU. Tests with llama.cpp (at least) have shown that inference is bound by memory bandwidth (bus width and speed). Even my old 4-core Xeon can do the matrix multiplications faster than it can get the numbers in. So the extra step of sending the data to the GPU and doing the computations there seems superfluous, unless I'm missing something. Sure, I use quantized values, and my computer is old and has DDR4 memory (and fewer memory channels than a proper, modern server), so the story could be a little different in other circumstances. But I'd be surprised if this changed fundamentally.
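
To put rough numbers on the "bound by RAM" point, here's a back-of-the-envelope sketch. It assumes that generating one token streams all model weights through memory once, so bandwidth divided by model size gives an upper bound on tokens per second. The function names and the example figures (dual-channel DDR4-2400, a roughly 4 GB quantized model) are made up for illustration, not measurements from my machine.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s.

    mega_transfers_per_s: e.g. 2400 for DDR4-2400
    channels: number of populated memory channels
    bus_bytes: bytes per transfer per channel (8 for a 64-bit DDR channel)
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gb_s, model_size_gb):
    """Upper bound on generation speed, assuming every weight is read
    from memory once per token (ignores caches and compute time)."""
    return bandwidth_gb_s / model_size_gb


# Illustrative numbers only (assumptions, not benchmarks).
bw = peak_bandwidth_gb_s(2400, channels=2)
limit = max_tokens_per_s(bw, model_size_gb=4.0)
print(f"peak bandwidth: {bw:.1f} GB/s, at most {limit:.1f} tokens/s")
```

Real throughput lands below this bound, but it shows why more memory channels or faster RAM would help more than extra cores.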

I'm not sure if renting vs buying makes a difference, though. That depends on how much you use your GPU, and how. Sure, if it's idle most of the time, or sits under your table switched off at night, you'd be better off renting a cloud instance. But that's just you using it wrong. It's the same if you buy a car and then only use it twice a year. But not if you drive to work every single weekday.
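
The renting vs buying question comes down to a break-even calculation. A minimal sketch, assuming the only costs are the purchase price, electricity while the card is running, and an hourly rental rate. All prices below are hypothetical placeholders, and resale value, idle power draw and the rest of the machine are ignored.

```python
def break_even_hours(purchase_price, power_kw, electricity_per_kwh, rental_per_hour):
    """Hours of actual use after which buying is cheaper than renting.

    Returns infinity if running your own card costs at least as much
    per hour as renting, since buying then never pays off.
    """
    own_cost_per_hour = power_kw * electricity_per_kwh
    saving_per_hour = rental_per_hour - own_cost_per_hour
    if saving_per_hour <= 0:
        return float("inf")
    return purchase_price / saving_per_hour


# Hypothetical figures: a 1600 card drawing 0.35 kW, power at 0.30 per kWh,
# and a comparable cloud GPU at 0.50 per hour.
hours = break_even_hours(1600, 0.35, 0.30, 0.50)
print(f"break-even after about {hours:.0f} hours of use")
print(f"that is about {hours / 8:.0f} days at 8 hours of use per day")
```

With heavy daily use the break-even point arrives quickly. With a few hours a month it may never arrive before the hardware is outdated.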

@tinwhiskers: You're right. I kinda forgot that we also do stuff that isn't fed to the user immediately. I can imagine slower inference being useful to index or summarize stuff during the night. Or to have it work in conjunction with a smaller model. Maybe fact-check stuff with its increased intelligence level.