this post was submitted on 29 Dec 2023
406 points (90.6% liked)
Technology
AI will follow a curve similar to computers in general: at first, computers required giant rooms full of expensive hardware and a team of experts to perform the most basic of functions. Over time they got smaller, cheaper, and more efficient. So much so that we all carry around the equivalent of a 2000-era supercomputer in our pockets (see note below).
Two or three years ago you really did need a whole bunch of very expensive GPUs with a lot of VRAM to train or fine-tune even a basic diffusion (image) model. Today you can train a LoRA (a small add-on adapter for an existing model) with a desktop GPU (Nvidia 3090 or 4090 with 24GB of VRAM... or a 4060 Ti with 16GB and some patience). You can use pretrained diffusion models at reasonable speeds (~1-5 seconds an image, depending on size/quality settings) with any GPU with at least 6GB of VRAM (seriously, try it! It's fun, it only takes like 5-10 minutes to install automatic1111, and it will provide endless uncensored entertainment).
Large Language Model (LLM) training is still out of reach for desktop GPUs. GPT-3 (the model family behind the original ChatGPT) was reportedly trained on a cluster of roughly 10,000 Nvidia data-center GPUs, and if you wanted to run it locally (assuming it was available for download) you'd need the equivalent of 5 A100s (each one costs about $6,700, plus you'd need an expensive server capable of hosting them all simultaneously).
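For anyone wondering where a number like "5 A100s" comes from, here's a rough back-of-the-envelope sketch. It only counts the memory needed to hold the weights, and it assumes GPT-3's published size of 175 billion parameters, 16-bit (2-byte) weights, and the 80GB variant of the A100. Those assumptions are mine, and the helper name `gpus_needed` is made up for illustration.

```python
import math

def gpus_needed(n_params: float, bytes_per_param: int, gpu_mem_gb: float) -> int:
    """Minimum number of GPUs whose combined memory can hold the weights alone.

    Ignores activations, KV cache, and framework overhead, so real
    deployments need more headroom than this.
    """
    weight_gb = n_params * bytes_per_param / 1e9  # decimal gigabytes
    return math.ceil(weight_gb / gpu_mem_gb)

# Assumed inputs: 175e9 parameters, 2 bytes each (fp16), 80 GB per GPU.
print(gpus_needed(175e9, 2, 80))  # 350 GB of weights spread over 80 GB cards

# Same model on the 40 GB A100 variant (also an assumption).
print(gpus_needed(175e9, 2, 40))
```

With those inputs the weights alone come to 350GB, which is why you land on five 80GB cards. A smaller model changes the picture completely: 7 billion parameters at 2 bytes each is only about 14GB.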
Having said that, you can host a smaller LLM such as Llama 2 on a desktop GPU and it'll actually perform really well (as in, just a second or two between when you give it a prompt and when it gives you a response). You can also train LoRAs for it on a desktop GPU, just like with diffusion models (e.g. train one on a dataset containing your thousands of Lemmy posts so the model can mimic your writing style; yes, that actually works!).
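The reason LoRA training fits on a desktop GPU is that you aren't updating the full weight matrices. LoRA freezes each original matrix W (d_out x d_in) and only trains two thin matrices B (d_out x r) and A (r x d_in). Here's a quick sketch of the parameter counts. The 4096 x 4096 size and rank 8 are just illustrative values I picked, and the function names are hypothetical.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in one dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in

def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when W is frozen and only B (d_out x r)
    and A (r x d_in) are learned, as in LoRA."""
    return rank * (d_out + d_in)

# Illustrative values (assumptions): one 4096 x 4096 matrix, rank 8.
full = full_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, 8)
print(full, lora, full // lora)
```

For that one matrix you go from about 16.8 million trainable parameters to 65,536, a 256x reduction. That cuts the memory needed for gradients and optimizer state by roughly the same factor.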
Not only that, but the speed/efficiency of AI tools like LLMs and diffusion models improves by leaps and bounds every few weeks. Seriously: it's hard to keep up! This is how much of a difference a week can make in the world of AI: I bought myself a 4060 Ti as an early Christmas present and was generating 4 (high quality) 768x768 images in about 20 seconds. Then Latent Consistency Models (LCM) came out and suddenly the same batch only took 8 seconds. Then a week later "TurboXL" models became a thing and now I can generate 4 really great 768x768 images in 4 seconds!
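To put those timings in per-image terms, here's a tiny sketch using only the numbers from the paragraph above (batches of 4 images at 20, 8, and 4 seconds). The function names are made up for illustration.

```python
def seconds_per_image(batch_seconds: float, batch_size: int) -> float:
    """Average wall-clock time per image in a batch."""
    return batch_seconds / batch_size

def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the 'after' run is for the same batch."""
    return before_seconds / after_seconds

BATCH = 4  # images per batch, from the post
timings = {"baseline": 20, "LCM": 8, "TurboXL": 4}  # seconds per batch

for name, secs in timings.items():
    print(name, seconds_per_image(secs, BATCH), "s/image,",
          speedup(timings["baseline"], secs), "x vs baseline")
```

So on the same card, with no hardware change at all, that's 5 seconds per image down to 1 second per image: a 5x gain in a couple of weeks.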
At the same time there have been improvements in training efficiency, and less VRAM is required in general thanks to those advancements. We're still in the "early days" of AI algorithms (seriously: AI stuff is extremely inefficient right now), so I wouldn't be surprised to see efficiency gains of 1,000-100,000x in the next five years for all kinds of AI tools (language models, image models, weather models, etc.).
If you combine just a 100x efficiency gain with five years of merely evolutionary hardware improvements, I wouldn't be surprised to see something even better than GPT-4 running locally on people's smartphones, with custom training/learning happening in real time (to better match the user's preferences/style).
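Here's a rough sketch of how those two gains would stack, since they multiply rather than add. The 100x and the five years come from the paragraph above; the hardware rate (performance doubling every two years) is purely my assumption for illustration, so plug in whatever you think is realistic.

```python
def combined_gain(algorithmic_gain: float,
                  hw_doubling_years: float,
                  horizon_years: float) -> float:
    """Total speedup when software and hardware gains compound.

    Hardware is modeled as doubling every `hw_doubling_years`
    (an assumed rate, not a measured one).
    """
    hardware_gain = 2 ** (horizon_years / hw_doubling_years)
    return algorithmic_gain * hardware_gain

# 100x algorithmic gain, assumed hardware doubling every 2 years, 5-year horizon.
print(round(combined_gain(100, 2.0, 5)))
```

Under that assumption the hardware alone contributes only about 5.7x, for roughly 566x overall, so almost all of the improvement would have to come from better algorithms.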
Note: The latest Google smartphone as of the date of this post is the Pixel 8, which is capable of ~2.4 TeraFLOPS. Even two-year-old smartphones were nearing ~2 TeraFLOPS, which is about what you'd get out of a supercomputer in the early 2000s: https://en.wikipedia.org/wiki/FLOPS (see the SVG chart in the middle of the page).
Here's the summary for the Wikipedia article you mentioned in your comment:
In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate measure than measuring instructions per second.