The big AI models are running out of training data (and it turns out most of the training data was produced by fools and the intentionally obtuse), so this might mark the end of rapid model advancement

[-] JoeByeThen@hexbear.net 42 points 3 weeks ago

No, it's not. Maybe strictly for LLMs, but they were never the endpoint. They're more like a Frontal Lobe emulator; the rest of the "brain" still needs to be built.

Conceptually, Intelligence is largely about interactions between Context and Data. We have plenty of written Data. To create Intelligence from that Data, we'll need to expand its Context into other sensory systems, which we are beginning to see in the combined LLM/Video/Audio models. Companies like Boston Dynamics are already working with and collecting Audio/Video/Kinesthetic Data in the Spatial Context.

Eventually researchers are going to realize (if they haven't already) that there's massive amounts of untapped Data being unrecorded in virtual experiences. I'm sure some of the delivery/remote driver companies are already contemplating how to record their Telepresence Data to refine their models. If capitalism doesn't implode on itself before we reach that point, the future of gig work will probably be Virtual Turks: via VR, you'll step into the body of a robot when it's faced with a difficult task, complete the task, and then that recorded experience will be used to train future models.

It's sad, because under socialism there's incredible potential for building a society where AI/Robots and humanity live in symbiosis, something akin to The Culture, but it's just gonna be another cyber-dystopia panopticon.

[-] context@hexbear.net 47 points 3 weeks ago

Intelligence is largely about interactions between Context and Data

me solidarity data-outdoor-cat

intelligence
[-] QuillcrestFalconer@hexbear.net 24 points 3 weeks ago

Eventually researchers are going to realize (if they haven't already) that there's massive amounts of untapped Data being unrecorded in virtual experiences.

They already have. A lot of robots are already training in simulated environments, and Nvidia is developing frameworks to help accelerate this. This is also how things like AlphaGo were trained, with self-play, and these reinforcement learning algorithms will probably be extended to LLMs.

Also, like you said, there's a lot of still-untapped data in audio/video, and that's starting to be incorporated into the models.

[-] JoeByeThen@hexbear.net 16 points 3 weeks ago

Yeah, I'm familiar with a bunch of autonomous vehicles/drones being trained in simulated environments, but I'm also thinking stuff like VRChat.

[-] reddit@hexbear.net 6 points 3 weeks ago

My one quibble: that's not the future of gig work, it's the present

[-] JoeByeThen@hexbear.net 6 points 3 weeks ago

It's been a few years since I've used mturk, but there were very few VR-based jobs when I last used it. Has that changed?

[-] reddit@hexbear.net 3 points 3 weeks ago

Ah, sorry, I was just being a smartass; no idea how much VR is on mturk now. To be clear, I think you've got an accurately bleak view of the future of this stuff.

[-] JoeByeThen@hexbear.net 2 points 3 weeks ago

Ah, no worries. Yeah, pretty grim, and I've not even gotten into the horror of what they're gonna do with our biometric data. lol.

this post was submitted on 11 Jun 2024
94 points (100.0% liked)

technology

22970 readers

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

founded 3 years ago