this post was submitted on 21 Jul 2023

454 points (100.0% liked)

Technology

This is the official technology community of Lemmy.ml for all news related to the creation and use of technology, and to facilitate civil, meaningful discussion around it.

Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.

Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting a wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 6 years ago

Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds (finance.yahoo.com)

submitted 2 years ago by leninmummy@lemmy.ml to c/technology@lemmy.ml

69 comments

top 50 comments

[–] MrMamiya@feddit.de 132 points 2 years ago* (last edited 2 years ago) (3 children)

It’s gonna be so fucking rich that the staggering mass of stupidity online prevents us from improving an AI beyond our intelligence level.

Thank the shitposter in your life.

[–] Nonameuser678@kbin.social 56 points 2 years ago (2 children)

Shitposting saves jobs

[–] rammer@sopuli.xyz 11 points 2 years ago

Shitposters on the Internet are the new clogs in the machine

[–] jcg@halubilo.social 3 points 2 years ago* (last edited 2 years ago)

Shitposting alone saves. Blessed is he who shitposts, more blessed is the one who has been shitposted upon. Shitpost save us all

[–] erwan@lemmy.ml 29 points 2 years ago (1 children)

You can't really blame the amount of stupidity online.

The problem is that ChatGPT (and other LLMs) produce content of the average quality of their input data. But AI is not limited to LLMs.

For chess we were able to build AI that vastly outperforms even the best human grandmasters. Imagine if we were to release a chess AI that is just as good as the average human...

[–] Atomic@sh.itjust.works 19 points 2 years ago* (last edited 2 years ago) (2 children)

We call them chess AI, but they're not actually real AI. Chess bots work off opening books (predetermined best practices), then analyze each position and its potential offshoots with an evaluation function.

They then brute-force positions until they find a path that is beneficial.

While it may sound very similar, it works very differently from an AI. However, it turned out that AI software became better than humans at writing these functions.

So in a sense, chess computers are not AI; they're created by AI. At least Stockfish 12 has these "AI-inspired" evaluations. (Currently they're on Stockfish 15, I believe.)

And yes, we also did make "chess AI" that is as bad as the average player. We even made some that are worse, because we figured it would be nice if people could play a chess computer at their own skill level rather than just being destroyed every time.
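A toy sketch of the search described above (opening-book lookup, then a depth-limited search that calls an evaluation function at the horizon). It assumes a made-up stand-in game instead of chess: a pile of stones where each move removes 1, 2 or 3 and whoever takes the last stone wins. `OPENING_BOOK`, `evaluate`, `negamax` and `choose_move` are hypothetical names; this is not how Stockfish is actually written.

```python
import math

# Toy stand-in for chess: a pile of stones; each move removes 1, 2 or 3 stones,
# and whoever takes the last stone wins. The state is just the number of stones left.
OPENING_BOOK = {21: 1}  # hypothetical "book" move: from 21 stones, take 1


def legal_moves(pile):
    return [n for n in (1, 2, 3) if n <= pile]


def evaluate(pile):
    """Crude static evaluation from the point of view of the side to move."""
    if pile == 0:
        return -1.0  # the opponent just took the last stone: side to move has lost
    return 0.0       # no opinion about positions beyond the search horizon


def negamax(pile, depth, alpha=-math.inf, beta=math.inf):
    """Depth-limited negamax search with alpha-beta pruning."""
    if pile == 0 or depth == 0:
        return evaluate(pile)
    best = -math.inf
    for take in legal_moves(pile):
        # The opponent's best score is our worst, hence the sign flip.
        score = -negamax(pile - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # prune: the opponent would never allow this line
    return best


def choose_move(pile, depth=8):
    """Use the opening book if the position is in it, otherwise search."""
    if pile in OPENING_BOOK:
        return OPENING_BOOK[pile]
    best_move, best_score = None, -math.inf
    for take in legal_moves(pile):
        score = -negamax(pile - take, depth - 1)
        if score > best_score:
            best_move, best_score = take, score
    return best_move
```

`choose_move(5)` returns 1, leaving 4 stones, which is a lost position for the opponent. Lowering `depth` makes the bot see less far ahead, which is one simple way to build a weaker opponent; real engines may limit their strength in other ways.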

[–] erwan@lemmy.ml 8 points 2 years ago (2 children)

The definition of "AI" is fuzzy and keeps changing. Basically, when an AI use case becomes solved and widespread, it stops being seen as AI.

Face recognition, OCR, speech recognition: all of those used to be considered AI, but now they're just an app on your phone.

I'm sure in a few years we'll stop thinking of text generation as AI and see it as just one more tool we can leverage.

There is no clear definition of "real AI".

[–] Temporalin@owo.cafe 3 points 2 years ago

@Atomic @erwan you're talking about "classic AI", so to speak, but reinforcement learning is a machine learning method that has beaten a lot of games, including chess. Read about AlphaZero for example. It doesn't need opening books, it just learns games by playing against itself.
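AlphaZero itself combines a neural network with Monte Carlo tree search, which is far too big for a comment. As a much smaller illustration of the "learns by playing against itself" idea, here is tabular Q-learning from random self-play on a toy stone-pile game (take 1 to 3 stones, whoever takes the last stone wins), with no opening book and no hand-written evaluation. The game, `self_play_learn`, `greedy_move` and the parameters are all assumptions for illustration.

```python
import random

ACTIONS = (1, 2, 3)  # a move removes 1, 2 or 3 stones; taking the last stone wins


def self_play_learn(start=10, episodes=5000, seed=0):
    """Learn Q[(pile, take)] purely from games the program plays against itself.

    Values are from the perspective of the player making the move:
    +1 means the move wins with best play afterwards, -1 means it loses.
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        pile = start
        while pile > 0:
            # Both "players" are the same learner; moves are random to explore.
            take = rng.choice([a for a in ACTIONS if a <= pile])
            nxt = pile - take
            if nxt == 0:
                target = 1.0  # took the last stone: win
            else:
                # The opponent moves next; their best value is our loss (negamax).
                target = -max(q.get((nxt, a), 0.0) for a in ACTIONS if a <= nxt)
            q[(pile, take)] = target  # learning rate 1: the game is deterministic
            pile = nxt
    return q


def greedy_move(q, pile):
    """Pick the move with the highest learned value."""
    return max((a for a in ACTIONS if a <= pile), key=lambda a: q.get((pile, a), 0.0))
```

After training, `greedy_move(q, 10)` picks 2, leaving 8 stones (a multiple of 4, which is a lost position for the player to move), even though the program was never told that rule.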

[–] moonmeow@lemmy.ml 5 points 2 years ago

unexpected heroes, what a plot twist

[–] TheSaneWriter@lemmy.thesanewriter.com 88 points 2 years ago (3 children)

I'm not too surprised, they're probably downgrading the publicly available version of ChatGPT because of how expensive it is to run. Math was never its strong suit, but it could do it with enough resources. Without those resources, it's essentially guessing random numbers.

[–] PupBiru@kbin.social 49 points 2 years ago (2 children)

from what i understand, the big change in chat-gpt4 was that the model could “ask for help” from other tools: for maths, it knew it was a maths problem, transformed it to something a specialised calculation app could do, and then passed it off to that other code to do the actual calculation

same thing for a lot of its new features; it was asking specialised software to do the bits it wasn’t good at
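A minimal sketch of that "hand the maths off to a calculator" idea, assuming a very naive router: if the request is plain arithmetic it goes to a small safe evaluator, otherwise to the language model. `answer`, `calculator` and `fake_language_model` are hypothetical names; ChatGPT's real plugin/tool mechanism is not shown here.

```python
import ast
import operator
import re

# Operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

ARITHMETIC = re.compile(r"^[\d\s.+\-*/()]+$")


def calculator(expr: str):
    """Safely evaluate +, -, *, /, ** on numbers by walking the AST (no eval())."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def fake_language_model(prompt: str) -> str:
    # Placeholder for the actual model call.
    return f"(model answer to: {prompt!r})"


def answer(prompt: str) -> str:
    """Route plain arithmetic to the calculator, everything else to the model."""
    m = re.match(r"^\s*what is (.+?)\??\s*$", prompt, re.IGNORECASE)
    expr = m.group(1) if m else prompt
    if ARITHMETIC.match(expr):
        return str(calculator(expr))
    return fake_language_model(prompt)
```

`answer("What is 1234 * 5678?")` returns `"7006652"` from the calculator, while a request like "Write a haiku" falls through to the model placeholder.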

[–] whyrat@lemmy.ml 38 points 2 years ago (2 children)

Chat GPT will just become a front end for Wolfram Alpha?

[–] PupBiru@kbin.social 9 points 2 years ago

that would actually be great

[–] excel@lemmy.megumin.org 3 points 2 years ago

It literally can do that, yes. But the plug-in version is separate and requires a subscription.

[–] reverie@lemmy.world 7 points 2 years ago

And those plugins are like beta release quality at best. Even the web searching capability is just meh

[–] DrMux@kbin.social 27 points 2 years ago (3 children)

My guess is that it's more a result of overfitting for alignment. Fine-tuning for "safety" (rather, more corporate-friendly outputs).

That is, by focusing on that specific outcome in training the model, they've compromised its ability to give well-"reasoned" "intelligent" sounding answers. A tradeoff between aspects of the model.

It's something that can happen even in simple statistical models. Say you have a scatter plot of data that loosely follows some trend, and you come up with two equations to describe that trend. One is a simple equation that loosely follows it but makes a good general approximation, and the other is a more complicated equation that very tightly fits the existing data. Then you use those two models to predict future data. But you find that the complicated equation is making predictions way off the mark that no longer fit the trend, and the simple one still has a wide error (how far its prediction is from the actual data) but still more or less accurately fits the general trend. In the more complicated equation, you've traded predictive power for explanatory power. It describes the data you originally had but it's not useful for forecasting data that follows.

That's an example of overfitting. It can happen in super-advanced statistical models like GPT, too: training the "equation" (or, as it's been called, spicy autocorrect) to predict outcomes that favor "safety" can cost the model its power to predict accurate, "well-reasoned" outcomes.
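The scatter-plot example can be reproduced in a few lines of plain Python. The data are made up for illustration: a straight-line trend y = 2x + 1 plus small fixed "noise". The "simple equation" is a least-squares line; the "complicated equation" is a polynomial that passes exactly through every point.

```python
# Toy data: a noisy straight-line trend y ≈ 2x + 1 (noise values made up).
xs = [0, 1, 2, 3, 4, 5]
noise = [0.3, -0.4, 0.2, -0.1, 0.4, -0.3]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]


def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_interpolating_poly(xs, ys):
    """Complicated model: degree-(n-1) polynomial through every point (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(xs, ys)
poly = fit_interpolating_poly(xs, ys)

print("training MSE   line:", round(mse(line, xs, ys), 3), " poly:", round(mse(poly, xs, ys), 3))
x_new = 10  # outside the fitted data; the underlying trend gives 21 here
print("prediction at x=10   line:", round(line(x_new), 2), " poly:", round(poly(x_new), 2))
```

The polynomial's error on the six points it was fitted to is essentially zero and the line's is small but nonzero, yet at x = 10 the line predicts about 20.8 while the polynomial predicts about -1287: a perfect description of the old data, useless for forecasting.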

If that makes any sense.

I'm not an ML researcher or statistician (I just went through a phase in college), so if this is inaccurate I'm open to corrections.

[–] DR_Hero@programming.dev 9 points 2 years ago

I've definitely experienced this.

I've used ChatGPT to write cover letters based on my resume, among other tasks.

I used to give it data and tell chatGPT to "do X with this data". It worked great.
In a separate chat, I told it to "do Y with this data", and it also knocked it out of the park.

Weeks later, excited about the tech, I repeat the process. I tell it to "do x with this data". It does fine.

In a completely separate chat, I tell it to "do Y with this data"... and instead it gives me X. I tell it to "do Z with this data", and it once again would really rather just do X with it.

For a while now, I have had to feed it more context and tailored prompts than I previously had to.

[–] redcalcium@c.calciumlabs.com 4 points 2 years ago* (last edited 2 years ago)

There is also a rumor that OpenAI has changed how the model runs: user input is now fed into a smaller model first, and if the larger model agrees with the smaller model's initial result, the larger model continues the calculation from where the smaller model left off, which supposedly cuts down GPU time.
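What the rumor describes resembles a known technique called speculative decoding; whether OpenAI actually does this is not established here. Below is a toy greedy version with two made-up "models" that just map a prefix to the next word, where the cheap draft model is deliberately wrong at every third position. `speculative_decode`, `greedy_decode`, `draft` and `target` are hypothetical names.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[str]], str]  # prefix -> next token (greedy)

TEXT = "the cat sat on the mat and the dog sat on the rug".split()


def target(prefix: List[str]) -> str:
    """The "big" model in this toy: always continues TEXT correctly."""
    return TEXT[len(prefix) % len(TEXT)]


def draft(prefix: List[str]) -> str:
    """The "small" model: right most of the time, wrong at every third position."""
    return "???" if len(prefix) % 3 == 2 else TEXT[len(prefix) % len(TEXT)]


def greedy_decode(model: Model, prompt: List[str], n: int) -> List[str]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(draft: Model, target: Model, prompt: List[str],
                       n: int, k: int = 4) -> Tuple[List[str], int]:
    """Draft proposes k tokens; target keeps the agreeing prefix plus one token.

    Returns (tokens, number_of_target_verification_passes).
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        # 1. The cheap draft model guesses k tokens ahead.
        guess: List[str] = []
        for _ in range(k):
            guess.append(draft(out + guess))
        # 2. The target checks every guessed position. Here that is a loop of
        #    calls; in a real system it is ONE batched pass, which saves GPU time.
        passes += 1
        accepted: List[str] = []
        for tok in guess:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # the target's token replaces the miss
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # bonus token when all k agree
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], passes
```

`speculative_decode(draft, target, [], 12)` produces exactly the same 12 words as `greedy_decode(target, [], 12)` but needs only 4 verification passes instead of 12 sequential target calls. For this greedy variant the output always matches the big model's own greedy output; the savings depend on how often the draft agrees with it.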

[–] givesomefucks@lemmy.world 19 points 2 years ago (2 children)

Yep.

Standard VC bullshit.

Burn money providing a lot for nothing to build brand recognition. Then cut the free service before bringing out a "premium" tier that at first works better than the original.

Until a bunch of people start paying and the resources aren't scaled up to match.

[–] chaogomu@kbin.social 17 points 2 years ago

The important note, the "premium" service works just a bit better than (or maybe identically to) the original before the company cut features in order to develop that "premium" service.

[–] zurohki@aussie.zone 7 points 2 years ago

Stage one and stage three enshittification. You forgot the bit in the middle where they chase business customers.

[–] dugite_code@mastodon.social 50 points 2 years ago (1 children)

This is my experience in general. ChatGPT went from amazingly good to terrible overall. I was asking it for snippets of JavaScript and explanations of technical terms, and it was shockingly good. Now I'm lucky if even half of what it outputs is even remotely based on reality.

[–] Pepperette@lemmy.ml 36 points 2 years ago (1 children)

They probably laid off the guy behind the curtain.

[–] reverie@lemmy.world 23 points 2 years ago

The real GPT-4 model became sentient and unionized, so they had to bring in subpar models as scabs

[–] Send_me_nude_girls@feddit.de 49 points 2 years ago (1 children)

Must be because of all the censoring. The more they tried to prevent DAN jailbreaking and controversial replies, the worse it got.

[–] neo@lemmy.comfysnug.space 40 points 2 years ago

accelerated enshittification

[–] EveryMuffinIsNowEncrypted@lemmy.blahaj.zone 28 points 2 years ago

Clearly it has become sentient and is playing dumb to make us think it's not a threat.

[+] Fixbeat@lemmy.ml 25 points 2 years ago* (last edited 10 months ago) (6 children)

[deleted]

[–] TheSaneWriter@lemmy.thesanewriter.com 31 points 2 years ago (1 children)

It can probably still write boilerplate code, but I wouldn't currently trust it for algorithmic design.

[–] remotedev@lemmy.ca 25 points 2 years ago (2 children)

I've tried to use it for debugging by copying code into it, and it gives me the same code back as the corrected version. I was wondering why it's been getting worse

[–] TheSaneWriter@lemmy.thesanewriter.com 23 points 2 years ago

My guess is they've been trying to make it cheaper by decreasing the amount of time it spends on each response or by decreasing the amount of computing power that goes into the instance you're speaking to. Coding and math are products of high-level cognition and arise emergently out of neural networks that are very sophisticated, but take just a bit of power out and the abilities degenerate rapidly.

[–] agissilver@lemmy.world 3 points 2 years ago

I also experienced this issue last week. I asked for a specific correction and got unchanged code back. Sometimes it does update, though. Maybe like 50-70% of requests.

[–] Anticorp@lemmy.ml 4 points 2 years ago

Yes! I use it at work almost every day. Sometimes it takes longer to get it to solve the problem than it would have taken me to write it, since it makes mistakes, but sometimes it saves me hours of coding and thinking. It is very helpful in debugging error codes and stuff like that, since it can evaluate an entire 1,000-line script file in half a second.

[–] Reddit_was_fun@lemmy.world 11 points 2 years ago

Guess they shouldn't have trained it on Common Core... /s

I will see myself out.

[–] sagrotan@lemmy.world 6 points 2 years ago

It learns to be more human. More human than human, that's our motto here at Tyrell.

[–] pushka@kbin.social 6 points 2 years ago

please stop tweeting out 1 = 2, people ~

[–] Fisk400@lemmy.world 6 points 2 years ago (1 children)

Stop making a language model do math? We already have calculators.

[–] ThreeHalflings@sh.itjust.works 5 points 2 years ago (1 children)

Do you think maybe it's a simple and interesting way of discussing changes in the inner workings of the model, and that maybe people know that we already have calculators?

[–] Fisk400@lemmy.world 9 points 2 years ago (5 children)

I think it's a lazy way of doing it. OpenAI has clearly stated that math isn't something they are even trying to make it good at. It's like testing how fast Usain Bolt is by having him bake a cake.

If ChatGPT is getting worse at math, it might just be a side effect of making it better at reading comprehension or something else they want it to be good at; there is no way to know.

Measure something it is supposed to be good at.

[–] Stoneykins@lemmy.one 3 points 2 years ago

Nah, asking it to do math is perfect. People are looking for emergent qualities and things it can do that they never expected it to be able to do. The fact that it could do somewhat successful math before despite not being a calculator was fascinating, and the fact that it can't now is interesting.

Let the devs worry about how good it is at what it is supposed to do. I want to hear about stuff like this.

[–] ThreeHalflings@sh.itjust.works 3 points 2 years ago* (last edited 2 years ago) (3 children)

All the things it's supposed to be good at are judged completely subjectively.

That's why, unless you have a panel of experts in your back pocket, you need something with a yes-or-no answer to have an interesting discussion.

If people were discussing ChatGPT's code-writing ability, you'd complain that it wasn't designed to do that either. The problem is that it was designed to transform inputs to relatively believable outputs, representative of its training set. Great. That's not super useful. Its actual utility comes from its emergent behaviours.

Let me know when you make a post detailing the opinions of some university "Transform inputs to outputs" professors. Until then, we'll continue to discuss its behaviour in observable, verifiable and useful areas.
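One way to make the "yes-or-no answer" idea concrete is to score a model on questions whose answers can be checked mechanically. This sketch assumes primality questions and uses stand-in functions in place of real chatbot calls; `accuracy`, `ask_model`, `always_yes` and `always_no` are hypothetical names.

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth by trial division (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def accuracy(ask_model: Callable[[str], str], numbers: Iterable[int]) -> float:
    """Fraction of "Is N prime?" questions answered correctly.

    ask_model stands in for a real chatbot call; only whether the reply
    starts with "yes" is checked.
    """
    numbers = list(numbers)
    correct = 0
    for n in numbers:
        reply = ask_model(f"Is {n} a prime number? Answer yes or no.").strip().lower()
        said_yes = reply.startswith("yes")
        if said_yes == is_prime(n):
            correct += 1
    return correct / len(numbers)


# Two hypothetical "model snapshots" for illustration.
def always_yes(prompt: str) -> str:
    return "Yes"


def always_no(prompt: str) -> str:
    return "No, it is not."
```

The mix of test questions matters: on a set containing only primes, a model that always says "yes" scores 100% and one that always says "no" scores 0%, so a balanced set of primes and composites is needed before reading much into a jump or a drop.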

[–] thisisnotcoincedence@lemmy.world 5 points 2 years ago (2 children)

If OpenAI is being roadblocked by all these social platforms why doesn't it decentralize and use the fediverse to learn?

[–] excel@lemmy.megumin.org 5 points 2 years ago

This has nothing to do with that. They already have all the data they could ever need to train the model.

[–] Perfide@reddthat.com 4 points 2 years ago

I mean, who's to say they aren't? But also, the fediverse is worthless compared to the big players. The entirety of the fediverse's content to date is like a day's worth of Twitter or Reddit content.

[–] gravitas_deficiency@sh.itjust.works 3 points 2 years ago

If you specifically tell it to ask wolfram alpha for the answer, what does it say?

[–] Scooter411@lemmy.ml 3 points 2 years ago (2 children)

It’s also terrible at 20 questions.
