
My jaw hit the floor when I watched an AI master one of the world's toughest physical games in just six hours (www.techradar.com)

submitted 1 year ago by Lifecoach5000@lemmy.world to c/technology@lemmy.world


just_another_person@lemmy.world 18 points 1 year ago (last edited 1 year ago)

They don't discuss it here, but it's most likely a reinforcement learning model that compares successive generations of learned behavior to decide whether it's improving or not.

It would know that the ball going in the hole is "bad", and then try to avoid that happening. Each move that is "good" is then kept in a list of moves it should perform in the next generation of its plan to avoid the "bad" things. Loop -> fail -> logic build -> retry. After 6 hours, it has mapped a complete list of "good" moves to affect its final outcome.
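To make that loop concrete, here is a toy sketch using tabular Q-learning on a tiny grid with holes. It is only an illustration of trial-and-error learning in general, not the method from the article (which doesn't say what was used). The board, the reward values, and every name here (`BOARD`, `step`, `train`, `greedy_rollout`) are made up for the example.

```python
import random

# Hypothetical 4x4 board: S = start, G = goal, H = hole, . = free cell
BOARD = [
    "S...",
    ".H.H",
    "...H",
    "H..G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find_start(board):
    for r, row in enumerate(board):
        c = row.find("S")
        if c != -1:
            return (r, c)
    raise ValueError("board has no start cell")


def step(board, pos, action):
    """Move one cell (walls clamp the move); return (new_pos, reward, done)."""
    r = min(max(pos[0] + action[0], 0), len(board) - 1)
    c = min(max(pos[1] + action[1], 0), len(board[0]) - 1)
    cell = board[r][c]
    if cell == "H":
        return (r, c), -1.0, True   # "bad": fell in a hole, episode ends
    if cell == "G":
        return (r, c), 1.0, True    # "good": reached the goal
    return (r, c), -0.01, False     # small cost per move to discourage dawdling


def train(board, episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Loop -> fail -> update -> retry, keeping a table of move values."""
    rng = random.Random(seed)
    q = {}  # (position, action index) -> estimated value
    start = find_start(board)
    for _ in range(episodes):
        pos = start
        for _ in range(100):  # cap on moves per attempt
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # occasionally try something new
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((pos, i), 0.0))
            nxt, reward, done = step(board, pos, ACTIONS[a])
            best_next = 0.0 if done else max(
                q.get((nxt, i), 0.0) for i in range(len(ACTIONS))
            )
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (reward + gamma * best_next - old)
            pos = nxt
            if done:
                break
    return q


def greedy_rollout(board, q, max_steps=50):
    """Follow the best-known move from each cell; True if the goal is reached."""
    pos = find_start(board)
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q.get((pos, i), 0.0))
        pos, _, done = step(board, pos, ACTIONS[a])
        if done:
            return board[pos[0]][pos[1]] == "G"
    return False


if __name__ == "__main__":
    q_table = train(BOARD)
    print("reaches goal:", greedy_rollout(BOARD, q_table))
```

Note that the table is keyed by board position, so everything it "knows" is tied to this one board.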

To answer your question: no, it would not be able to use what it learned here on a different map of the board. It's building reactions to events based on this one board, bound by its rules. You could use the same ruleset with another board, but it would need to learn it all again, just as a human would.

The question with these models is less whether they will work (it is assumed they eventually will through trial and error) and more how efficiently they will work. The number of generational cycles and retries is usually the benchmark when dealing with reinforcement learning, but they don't discuss that data here either.

INeedMana@lemmy.world 1 point 1 year ago

Yes, but that's kind of my point

We see it learn something with insane precision, but most often that is almost an effect of over-training. It would probably need less time to learn another layout, but it's not learning the general rules (can't go through walls, holes are bad, we want to get to X); it learns the specific layout. Each time the layout changes, it would have to re-learn it.
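A deliberately simplified way to picture this: treat the learned behavior as a memorized route for one layout and replay it on another. This is a hypothetical toy, not how the system in the article works; the layouts, the routes, and the `replay` helper are all invented for illustration.

```python
# Hypothetical layouts: S = start, G = goal, H = hole, . = free cell
LAYOUT_A = ["S...", ".H.H", "...H", "H..G"]
LAYOUT_B = ["S.H.", "...H", "H...", "..HG"]

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

# Routes "learned" separately for each layout (assumed, written by hand here)
ROUTE_FOR_A = "DDRDRR"
ROUTE_FOR_B = "DRDRRD"


def replay(board, route):
    """Replay a memorized route; report 'goal', 'hole', or 'unfinished'."""
    pos = next(
        (r, row.index("S")) for r, row in enumerate(board) if "S" in row
    )
    for move in route:
        dr, dc = MOVES[move]
        # Walls: moves that would leave the board are clamped to the edge.
        r = min(max(pos[0] + dr, 0), len(board) - 1)
        c = min(max(pos[1] + dc, 0), len(board[0]) - 1)
        pos = (r, c)
        if board[r][c] == "H":
            return "hole"
        if board[r][c] == "G":
            return "goal"
    return "unfinished"


if __name__ == "__main__":
    print("A route on A:", replay(LAYOUT_A, ROUTE_FOR_A))
    print("A route on B:", replay(LAYOUT_B, ROUTE_FOR_A))
    print("B route on B:", replay(LAYOUT_B, ROUTE_FOR_B))
```

The rules (walls, holes, goal) are identical on both boards, yet the route that solves the first one drops into a hole on the second, so a new route has to be found for it.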

It is impressive and enables automation in a lot of areas, but in the end it is still only machine learning, adapting weights to a specific scenario.