this post was submitted on 09 Oct 2024
29 points (100.0% liked)


I got a really weird problem.

Two years ago I bought a set of 4x 16 GB DDR4-3200 modules, single-sided, and put them in a Ryzen 5600 desktop that I almost never turned off. It worked without issue.

This weekend I wanted to dust off the PC, so I took all the components out, replaced the thermal paste and so on.

Turned the PC on again and it seemed to work fine until, after a while, Linux complained about running out of memory. Out of memory? With 64 GB of RAM? I checked with dmidecode -t memory and saw that one channel was reporting completely empty.
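For anyone wanting to repeat that check without reading the whole dump by eye, here is a minimal Python sketch. It assumes the usual `dmidecode -t memory` text layout, where each slot is a "Memory Device" record with `Size:` and `Locator:` fields and an unpopulated slot reports `Size: No Module Installed`. The function names `parse_memory_devices` and `empty_slots` are made up for this example.

```python
def parse_memory_devices(text):
    """Return one dict of 'Key: value' fields per 'Memory Device' record."""
    devices = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            # Start of a new slot record.
            current = {}
            devices.append(current)
        elif not stripped:
            # A blank line ends the current record.
            current = None
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices


def empty_slots(devices):
    """List the locators of slots that report no module (assumed wording)."""
    return [
        d.get("Locator", "?")
        for d in devices
        if d.get("Size") == "No Module Installed"
    ]
```

To use it, capture the output as root (for example `dmidecode -t memory > mem.txt`), read the file into a string, and pass it to `parse_memory_devices`. If `empty_slots` returns anything on a fully populated board, a module is not being detected.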

Shut down the PC, reseated the modules in the second channel, rebooted, saw 64 GB. One hour later, kernel panic. Rebooted into memtest86+: memory errors. What? Removed one module, error. Removed two modules, no error. Switched the modules, no error. What??

Placed the two modules that are passing the test in another computer, error. Put back in the original computer, pass test. AAAAAAAAAAAAAAA

Now I downclocked from 3200 to 2400 and everything seems to be working fine.

What could it be? Have I been cursed?

After a few reinsertions, do the slots degrade to the point that they can't sustain 3200 anymore?

top 19 comments
[–] breadsmasher@lemmy.world 21 points 1 month ago* (last edited 1 month ago)

Maybe the contacts were damaged on reinsert? Not just degrading / wearing down, but physically damaged

[–] Nougat@fedia.io 14 points 1 month ago (2 children)

... dust off the PC ...

It's not at all out of the question that some filth got into your connector(s). Hit them with a mess of canned air and try again?

[–] Moonrise2473@feddit.it 1 points 1 month ago

it might be, after all i took out all the components and then dusted the case with compressed air (didn't let the fans spin)

[–] infeeeee@lemm.ee 1 points 1 month ago

That's the most common thing, happened to me multiple times. Even a very small amount of dust in the slot can cause issues like that.

[–] hsdkfr734r@feddit.nl 9 points 1 month ago* (last edited 1 month ago) (3 children)

I don't think that you will see a difference in performance. :)

SO-DIMM and DIMM sockets have a somewhat limited durability (mating cycles) of just 25. link

I never reached that limit. And I'm not sure if this is related to your case.

[–] Moonrise2473@feddit.it 4 points 1 month ago

Wow I didn't imagine that the connector was so fragile

[–] BearOfaTime@lemm.ee 3 points 1 month ago

Wow, I had no idea. Thanks for the link

[–] Asifall@lemmy.world 2 points 1 month ago (1 children)

I wonder what that 25 number actually means. It’s 25 across multiple slot types so I’m guessing it’s less a measured value and more a quality control number based on their most fragile product.

Probably something like a sample is cycled 25 times and if fewer than X% still test as being in spec they know something is wrong with the current batch, but again that’s mostly a guess, and the actual durability experienced by the end user would vary significantly depending on what the acceptable failure rate is.

[–] hsdkfr734r@feddit.nl 2 points 1 month ago

I think so too. Most likely most of the sockets will survive more than 25 cycles. Maybe it's a specified minimum durability which is guaranteed for nearly all sockets.

[–] _haha_oh_wow_@sh.itjust.works 5 points 1 month ago

Inspect the channels for debris. Hit the RAM contacts and slot with contact cleaner (don't get any on your skin).

[–] aubeynarf@lemmynsfw.com 3 points 1 month ago* (last edited 1 month ago) (1 children)

RAM is easily damaged by static discharge. Were you wearing a ground strap, and did you take care not to let the memory modules touch any ungrounded surfaces while you were handling them?

Static damage can often appear as marginal or intermittent failures, probably more often than complete failure.

[–] Moonrise2473@feddit.it 3 points 1 month ago (1 children)

No, I manhandled them and put them on a random shelf. I was under the impression that modern electronics are designed to withstand that kind of light abuse; I saw an ElectroBOOM video where he tries and fails to fry RAM with electrostatic discharge.

[–] dylanmorgan@slrpnk.net 5 points 1 month ago

Newer components are, if anything, more vulnerable to ESD because they have more delicate construction.

[–] Asifall@lemmy.world 2 points 1 month ago* (last edited 1 month ago) (1 children)

Placed the two modules that are passing the test in another computer, error

So you put the RAM you thought was good in another motherboard and it failed memtest? I’d interpret that to mean one of 3 things:

A) the problem is in one of those modules you switched

B) separate problems occurred on both motherboards either due to unrelated issues or the memory being seated incorrectly (this is really unlucky)

C) there’s a problem with the modules you switched and an unrelated problem either in the other modules or in your primary motherboard (you poor bastard)

Did you take note of where in memory memtest was finding errors? If it wasn’t in the same general area between runs then it’s more likely to be a motherboard issue.
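A rough way to do that comparison, if you write down the failing addresses from each run: the Python sketch below buckets addresses into fixed-size regions and reports which regions show up in both runs. The 64 MiB bucket size and the function names are arbitrary choices for illustration, and the addresses in the usage note are invented, not taken from this thread.

```python
def regions(addresses, region_bytes=64 * 1024 * 1024):
    """Map each failing physical address to the index of its region."""
    return {addr // region_bytes for addr in addresses}


def shared_regions(run_a, run_b, region_bytes=64 * 1024 * 1024):
    """Return the region indexes that contain errors in both runs."""
    return regions(run_a, region_bytes) & regions(run_b, region_bytes)
```

For example, `shared_regions([0x123456000, 0x123457000], [0x123400000])` gives a non-empty set because all three addresses fall in the same 64 MiB block, while comparing against a run that only failed at `0x20000000` gives an empty set. Treat it as a hint only, since it says nothing about which physical module backs a given address.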

[–] Moonrise2473@feddit.it 1 points 1 month ago

On the X370 Ryzen motherboard the test always failed at test #5 and it appeared to be shifted bytes (expected FEFEFEFE got 00FEFEFEFE)

On a lowest-end H-series Intel motherboard it just beeps and won't even boot in dual channel. In single channel it boots and passes the test. The Intel motherboard has those shitty RAM slots where there's only one clip on a single side and the other is fixed (to save 1¢ I guess), so it's a bit difficult to ensure proper contact

[–] Motorheadbanger@lemmy.world 2 points 1 month ago

I've encountered oxidisation of the contacts before. You can try and rub them with an ordinary eraser

[–] brygphilomena@lemmy.world 1 points 1 month ago (1 children)

You put new thermal paste on things? Did you remove the CPU as well? You could have damaged some pins there too.

The delay in the failure sounds like it could be as the components expand with heat.

Take it apart and look at the pins of the RAM, the RAM slots, and the CPU (if you removed that) for any damage.

[–] Moonrise2473@feddit.it 1 points 1 month ago* (last edited 1 month ago)

I put the new thermal stuff only on the CPU, specifically that new Honeywell material. It's a bit smaller than the CPU: I ordered 3x3 cm after measuring a Core i3 that I had on hand, while the Ryzen has a bigger IHS and would fit better with a 4x4 cm

I'm thinking maybe I tightened the cooler too much, but it's the OEM one, so it shouldn't allow overtightening because it has stoppers on the threads... unless the Honeywell pad is too thick for that

[–] umbrella@lemmy.ml 1 points 1 month ago

the gold plating on the contacts does degrade