this post was submitted on 19 Jul 2024

1202 points (99.5% liked)

Major IT outage affecting banks, airlines, media outlets across the world (www.abc.net.au)

submitted 1 year ago* (last edited 1 year ago) by rxxrc@lemmy.ml to c/technology@lemmy.world

All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It's all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We'll see if that changes over the weekend...

[–] jedibob5@lemmy.world 217 points 1 year ago (37 children)

Reading into the updates some more... I'm starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don't see how they survive this. Just absolutely catastrophic on all fronts.

[–] NaibofTabr@infosec.pub 128 points 1 year ago (1 children)

If all the computers stuck in boot loop can't be recovered... yeah, that's a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you're responsible for it.

This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
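
As a rough illustration of what pushing to groups in stages means, here is a minimal sketch. It is hypothetical throughout: the ring names, the `deploy` and `is_healthy` callbacks and the failure threshold are made up for the example and say nothing about how CrowdStrike actually ships updates.

```python
from typing import Callable, Dict, List


def staged_rollout(
    rings: Dict[str, List[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.05,
) -> List[str]:
    """Deploy ring by ring and stop as soon as one ring looks unhealthy.

    Returns the names of the rings that completed successfully.
    Every name here is a hypothetical placeholder for illustration.
    """
    completed = []
    for ring_name, hosts in rings.items():
        for host in hosts:
            deploy(host)
        failures = sum(1 for host in hosts if not is_healthy(host))
        if hosts and failures / len(hosts) > max_failure_rate:
            # Halt here: later (usually larger) rings never get the bad update.
            break
        completed.append(ring_name)
    return completed
```

With rings ordered from a small canary group to everyone else, a bad update takes out the canaries and stops there instead of reaching every machine at once.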

[–] rxxrc@lemmy.ml 78 points 1 year ago (2 children)

Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven't bricked everything.

And yeah staged updates or even just... some testing? Not sure how this one slipped through.

[–] dactylotheca@suppo.fi 131 points 1 year ago (2 children)

> Not sure how this one slipped through.

I'd bet my ass this was caused by terrible practices brought on by suits demanding more "efficient" releases.

"Why do we do so much testing before releases? Have we ever had any problems before? We're wasting so much time that I might not even be able to buy another yacht this year"

[–] RegalPotoo@lemmy.world 48 points 1 year ago (4 children)

Agreed, this will probably kill them over the next few years unless they can really magic up something.

They probably don't get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn't a contract breach.

If you are running crowdstrike, it's probably because you have some regulatory obligations and an auditor to appease - you aren't going to be able to just turn it off overnight, but I'm sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can't imagine them seeing much growth

[–] bdonvr@thelemmy.club 196 points 1 year ago (30 children)

The amount of servers running Windows out there is depressing to me

[–] franklin@lemmy.world 81 points 1 year ago (7 children)

The four multinational corporations I worked at were almost entirely Windows servers with the exception of vendor specific stuff running Linux. Companies REALLY want that support clause in their infrastructure agreement.

[–] ytg@sopuli.xyz 169 points 1 year ago (5 children)

>Make a kernel-level antivirus
>Make it proprietary
>Don't test updates... for some reason??

[–] CircuitSpells@lemmy.world 56 points 1 year ago (5 children)

I mean I know it's easy to be critical but this was my exact thought, how the hell didn't they catch this in testing?

[–] grabyourmotherskeys@lemmy.world 52 points 1 year ago (9 children)

I have had numerous managers tell me there was no time for QA in my storied career. Or documentation. Or backups. Or redundancy. And so on.

[–] Voroxpete@sh.itjust.works 44 points 1 year ago (14 children)

Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can't test for.

But this... This has no justification. An issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn't raises massive questions about the safety and security of CrowdStrike's internal processes.

[–] sasquash@sopuli.xyz 156 points 1 year ago (17 children)

never do updates on a Friday.

[–] EncryptKeeper@lemmy.world 115 points 1 year ago (12 children)

Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao

[–] szczuroarturo@programming.dev 59 points 1 year ago (16 children)

I always wondered who even used windows server given how marginal its marketshare is. Now i know from the news.

[–] kadotux@sopuli.xyz 102 points 1 year ago* (last edited 1 year ago) (7 children)

Here's the fix (or rather workaround, released by CrowdStrike):

1) Boot to safe mode/recovery
2) Go to C:\Windows\System32\drivers\CrowdStrike
3) Delete the file matching "C-00000291*.sys"
4) Boot the system normally
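
For anyone wanting to script the deletion step, it might look roughly like the sketch below. It assumes Python is available wherever the disk is mounted, which is often not the case in a recovery environment, so treat it as a description of the logic only. The function name and the `dry_run` flag are my own additions for illustration, not part of CrowdStrike's guidance.

```python
from pathlib import Path
from typing import List

# Directory and pattern exactly as given in the workaround above.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_matching_files(driver_dir: Path = DRIVER_DIR, dry_run: bool = True) -> List[Path]:
    """Find (and optionally delete) files matching the faulty pattern.

    With dry_run=True nothing is deleted; the matches are only returned
    so they can be reviewed first.
    """
    matches = sorted(driver_dir.glob(PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Defaulting to a dry run is deliberate: on a machine that is already broken, it is worth seeing what would be deleted before deleting it.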

[–] StV2@lemmy.world 60 points 1 year ago (11 children)

It's disappointing that the fix is so easy to perform and yet it'll almost certainly keep a lot of infrastructure down for hours because a majority of people seem too scared to try to fix anything on their own machine (or aren't trusted to so they can't even if they know how)

[–] HaleHirsute@infosec.pub 68 points 1 year ago (4 children)

They also gotta get the fix through a trusted channel and not randomly on the internet. (No offense to the person that gave the info, it’s maybe correct but you never know)

[–] NaibofTabr@infosec.pub 50 points 1 year ago (4 children)

This sort of fix might not be accessible to a lot of employees who don't have admin access on their company laptops, and if the laptop can't be accessed remotely by IT then the options are very limited. Trying to walk a lot of nontechnical users through this over the phone won't go very well.

[–] cheeseburger@lemmy.ca 45 points 1 year ago (2 children)

I'm on a bridge still while we wait for BitLocker recovery keys, so we can actually boot into safe mode, but the BitLocker key server is down as well...

[–] richtellyard@lemmy.world 96 points 1 year ago (2 children)

This is going to be a Big Deal for a whole lot of people. I don't know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.

[–] RegalPotoo@lemmy.world 43 points 1 year ago (5 children)

Big chunk of New Zealand's banks apparently run it, cos 3 of the big ones can't do credit card transactions right now

[–] boaratio@lemmy.world 92 points 1 year ago (16 children)

CrowdStrike: It's Friday, let's throw it over the wall to production. See you all on Monday!

[–] jayandp@sh.itjust.works 61 points 1 year ago* (last edited 1 year ago) (4 children)

so hard picking which meme to use

[–] NaibofTabr@infosec.pub 84 points 1 year ago* (last edited 1 year ago) (1 children)

Wow, I didn't realize CrowdStrike was widespread enough to be a single point of failure for so much infrastructure. Lot of airports and hospitals offline.

> The Federal Aviation Administration (FAA) imposed the global ground stop for airlines including United, Delta, American, and Frontier.

Flights grounded in the US.

The System is Down

[–] iAvicenna@lemmy.world 81 points 1 year ago

[–] invisiblegorilla@sh.itjust.works 79 points 1 year ago (4 children)

Ironic. They did what they are there to protect against. Fucking up everyone's shit

[–] Telorand@reddthat.com 78 points 1 year ago (12 children)

Maybe centralizing everything onto one company's shoulders wasn't such a great idea after all...

[–] AnUnusualRelic@lemmy.world 70 points 1 year ago (2 children)

An offline server is a secure server!

[–] Damage@feddit.it 69 points 1 year ago (6 children)

The thought of a local computer being unable to boot because some remote server somewhere is unavailable makes me laugh and sad at the same time.

[–] rxxrc@lemmy.ml 73 points 1 year ago (4 children)

I don't think that's what's happening here. As far as I know it's an issue with a driver installed on the computers, not with anything trying to reach out to an external server. If that were the case you'd expect it to fail to boot any time you don't have an Internet connection.

Windows is bad but it's not that bad yet.

[–] recapitated@lemmy.world 68 points 1 year ago (2 children)

Clownstrike

[–] Sylence@lemmy.dbzer0.com 66 points 1 year ago (1 children)

Yep, stuck at the airport currently. All flights grounded. All major grocery store chains and banks also impacted. Bad day to be a crowdstrike employee!

[–] aaaaace@lemmy.blahaj.zone 64 points 1 year ago (5 children)

https://www.theregister.com/ has a series of articles on what's going on technically.

Latest advice...

There is a faulty channel file, so not quite an update. There is a workaround...

Boot Windows into Safe Mode or WRE.
Go to C:\Windows\System32\drivers\CrowdStrike
Locate and delete file matching "C-00000291*.sys"
Boot normally.

[–] CanadaPlus@lemmy.sdf.org 63 points 1 year ago

Yep, this is the stupid timeline. Y2K happening due to the nuances of calendar systems might have sounded dumb at the time, but it doesn't now. Y2K happening because of some unknown contractor's YOLO Friday update definitely is.

[–] misk@sopuli.xyz 52 points 1 year ago (5 children)

My work PC is affected. Nice!

[–] wreckedcarzz@lemmy.world 53 points 1 year ago

Plot twist: you're head of IT

[–] ari_verse@lemm.ee 52 points 1 year ago (6 children)

A few years ago, when my org got the ask to deploy the CS agent on Linux production servers and I also saw it getting deployed on thousands of Windows and Mac desktops all across, the first thought that came to mind was "massive single point of failure and security threat", as we were putting all the trust in a single relatively small company that will (has?) become the favorite target of all the bad actors across the planet. How long before it gets into trouble, either because of its own doing or due to others?

I guess that we now know

[–] BurnSquirrel@lemmy.world 51 points 1 year ago (11 children)

I'm so exhausted... This is madness. As a Linux user I've been busy all day telling people with bricked PCs that Linux is better but there are just so many. It never ends. I think this outage is going to keep me busy all weekend.

[–] ililiililiililiilili@lemm.ee 51 points 1 year ago

My dad needed a CT scan this evening and the local ER's system for reading the images was down. So they sent him via ambulance to a different hospital 40 miles away. Now I'm reading tonight that CrowdStrike may be to blame.

[–] Monument@lemmy.sdf.org 46 points 1 year ago* (last edited 1 year ago) (20 children)

Honestly kind of excited for the company blogs to start spitting out their ~~disaster recovery~~ crisis management stories.

I mean - this is just a giant test of ~~disaster recovery~~ crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.

If a company uses out-of-band management such as IPMI or Intel AMT (sometimes branded vPro), and their network is intact/the devices are on their network, they ought to be able to remotely address this.
But that’s obviously predicated on them having already deployed/configured the tools.

[–] StaySquared@lemmy.world 46 points 1 year ago* (last edited 1 year ago) (7 children)

Been at work since 5AM... finally finished deleting the C-00000291*.sys file in CrowdStrike directory.

182 machines total. Thankfully the process in and of itself takes about 2-3 minutes. For virtual machines, it's a bit of a pain, at least in this org.

lmao I feel kinda bad for those companies that have 10k+ endpoints to do this to. Eff... that. Lots of immediate short term contract hires for that, I imagine.
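
Back-of-the-envelope, assuming one person working through machines strictly one after another at the 2-3 minutes each mentioned above (no travel, no BitLocker prompts, no extra hands, which is optimistic):

```python
def hands_on_hours(machines: int, minutes_per_machine: float) -> float:
    """Total hands-on time in hours for a purely serial, one-person fix."""
    return machines * minutes_per_machine / 60


for count in (182, 10_000):
    low = hands_on_hours(count, 2)
    high = hands_on_hours(count, 3)
    print(f"{count} machines: {low:.1f} to {high:.1f} hours")
```

That works out to roughly 6 to 9 hours for 182 machines, and about 333 to 500 person-hours for 10,000 endpoints, which is where the contract hires come in.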

[–] scripthook@lemmy.world 45 points 1 year ago (14 children)

CrowdStrike sent a corrupt file with a software update for Windows servers. This caused a blue screen of death on all the Windows servers globally for CrowdStrike clients, even people in my company. Luckily I shut off my computer at the end of the day and missed the update. It's not an OTA fix. They have to go into every data center and manually fix all the servers. Some of these servers have encryption. I see a very big lawsuit coming...

[–] ramble81@lemm.ee 45 points 1 year ago

We had a bad CrowdStrike update years ago where their network scanning portion couldn’t handle a load of DNS queries on startup. When we asked how we could switch to manual updates, we were told that wasn’t possible. So we had to black hole the update endpoint via our firewall, which luckily was separate from their telemetry endpoint. When we were ready to update, we’d have FW rules allowing groups to update in batches. They have since changed that, but a lot of companies just hand control over to them. They have both a file system and network shim so it can basically intercept **everything**.
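
The batching idea can be sketched like this. It is purely illustrative: the host names, batch count and hashing scheme are hypothetical, and this is not how any particular firewall or the CrowdStrike console is configured. The point is only that each host lands deterministically in one batch, and only batches opened so far are allowed to reach the update endpoint.

```python
import hashlib
from typing import List


def batch_for(host: str, num_batches: int) -> int:
    """Map a host name to a stable batch index in [0, num_batches)."""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_batches


def hosts_allowed_to_update(hosts: List[str], num_batches: int, current_batch: int) -> List[str]:
    """Hosts whose batch has been opened so far (batches 0..current_batch)."""
    return [h for h in hosts if batch_for(h, num_batches) <= current_batch]
```

The resulting list would then be turned into whatever allow rules the firewall in question supports, widening `current_batch` only after the previous batch has proven stable.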

[–] r00ty@kbin.life 44 points 1 year ago

My favourite thing has been watching sky news (UK) operate without graphics, trailers, adverts or autocue. Back to basics.

[–] misterkiem@lemmy.world 43 points 1 year ago* (last edited 1 year ago) (1 children)

lol

too bad me posting this will bump the comment count though. maybe we should try to keep the vote count to 404
