If you can read this, it means you aren’t going through Cloudflare to get here. Once again, Cloudflare is having problems, and chunks of the internet are offline as a result. X is sort of up (intermittently, and there are media problems), and a variety of other sites, such as SciTechDaily, are still down.
I’m old enough to remember that one of the ideas behind what became the internet was decentralization and distribution, so that the network could and would stay up in the face of disasters up to and including nuclear war: the system would route around gaps and stay up for as many people as possible.
However, the ‘best and brightest’ alleged elites thought the internet would be better centralized and controlled (and much easier to turn off if the peasants proved too revolting). Thus we have centralized operations and single-point failures even without apparent ill will. I use “apparent” advisedly, as I have wondered whether certain incidents were various actors, state and non-state, testing and trying things out on occasion. After all, theoretical knowledge is useful, but there comes a point where you have to test before you use it.
Right now, I lean towards incompetence (and I regard any engineering that allows single-point failure modes as incompetent), but how many major internet outages have we had in 2025? Heck, this is either the second or third just for Cloudflare. It does make one wonder, and one also wonders whether anyone in U.S. leadership at any level has started to tumble to the fact that centralizing internet operations might have been a mistake. If Elon is smart, he has an alternate internet based on Starlink ready to go at need (and after he is safely off Earth).
Getting hit by lightning is not fun! If you would like to help me in my recovery efforts, and to start a truly new life, feel free to hit the fundraiser at A New Life on GiveSendGo, use the options in the Tip Jar in the upper right, or drop me a line to discuss other methods. If you want to know some of what it is going for, read here. There is also the Amazon Wish List in the Bard’s Jar. It is thanks to your gifts and prayers that I am still going. Thank you.
While the likelihood is that some doofus forgot to pay the light bill at Cloudflare, it doesn’t go unnoticed by the tyrants among us, and they can’t possibly miss that they can just TURN OFF Twidder if it starts getting too uncomfortable for them. You can bet that the US Gubmant (even the ones we think are “on our side”) and the Eff Bee Eye / See Eye Ayy / Enn Ess Ayy sorts have filed it away for future reference.
I sincerely hope that downed sites like Twiddlex realize their vulnerabilities and are already working towards self-reliance.
Apologies for my atrocious grammar/usage missteps. I got slammed with work right as I was trying to finish my thoughts. SO inconvenient.
LOL! It is always such a pain when work demands attention from the important things. 🙂
Agree, and hope they are! I’m reminded of back in the day, when many blogs had at least one backup site (if not a couple) just in case…
There are a bunch of related issues here.
The first is that, for moderately good reasons, organizations have outsourced to cloud services like AWS and Cloudflare all kinds of things that a decade ago they hosted in their own data centers. In fact, at $dayjob we are in the process of doing the same ourselves, because we looked at the costs and the benefits and for our workloads it made sense (for others it may not; a fair number of people have moved from AWS back to private hosting).
Second, businesses have also, for equally good business reasons, outsourced all kinds of other functions to services that themselves use the cloud. In many cases that too is indirect. For example, business B outsources ecommerce to service S, which in turn outsources identity management to service O, which runs on, say, Azure. Meanwhile, service S fronts its systems with Cloudflare, and business B has its customer relationship systems running on AWS.
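To make the indirection concrete, here is a minimal Python sketch that treats the B/S/O example as a dependency graph and walks it to find everything business B ultimately relies on. The graph is just the hypothetical example above, hard-coded; in practice the hard part is building the graph at all, since those indirect dependencies are often invisible to B.

```python
# Hypothetical dependency graph mirroring the B/S/O example:
# each key depends directly on the services listed as its value.
DEPENDS_ON = {
    "business B": ["service S", "AWS"],        # ecommerce via S, CRM on AWS
    "service S": ["service O", "Cloudflare"],  # identity via O, fronted by Cloudflare
    "service O": ["Azure"],                    # O runs on Azure
    "AWS": [],
    "Cloudflare": [],
    "Azure": [],
}


def transitive_deps(graph, root):
    """Return every service `root` depends on, directly or indirectly (iterative DFS)."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


if __name__ == "__main__":
    print("direct:    ", DEPENDS_ON["business B"])
    print("transitive:", sorted(transitive_deps(DEPENDS_ON, "business B")))
```

B’s direct list names only S and AWS; Azure and Cloudflare show up only when you follow the chain.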
In theory (and indeed this is one of the justifications for using cloud providers), the cloud service is more reliable than having your own data center, and in fact this is mostly true. If your self-hosted VMware server or NAS fails, you probably have to scramble to keep things working. The cloud providers have generally built things so that if a rack, or indeed an entire data center, suffers an outage, things keep going.
The Cloudflare problem today and the recent AWS one (and most of the others in the past) were all due to errors in the “orchestration” systems the cloud providers use to configure servers and databases and dispatch them on demand. The data centers themselves were fine; the problem was in the C2 (command-and-control) layer that tells them what to do.
Unfortunately, the cloud is, famously, just someone else’s computer. If you do it right, you can be resilient in the event of a cloud failure. But, as in the example above, business B is vulnerable to outages in AWS, Azure AND Cloudflare, so it’s actually less resilient. Worse, it may not be obvious that services S and O use the cloud providers they do, so business B has no idea (until Azure goes titsup.com) that its ecommerce site has a key dependency on Azure.
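The “less resilient” point is ordinary reliability arithmetic. A minimal sketch, assuming each provider fails independently and using a made-up 99.9% availability for each: things B needs all at once multiply (series), while genuinely redundant alternatives combine as one minus the product of their failure probabilities (parallel). Real outages are often correlated, which makes the parallel case look better on paper than in practice.

```python
from math import prod


def series_availability(avails):
    """Every component must be up at once, so availabilities multiply."""
    return prod(avails)


def parallel_availability(avails):
    """Up if at least one redundant component is up.
    Assumes failures are independent, which real outages often are not."""
    return 1 - prod(1 - a for a in avails)


# Hypothetical 99.9% availability per provider; an illustrative number,
# not a published SLA or measured figure for AWS, Azure, or Cloudflare.
PROVIDER = 0.999

# Business B needs AWS, Azure (via S and O) and Cloudflare (via S) all at once.
b_chain = series_availability([PROVIDER, PROVIDER, PROVIDER])

# For comparison: one provider alone, and two independent providers
# where either one can carry the load ("doing it right").
single = PROVIDER
redundant_pair = parallel_availability([PROVIDER, PROVIDER])

print(f"B's three-provider chain: {b_chain:.6f}")  # about 0.997003
print(f"single provider:          {single:.6f}")
print(f"redundant pair:           {redundant_pair:.6f}")
```

At 99.9% each, needing all three providers drops B to roughly 99.7%, about three times the expected downtime of relying on any one of them, while true redundancy pushes the other way.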
And this repeats.
Good points and thank you for sharing them! Much appreciated.
Space-based internet as an operative idea goes back at least as far as Hughes in 1993, and several people, companies, and governments are actively looking at orbital internet. If Wang and Musk are looking at using Starship to boost 4GW data centers into orbit, things will hopefully get more robust. Did you see that box Wang gave Musk? A petaflop computer you can hold in your hand!?
No, I missed that and will go take a look. We do live in a golden age (still) when a petaflop computer can fit in the hand. I still remember my first computer, a Kaypro 2X with 2 (count them, two!) floppy disk drives and memory that may have been measured in KB, not MB. While Gates may not have actually made the 640K comment, it is amazing how fast memory needs and memory capacity have taken off. As for orbital, I think it’s a good way to go, as we need belt and suspenders IMO. I will also say that if Wang and Musk go at it, my money will be on them.
“I got slammed with work right as I was trying to finish my thoughts.” “Work” is a four-letter word (I’m retired now, so I can say that…).
Way back when, I was designing enterprise systems for things as mundane as receipt printing and as critical as handling 911 calls (most, thankfully, were somewhere in between), but the question always was failure mode: what happens when this system fails, how could/will it fail, where and what redundancy do I need, what level of failure can be tolerated, and when it does fail (because that will always happen), what’s required for recovery?
One client was losing sleep over the cost of a new financial system; the old system was clearly inadequate for the volume of transactions, and for the new types of transactions, it was handling (or attempting to…). I laid out the costs they had not considered: it isn’t just hardware and software, it’s everything. Who supports (aka “babysits”) this new system, and what are those personnel costs (hiring, supervision, constant training on new stuff, alternate support for vacations and sick time, etc., etc., etc.)? I pointed out that there was an alternative: ASP (Application Service Provider) services. “They” own and maintain the computers, the people to run them, and so on (sort of a “junior” cloud thing). The difference was a $3.2M initial outlay with $200+K in estimated ongoing peripheral costs vs. $80K/year in capped costs (5-yr contract) for the ASP. They jumped at the ASP, of course.
I pointed out there’s about $5-8K/yr minimum beyond that to maintain adequate redundancy and recovery in the event of failures, like… the internet being down. What is the contractually stipulated recovery time to engage their backups in the event of a main-site failure (they were fully redundant, but switchover required 2 hours from initial discovery to fully complete with all functions operating)? Can they print payroll checks and overnight FedEx them? Where do client payments go during the failover recovery? Is all this transparent to clients, suppliers, etc., or do they have to make changes in their systems to send or receive payments? The ASP provider had the ability to add, for a fee, a second redundant backup across the internet to the client’s system, so the client would at least have a copy of all their data in-house, albeit 12-24 hours behind actual, “just in case.” The client turned this down because it meant about $60K in hardware, another $12K in software, and pushed them right to the edge of hiring another support tech. Which was fine (as long as I’ve fulfilled all contract requirements and your check has cleared, we can be Best Friends).
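A quick back-of-the-envelope sketch of the five-year comparison, using only the figures in the comment above. It assumes the “$200+K ongoing” number is an annual cost (taken at its $200K floor) and ignores inflation, the extra support-tech hire, and any cost of downtime.

```python
YEARS = 5  # length of the ASP contract in the comment

# In-house option: up-front outlay plus ongoing peripheral costs.
# ASSUMPTION: "$200+K ongoing" is read as $200K per year, its stated floor.
IN_HOUSE_INITIAL = 3_200_000
IN_HOUSE_PER_YEAR = 200_000
in_house_total = IN_HOUSE_INITIAL + IN_HOUSE_PER_YEAR * YEARS

# ASP option: capped fee plus the $5-8K/yr redundancy and recovery budget.
ASP_PER_YEAR = 80_000
REDUNDANCY_LOW, REDUNDANCY_HIGH = 5_000, 8_000
asp_total_low = (ASP_PER_YEAR + REDUNDANCY_LOW) * YEARS
asp_total_high = (ASP_PER_YEAR + REDUNDANCY_HIGH) * YEARS

# The declined in-house backup copy: one-time hardware plus software.
BACKUP_ONE_TIME = 60_000 + 12_000
asp_with_backup_high = asp_total_high + BACKUP_ONE_TIME

print(f"In-house over {YEARS} yr:       ${in_house_total:,}")
print(f"ASP over {YEARS} yr:            ${asp_total_low:,} to ${asp_total_high:,}")
print(f"ASP + declined backup, high: ${asp_with_backup_high:,}")
```

Even at the high end, with the declined backup added back in, the ASP route comes to roughly an eighth of the in-house figure over the contract term.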
Enter stuff like AWS and Cloudflare and The Magic Cloud. AWS is just “ASP for the technologically ignorant”; Cloudflare is a brilliant way of putting oneself in the flow path of lots of money, because “everyone wants a cheaper way to do this” and they offer the service to do exactly that. (I’ve noticed that some retail web sites get crippled when Google has problems with Captcha, because they’re depending on “someone else” to manage authentication.)
Your points about “the internet was designed to route around problems, and now we’ve deliberately incorporated those very crippling problems into normal daily flow” are spot on. Thing is, we’ve shot ourselves in the ass, reloaded, and fired again, the second time with bigger bullets. There are not enough competent IT people in the US to shift those *necessary* services into everyone’s own IT shop. I’m not sure there’s enough available hardware, square footage, or cooling capacity to do it either (there certainly isn’t the will to pay for it in the C-suites), and what’s coming out of the “universities and colleges” (aka “the semi-adult child care industry”) ain’t up to the task. There’s sort of an “invisible apprenticeship program” in IT where, depending on the individual, it takes years to go from “textbook to real accomplishment skills,” not to mention the huge differences in operating environments that require a degree of localized specialization not at all understood by people outside IT.
As Ms. Hoyt often proclaims, “keep your clothes and weapons where you can find them in the dark,” because the current IT biz is considerably more fragile than many people think. Commenters on this post expressed concern about “government using single-point-of-failure stuff to shut down society,” but what about a couple dozen really smart terrorists who understand just where, and how, to do the same thing, and do it permanently? 19 high school dropouts with box cutters just about brought NYC’s entire finance industry to its knees 24 years ago (and without the Todd Beamers of the world, D.C. would look, and be acting, a lot different); what could a few dozen high-IQ, IT-smart types do? How many data centers without electricity (or secondary app providers like Cloudflare) would it take to grind things to a halt? And for how long? “Government” would be able to turn it back on after the citizens paid the ransom, but when certain data centers are smoking craters it might take a little longer, and I wouldn’t rule out that, if the failure were large enough and long enough, there might not be a society left to use the technology even if it were restored. (Random thought: how many of those who hate America and sincerely wish its demise are being educated by American institutions and trained in American businesses?)
Iggy
Thanks Iggy! Good points, and you made me both laugh and think. Good job!
“Cloudflare’s recent outage, 53 days after a Rust-based system launch, is analyzed. The Lunduke Journal examines the root cause, a memory pre-allocation error, and its implications for the tech industry. Specific code snippets illustrate the problem and spark debate among Rust developers.” Long way to go, but it’s the wokester/DEI folks again. youtube {DOT} com / watch {question mark} v=TpXBenAvhi8 … HT to your spam filter!
The spam filter says “thanks!” It is a good one, and while it sometimes eats things that are benign, it also takes care of a LOT of non-benign.