What happened?
On 20 October 2025, AWS suffered a global-scale outage that affected hundreds of essential services: social networks, video games, e-commerce sites, banks and connected devices.
The hardest-hit region: US-EAST-1 (Northern Virginia, USA), a strategic zone of AWS infrastructure.
AWS itself reported increased error rates and elevated latencies across multiple services.
Identified causes
1. DNS and load-balancer malfunction
The technical origin was traced to a bug in the automated DNS-management system used by DynamoDB (AWS's managed database service): the service's regional endpoint in US-EAST-1 was left with an empty DNS record, and that faulty record was not automatically repaired.
In parallel, a subsystem responsible for monitoring the health of network load balancers malfunctioned, triggering a cascade of failures in dependent services.
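To make the failure mode concrete, here is a minimal sketch in Python (standard library only) of the kind of external DNS probe that would catch an endpoint whose record has gone empty. The endpoint name is DynamoDB's public US-EAST-1 endpoint; the check and alert logic are purely illustrative, not AWS's internal mechanism.

```python
import socket

# Public DynamoDB endpoint for US-EAST-1; the probe itself is illustrative.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def resolves(hostname: str) -> bool:
    """Return True if the hostname still resolves to at least one address."""
    try:
        # An empty or deleted DNS record surfaces here as a resolution error.
        return len(socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)) > 0
    except socket.gaierror:
        return False

if not resolves(ENDPOINT):
    print(f"ALERT: {ENDPOINT} no longer resolves, check DNS records")
```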
2. Domino effect and central dependency
Because so many services rely on AWS for hosting, storage and content distribution, what looked like an "internal" issue spread across the whole ecosystem.
The outage also highlights the fragility of an infrastructure that is now highly centralised in the hands of a few major cloud providers.
Impact on businesses and users
- Millions of users reported access disruptions or very slow response times on platforms such as Snapchat, Reddit and Fortnite, as well as in banking apps and connected devices.
- For businesses that are AWS customers, the outage meant lost revenue, interrupted critical services and eroded trust.
- A wake-up call for the ecosystem: even cloud giants are not immune to a major failure.
Why it's a strategic problem
- Cloud infrastructure concentration: AWS dominates with around 30% of the cloud market; an outage of this scale shows that a single point of failure can have global effects.
- Cascade effect: a bug in an internal sub-system (DNS, load balancing) can degrade the entire service; the circuit-breaker sketch after this list shows one pattern for containing that kind of propagation.
- Need for resilience: many companies had no external or multi-cloud fallback ready to take over.
- Reputation & trust: for AWS as for its customers, reliability is a key part of the business model.
Best practices and lessons for businesses
- Plan a multi-cloud continuity strategy, or at least a multi-region one if you're on AWS (see the failover sketch after this list).
- Watch for hidden dependencies: if a cloud micro-service collapses, what are the side effects?
- Set up resilience tests (chaos testing, planned failovers) to check your systems hold up under failure; a minimal failure-injection sketch follows the failover example below.
- Regularly assess your cloud providers and their redundancy strategy.
- Communicate quickly and clearly when an incident occurs: transparency = trust.
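To illustrate the multi-region point, here is a minimal sketch using boto3. It assumes a DynamoDB table already replicated across both regions (for example via Global Tables); the region list, timeouts and table name are assumptions for illustration, not a production recipe.

```python
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

# Assumed setup: the table is replicated to both regions (e.g. Global Tables).
REGIONS = ["us-east-1", "us-west-2"]

# Short timeouts and few retries so a sick region fails over quickly.
FAST_FAIL = Config(connect_timeout=2, read_timeout=2,
                   retries={"max_attempts": 2})

def get_item_with_fallback(table_name, key):
    """Read from the primary region, fall back to the next one on error."""
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region, config=FAST_FAIL)
        try:
            return client.get_item(TableName=table_name, Key=key)
        except (BotoCoreError, ClientError) as exc:
            last_error = exc  # try the next region
    raise RuntimeError("all configured regions failed") from last_error

# Hypothetical usage:
# item = get_item_with_fallback("orders", {"order_id": {"S": "42"}})
```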
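And for the resilience-testing point, a minimal failure-injection sketch rather than a real chaos-engineering framework: it wraps a dependency call and raises artificial errors so a test can verify that fallback code, like the function above, actually gets exercised. The names and failure rate are invented for illustration.

```python
import random

class FailureInjector:
    """Wrap a callable and inject artificial failures at a given rate."""

    def __init__(self, func, failure_rate=0.3, seed=None):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)   # seeded for reproducible test runs

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure (simulated outage)")
        return self.func(*args, **kwargs)

# Hypothetical usage in a test: force the wrapped call to always fail and
# check that the caller degrades gracefully instead of crashing.
flaky_fetch = FailureInjector(lambda: "ok", failure_rate=1.0, seed=1)
try:
    flaky_fetch()
except ConnectionError:
    print("fallback path exercised as expected")
```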
In summary
The AWS outage of October 2025 is a blunt reminder that even "cloud" infrastructure is not infallible. It shows that high availability doesn't come by default: it has to be designed, planned and tested. For digital businesses, thinking about resilience is now a must.