What happened?

On 20 October 2025, AWS suffered a global-scale outage that affected hundreds of widely used services: social networks, video games, e-commerce platforms, banks and connected devices.

The hardest-hit region was US-EAST-1 (Northern Virginia, USA), one of the most heavily used and strategically important parts of AWS's infrastructure.

According to AWS, the incident caused "increased error rates and significant latencies for several services".

Identified causes

1. DNS and load-balancer malfunction

The technical origin was identified: a bug in the DNS-management automation used by DynamoDB (AWS's managed database service) left an empty DNS record for the US-EAST-1 endpoint that was not automatically repaired.
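
From a client's point of view, an empty DNS record simply means the service endpoint no longer resolves. The minimal Python sketch below illustrates that failure mode and a naive fallback; the endpoint names are shown for illustration only, and a real fallback would also require the data to be available in the second region.

```python
import socket

# Endpoint names shown for illustration; DynamoDB endpoints follow the
# pattern dynamodb.<region>.amazonaws.com.
PRIMARY_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"
FALLBACK_ENDPOINT = "dynamodb.us-west-2.amazonaws.com"

def resolvable(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except socket.gaierror:
        # An empty or missing DNS record shows up as a resolution failure.
        return False

# Pick whichever endpoint actually resolves.
endpoint = PRIMARY_ENDPOINT if resolvable(PRIMARY_ENDPOINT) else FALLBACK_ENDPOINT
print(f"Using endpoint: {endpoint}")
```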

In parallel, the subsystem responsible for monitoring the health of network load balancers triggered a cascade of failures.

2. Domino effect and central dependency

Because so many services rely on AWS for hosting, storage and distribution, what looked like an "internal" issue spread across the whole ecosystem.

The outage also highlights the fragility of an infrastructure that is now highly centralised in the hands of a few major cloud providers. 

Impact on businesses and users

Why it's a strategic problem

Best practices and lessons for businesses

  1. Plan a multi-cloud continuity strategy, or at least multi-region if you're on AWS (see the failover sketch after this list).
  2. Watch for hidden dependencies: if a cloud microservice collapses, what are the side effects?
  3. Set up resilience tests (chaos testing, planned failovers) to verify that your systems hold up when a failure occurs.
  4. Regularly assess your cloud providers and their redundancy strategy.
  5. Communicate quickly and clearly when an incident occurs: transparency = trust.
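
As a concrete illustration of points 1 and 3, here is a minimal Python sketch of a multi-region read with failover. It uses boto3; the region list, table name and key are placeholders, and it assumes the table is already replicated across both regions (for example as a DynamoDB global table).

```python
import boto3
from botocore.exceptions import BotoCoreError, ClientError

# Placeholder names for illustration; assumes the table is replicated
# across both regions (e.g. as a DynamoDB global table).
REGIONS = ["us-east-1", "us-west-2"]
TABLE_NAME = "orders"

def get_item_with_failover(key: dict):
    """Try each region in order and return the first successful read."""
    last_error = None
    for region in REGIONS:
        table = boto3.resource("dynamodb", region_name=region).Table(TABLE_NAME)
        try:
            return table.get_item(Key=key).get("Item")
        except (BotoCoreError, ClientError) as exc:
            last_error = exc  # remember the failure and try the next region
    raise RuntimeError("All configured regions failed") from last_error

# Example call with a placeholder key:
# item = get_item_with_failover({"order_id": "12345"})
```

The same pattern can be exercised during chaos tests by deliberately blocking the primary region and checking that reads still succeed from the secondary one.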

In summary

The AWS outage of October 2025 is a blunt reminder that even so-called "cloud" infrastructure is not infallible. It shows that high availability doesn't come by default: it has to be designed, planned and tested. For digital businesses, thinking about resilience is now a must.