Thread regarding Amazon.com layoffs

Huge Global Outage

“We can confirm significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region. This issue also affects other AWS Services in the US-EAST-1 Region as well.”

No failover? Come on.

The following AWS services have been affected by this issue:

AWS Global Accelerator
AWS VPCE PrivateLink
AWS Security Token Service
AWS Step Functions
AWS Systems Manager
Amazon CloudFront
Amazon DynamoDB
Amazon Elastic Compute Cloud
Amazon EventBridge
Amazon EventBridge Scheduler
Amazon GameLift Servers
Amazon Kinesis Data Streams
Amazon SageMaker
Amazon VPC Lattice


912 views | 5 replies (last October 21)

5 replies (most recent on top)

Estimates of the financial impact of yesterday's AWS outage range from over $75 million per hour ($900M+ in total) to 'hundreds of billions'. One thing we can all agree on, though, is that multi-cloud is the future, especially after Jassy said "we will need fewer people doing some of the jobs that are being done today."

See for yourself:

2011 April 21 Outage: At 12:47 am PDT on April 21, an invalid traffic shift prior to a network upgrade caused EBS instances to lose connectivity to one another within an availability zone of the US-East-1 Region. Once the errors were localized to just one availability zone, the EBS recovery […] These connectivity errors impacted EBS volumes and EC2 instances in multiple availability zones and caused issues for customers until full recovery at 3:00 pm PDT on April 24.
2011 August 7 Outage: A power loss in Ireland (EU West Region) caused a disruption and outage; the "service disruption began at 10:41 AM PDT on August 7th". This event was distinct from the August 8 US outage, although it happened around the same time. Because of follow-up issues, full restoration of services such as EBS and RDS took on the order of days.
2011 August 8 Outage: EC2 went down around 10:25 p.m. Eastern in Amazon's U.S. East Region. The outage lasted roughly 30 minutes but took down the websites and services of many major Amazon cloud customers, including Netflix, Reddit, and Foursquare. The issue occurred in the networks that connect the Availability Zones to the Internet and was primarily caused by a software bug in a router.
2012 June 29 Service disruption: A large-scale electrical storm that swept through the Northern Virginia area causes a major disruption to the EC2, EBS, and RDS services in a single availability zone.
2012 October 22 Outage: A major outage, caused by a latent memory-leak bug in an operational data collection agent, affects many sites, including Reddit, Foursquare, and Pinterest.
2012 December 24 Outage: AWS suffers an outage that makes services such as Netflix instant video unavailable to customers in the Northeastern United States.
2013 September 13 Outage: The AWS US-East-1 Region experienced network connectivity issues affecting instances in a single Availability Zone, along with increased error rates and latencies for the EBS APIs and increased error rates for EBS-backed instance launches.
2014 November 26 Service disruption: The Amazon CloudFront DNS server went down for about two hours, starting at 7:15 p.m. EST, and was back up just after 9 p.m. Some websites and cloud services were knocked offline because the content delivery network failed to fulfill DNS requests during the outage. Nothing major, but worth including in this list because it involved the world's biggest and longest-running cloud.
2015 September 20 Outage: The Amazon DynamoDB service experiences an outage in an availability zone in the us-east-1 (Northern Virginia) Region, due to a power outage and inadequate failover procedures. The outage, which occurs on a Sunday morning, lasts about five hours (with some residual impact until Monday) and affects a number of related Amazon services, including Simple Queue Service, EC2 autoscaling, Amazon CloudWatch, and the online AWS console. A number of customers are negatively affected, including Netflix, but Netflix recovers quickly thanks to its strong disaster recovery procedures.
2016 June 5 Outage: AWS Sydney experiences an outage lasting several hours after severe thunderstorms in the region cause a power outage at the data centers.
2017 February 28 Outage: Amazon experiences an outage of S3 in us-east-1, with related outages for other us-east-1 services, including CloudFormation, autoscaling, Elastic MapReduce, Simple Email Service, and Simple Workflow Service. A number of websites and services that use S3, such as Medium, Slack, Imgur, and Trello, are affected. AWS's own status dashboard initially fails to reflect the outage properly because of its dependency on S3. On March 2, AWS reveals that the outage was caused by an incorrect parameter entered by an authorized employee while running an established playbook, which ended up removing more servers than the employee intended.
2018 March 2 Service degradation: Starting at 6:25 AM PST, AWS Direct Connect experienced connectivity issues related to a power outage in the US-East-1 Region, which caused service interruptions for customers reaching their EC2 instances. The issue was fully resolved by 10:26 AM PST.
2018 May 31 Outage: Beginning at 2:52 pm PDT, a small percentage of EC2 servers lost power in a single Availability Zone in the US-EAST-1 Region. This resulted in some impaired EC2 instances and degraded performance for some EBS volumes in the affected Availability Zone. Power was restored at 3:22 pm PDT.
2019 August 23 Outage: A number of EC2 servers in the Tokyo region shut down from overheating at 12:36 pm local time, following a failure in the data center control and cooling system.
2019 August 31 Outage and data loss: A US-EAST-1 data center suffered a power failure at 4:33 am local time, and the backup generators failed at 6 am. According to AWS, this affected 7.5 percent of the EC2 instances in one of the ten data centers in one of the six Availability Zones in US-EAST-1. However, after power was restored, a number of EBS volumes, which store the filesystems of the EC2 cloud servers, were permanently unrecoverable. This caused downtime for companies such as Reddit.
2019 October 22–23 Service degradation from DDoS: AWS sustained a distributed denial-of-service attack that caused intermittent DNS resolution errors for its Route 53 DNS service from 10:30 am PST to 6:30 pm PST.
2020 November 25 Outage: Beginning at 9:52 am PST, the Kinesis Data Streams API became impaired in the US-EAST-1 Region, preventing customers from reading or writing data.
2021 December 7 Outage: Beginning at 10:45 am PST, "an impairment of several network devices" in the US-EAST-1 Region caused widespread errors in all AWS services. The root cause was mitigated by 4:35 PM PST, but service recovery was still underway, causing ongoing localized impairment.
2021 December 15 Outage: Region us-west-1 was unavailable for about 30 minutes.
2021 December 22 Outage and potential data loss: A power loss in us-east-1 lasted about 1 hour and was followed by extended recovery procedures. AWS attributed the failure to a single availability zone, USE1-AZ4.
2023 June 13 Outage: Beginning at 11:49 am PDT, customers experienced increased error rates and latencies for AWS Lambda function invocations within the Northern Virginia (US-EAST-1) Region. By 3:37 pm PDT, the AWS Lambda service and all dependent services had resumed normal operations.
2025 October 20 Outage: Beginning at 12:11 am PDT, customers experienced increased error rates and latencies for multiple AWS services within the Northern Virginia (US-EAST-1) Region. A number of websites and services such as Roblox, Fortnite, Snapchat, and Duolingo were affected.

More layoffs, more us-east-1 outages. Plus critical staff will keep getting poached.

12-hour outage, no lawsuits?

The news reports said it was fixed, but there are a bunch of services still broken.

Everything is down. Banking, gambling, po-n, nothing works.
Never thought that AWS could have an outage of this level.
