Less for More

This has been a fantastic year to be invested in Alphabet, Amazon and Microsoft. Year to date, their stocks are up 27%, and why not? They’re providing less for more to you, their customers.

That is, less uptime for ~~the same~~ more money!

Wait, more money?

I thought cloud providers never raise prices (they do, by the way)?

Sure, fine, agreed, dear reader, but… why haven’t they reduced their prices by, oh, say, a factor of 100, to reflect the drop in the unit cost of servers?

And to be clear: we’re being very generous to the cloud providers, assuming they still want to make tidy earnings increases over the decades, and assuming that they’re still picking up the phone and calling Dell up for their orders (spoiler: they’re not).

Don’t believe me? Here’s a logarithmic graph of compute costs over the period 2006 - 2023:

figure A1 — Our World in Data, “Historical price of computer memory and storage” retrieved 2025-11-05T14:07:48Z

So I ask you again: why are you paying prices that sort of made sense in 2006 now, 20 years later? Doesn’t your organization deserve to benefit from the orders-of-magnitude cost reductions in compute?

The cloud providers don’t think so, and would very much rather you stop contemplating this. Just be happy that AWS has 40% profit margins this year and don’t think too hard about where they got that money.

The numbers

These companies cannot even provide you three nines (99.9%) of availability this year! Two sysadmins and ~5 servers running in a single colo rack can almost certainly provide you and your organization with better availability than that, for far less money, with the neat side benefit that some other random organization’s deployment failure several hundred (or thousand) kilometers away will not affect you at all.
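To put "three nines" in concrete terms, here is a minimal sketch of the arithmetic. It assumes a service that is supposed to be up around the clock for a 365-day year, and the function name `downtime_budget_hours` is made up for illustration.

```python
# Assumption: the service is expected to be up 24/7 for a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_hours(availability_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a period can absorb at a given availability target."""
    return period_hours * (1 - availability_percent / 100)


# Print the yearly downtime budget for two, three and four nines.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} hours/year")
```

Under that assumption, a 99.9% target leaves roughly 8.76 hours of downtime for the entire year.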

You can do this because you’re probably not a multi-regional organization that has customers using its services 24/7, 365 days a year.

Sidebar: what’s availability anyway?

The IEEE defines availability as:

Availability is […] a ratio or percentage of the time that the service or service component is actually available for use by the customer to the agreed time that the service should be available.¹

Or put more simply: availability = time available / time supposed to be available

If your users mostly use your services in an 18-hour window (say, from UTC-08 to UTC+06, about 95% of the English-speaking world), 5 days a week, that leaves a lot of hours that can be used for maintenance, upgrades, and feature rollouts without negatively impacting your organization’s availability.
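The formula above can be sketched in a few lines. This is an illustration, not a measurement: the 52-week year, the 10 hours of maintenance, and the function name `availability` are assumptions; only the 18-hour, 5-day window comes from the text.

```python
# Agreed service window from the text: 18 hours a day, 5 days a week.
# Assumption: a 52-week year.
AGREED_HOURS = 18 * 5 * 52               # hours the service is supposed to be up
TOTAL_HOURS = 24 * 7 * 52                # all hours in those 52 weeks
OFF_HOURS = TOTAL_HOURS - AGREED_HOURS   # hours outside the agreed window


def availability(agreed_hours: float, downtime_in_window_hours: float) -> float:
    """availability = time available / time supposed to be available."""
    if not 0 <= downtime_in_window_hours <= agreed_hours:
        raise ValueError("downtime must lie between 0 and the agreed hours")
    return (agreed_hours - downtime_in_window_hours) / agreed_hours


MAINTENANCE_HOURS = 10  # hypothetical yearly maintenance

# Maintenance scheduled outside the window costs no availability at all.
print(f"off-hours maintenance: {availability(AGREED_HOURS, 0):.4%}")
# The same maintenance done inside the window counts in full.
print(f"in-window maintenance: {availability(AGREED_HOURS, MAINTENANCE_HOURS):.4%}")
print(f"hours free for maintenance each year: {OFF_HOURS}")
```

The point of the sketch is that only downtime inside the agreed window enters the ratio, so anything done in the remaining hours is free from an availability standpoint.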

Simply scheduling an out-of-hours maintenance window is something a cloud provider can never do, as they serve everyone, globally, rather than you, personally.

This gives you an almost unbeatable edge when it comes to serving your customers!

Back to the numbers

- Alphabet’s GCP had a worldwide outage that lasted around 8 hours on June 12th, 2025²
  - Impact: Major worldwide Cloudflare disruptions, and… every³ GCP service
- Microsoft’s Azure had a worldwide outage for approximately 7 hours, 30 minutes on October 29th, 2025
  - Impact: https://login.microsoftonline.com (so, every Microsoft product, including military ones!⁴) and… every important⁵ Azure service, plus Xbox, and oh yes; multiple airlines and airports⁶
- Amazon’s AWS had a worldwide outage for over 15 hours (!), on October 19th, 2025
  - Impact: AWS DynamoDB, AWS EC2 and AWS NLB, and everything that depended on them… which was everything, internally and externally⁷.

Now, what this loss of uptime means for you, specifically, depends on your circumstances.

- Maybe you were completely unaffected because you don’t use or rely transitively on any cloud provider
- Maybe your website went down, but you don’t run an ecommerce business, so it didn’t really matter
- Maybe you lost a day of your employees’ work (still have to pay them though!)
- Maybe you lost a day of sales as a B2C organization
- Maybe you lost the ability to do bank transactions within the time your regulations require, and are now going to be asked some… interesting questions by your regulators
- Maybe you superheated your customers’ beds, or bricked their cars, and now have to spend years regaining their trust
- Maybe you caused global travel disruptions by being unable to get passengers onto their flights and to their destinations on time
- Maybe (this one is hypothetical, so far) your patients died due to malfunctioning medical devices

Probably, you land somewhere between the two extremes. Regardless, your availability is your problem. AWS isn’t going to do a feature freeze during Black Friday because availability is critical, and Microsoft isn’t going to temporarily have an “all hands on deck” week during your critical rollout.

They. Don’t. Care. Your customers aren’t going to call and complain to the cloud provider, and they’re not going to be understanding when you say “not our fault, nothing we could do”. They’re going to blame you.

Given that, wouldn’t you rather take control of your availability, while saving a tidy penny?

¹ https://cse.msu.edu/~cse435/Handouts/Standards/IEEE24765.pdf (PDF warning)
² https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW
³ Okay, not every service; it was only: “API Gateway, Agent Assist, AlloyDB for PostgreSQL, Apigee, Apigee Edge Public Cloud, Apigee Hybrid, Cloud Data Fusion, Cloud Firestore, Cloud Logging, Cloud Memorystore, Cloud Monitoring, Cloud Run, Cloud Security Command Center, Cloud Shell, Cloud Spanner, Cloud Workstations, Contact Center AI Platform, Contact Center Insights, Data Catalog, Database Migration Service, Dataform, Dataplex, Dataproc Metastore, Datastream, Dialogflow CX, Dialogflow ES, Google App Engine, Google BigQuery, Google Cloud Bigtable, Google Cloud Composer, Google Cloud Console, Google Cloud DNS, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Pub/Sub, Google Cloud SQL, Google Cloud Storage, Google Compute Engine, Identity Platform, Identity and Access Management, Looker Studio, Managed Service for Apache Kafka, Memorystore for Memcached, Memorystore for Redis, Memorystore for Redis Cluster, Persistent Disk, Personalized Service Health, Pub/Sub Lite, Speech-to-Text, Text-to-Speech, Vertex AI Online Prediction, Vertex AI Search, Vertex Gemini API, Vertex Imagen API, reCAPTCHA Enterprise” :)
⁴ https://news.ycombinator.com/item?id=45751144
⁵ “Azure Front Door, Azure App Service, Azure Active Directory, Azure Communication Services, Azure Databricks, Azure Healthcare APIs, Azure Maps, Azure Portal, Azure SQL Database, Container Registry, Media Services, Microsoft 365, Microsoft Teams, Microsoft Defender External Attack Surface Management, Microsoft Entra ID, Microsoft Purview, Microsoft Sentinel, Video Indexer, and Virtual Desktop”
⁶ https://www.zdnet.com/article/massive-azure-outage-is-over-but-problems-linger-heres-what-happened/
⁷ There are just too many to list here, but some headliners: Signal, Slack, Spotify, Reddit, Fortnite, PlayStation Network, Lloyds Bank, Halifax Bank, Bank of Scotland, Google Drive, Google Meet, Uber, European Financial Services Network
