Spot and Preemptible Instances

Spot instances use spare cloud capacity at steep discounts (60-90% off on-demand). The tradeoff: the provider can reclaim them on short notice, from 30 seconds to 2 minutes depending on the provider. They are best suited to fault-tolerant and stateless workloads.

Provider Comparison

Feature | AWS Spot | GCP Spot VMs | Azure Spot
Discount range | Up to 90% (variable by instance type) | 60-91% (fixed discount per type) | Up to 90% (variable, set max price)
Termination notice | 2 minutes | 30 seconds | 30 seconds
Max duration | No limit (can run indefinitely) | No limit (was 24h, now unlimited) | No limit
Pricing model | Market-based (fluctuates) | Fixed discount per machine type | Market-based (set max price)
Fleet / group support | Spot Fleet, EC2 Fleet | Managed Instance Groups | VM Scale Sets

Savings Comparison: On-Demand vs Spot

Representative hourly pricing for general-purpose instances. Spot prices are approximate and vary by region and time.

Instance | vCPU / RAM | On-Demand $/hr | Spot $/hr | Savings
m7i.xlarge (AWS) | 4 vCPU / 16 GB | $0.2016 | $0.0605 | 70%
m7i.4xlarge (AWS) | 16 vCPU / 64 GB | $0.8064 | $0.2419 | 70%
c7i.2xlarge (AWS) | 8 vCPU / 16 GB | $0.3570 | $0.1071 | 70%
n2-standard-8 (GCP) | 8 vCPU / 32 GB | $0.3885 | $0.0972 | 75%
D8_v5 (Azure) | 8 vCPU / 32 GB | $0.3840 | $0.0768 | 80%

GCP Spot VMs carry a fixed discount per machine type; AWS and Azure spot prices move with supply and demand, so treat those figures as snapshots.
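
The savings column follows from the two price columns: savings = 1 - spot / on-demand. A minimal Python sketch that recomputes it from the figures in the table (the names `PRICES` and `savings_pct` are illustrative, not part of any provider SDK):

```python
# Hourly prices copied from the table above: (on-demand, spot) in USD.
PRICES = {
    "m7i.xlarge (AWS)": (0.2016, 0.0605),
    "m7i.4xlarge (AWS)": (0.8064, 0.2419),
    "c7i.2xlarge (AWS)": (0.3570, 0.1071),
    "n2-standard-8 (GCP)": (0.3885, 0.0972),
    "D8_v5 (Azure)": (0.3840, 0.0768),
}


def savings_pct(on_demand: float, spot: float) -> float:
    """Percentage saved by paying the spot price instead of on-demand."""
    return (1 - spot / on_demand) * 100


for name, (on_demand, spot) in PRICES.items():
    print(f"{name}: {savings_pct(on_demand, spot):.0f}% cheaper on spot")
```

Rerunning the same calculation against current regional prices is a quick way to check whether the table's figures still hold.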

Interruption Handling Strategies

Checkpointing

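Periodically save progress to durable storage (S3, GCS). On interruption, new instances resume from the last checkpoint. Essential for long-running batch jobs. Choose the checkpoint interval by weighing the overhead of writing a checkpoint against the cost of redoing the work lost since the last one: checkpoint more often when rework is expensive, less often when writing the checkpoint is.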
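
The checkpoint-and-resume loop can be sketched in Python as follows. This is an illustration under stated assumptions, not a production implementation: a local JSON file stands in for S3 or GCS, and the names `save_checkpoint`, `load_checkpoint`, `process`, and the `stop_after` parameter (a hook that simulates an interruption) are all hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then rename it into place.

    The rename is atomic, so a kill mid-write never leaves a half-written
    checkpoint. A local file is an assumption standing in for S3/GCS.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_index": 0, "total": 0}


def process(items, path, checkpoint_every=100, stop_after=None):
    """Sum items, checkpointing every `checkpoint_every` items.

    `stop_after` simulates an interruption: the function returns at that
    index without saving, as if the instance had been reclaimed.
    """
    state = load_checkpoint(path)
    for i in range(state["next_index"], len(items)):
        if stop_after is not None and i >= stop_after:
            return state  # work since the last checkpoint is lost
        state["total"] += items[i]
        state["next_index"] = i + 1
        if state["next_index"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final checkpoint marks completion
    return state
```

With `checkpoint_every=100`, an interruption at item 250 costs at most the 50 items processed since the checkpoint at 200; the next run reloads that checkpoint and continues from there.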

Diversification

Use multiple instance types and availability zones. AWS Spot Fleet and GCP MIGs with multiple instance templates spread interruption risk. A fleet of 10 different instance types across 3 AZs rarely loses more than 10-20% of capacity at once.
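
To see why diversification limits the blast radius, treat each (instance type, zone) pair as a separate capacity pool. The Python sketch below spreads a target evenly across pools and reports the share lost if the largest pools are reclaimed together. It is a simplification: real fleet services apply their own allocation strategies, and the names `spread_capacity` and `worst_case_loss` are hypothetical.

```python
from itertools import product


def spread_capacity(target: int, instance_types: list, zones: list) -> dict:
    """Divide `target` instances as evenly as possible across all pools."""
    pools = list(product(instance_types, zones))
    base, extra = divmod(target, len(pools))
    # The first `extra` pools take one additional instance each.
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def worst_case_loss(allocation: dict, pools_lost: int = 1) -> float:
    """Fraction of capacity lost if the `pools_lost` largest pools vanish."""
    total = sum(allocation.values())
    largest = sorted(allocation.values(), reverse=True)[:pools_lost]
    return sum(largest) / total
```

For example, 100 instances over 10 types and 3 zones gives 30 pools of 3 or 4 instances each, so losing any single pool removes at most 4% of capacity.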

Graceful Shutdown Handlers

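Listen for termination notices via instance metadata (AWS: 2 min, GCP: 30s, Azure: 30s). On notice: deregister from the load balancer so no new requests arrive, drain existing connections, finish the current work item, and save state. In containers, install a SIGTERM handler, since that is the signal the runtime sends when it stops the container.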
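
A hedged Python sketch of both mechanisms: polling the AWS metadata endpoint for a spot interruption notice, and stopping on SIGTERM. The `spot/instance-action` path and the IMDSv2 token headers are the documented AWS ones; everything else (`aws_spot_notice`, `watch_for_notice`, `run_worker`, `install_sigterm`, the 5-second poll interval) is an illustrative assumption. GCP and Azure expose their notices through different metadata endpoints not shown here.

```python
import json
import signal
import threading
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def aws_spot_notice(timeout: float = 1.0):
    """Return the pending interruption notice as a dict, or None."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        notice_req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token})
        with urllib.request.urlopen(notice_req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is scheduled
            return None
        raise
    except OSError:  # metadata service unreachable, e.g. not on EC2
        return None


def watch_for_notice(fetch_notice, stop: threading.Event, interval: float = 5.0):
    """Poll `fetch_notice` until it returns a notice, then set `stop`."""
    while not stop.is_set():
        if fetch_notice() is not None:
            stop.set()
            return
        stop.wait(interval)


def install_sigterm(stop: threading.Event) -> None:
    """Set `stop` on SIGTERM. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())


def run_worker(items, handle, stop: threading.Event, on_shutdown):
    """Process items until `stop` is set; never abandon an item midway."""
    done = []
    for item in items:
        if stop.is_set():
            break
        handle(item)
        done.append(item)
    on_shutdown()  # deregister, drain, save state
    return done
```

In a real service, `watch_for_notice(aws_spot_notice, stop)` would typically run in a daemon thread alongside `install_sigterm(stop)`, so that either signal path sets the same event that the worker loop checks between items.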

Mixed On-Demand + Spot

Run a baseline on on-demand or reserved instances (30-40% of capacity) and scale with spot for the remainder. This guarantees minimum capacity while capturing spot savings for elastic demand.
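
The effect of the split on cost is simple arithmetic. The Python sketch below computes the blended hourly cost for a given on-demand base; the function name `blended_hourly_cost` is hypothetical, the base is given as an integer percentage to avoid floating-point rounding when sizing it, and reserved-instance discounts on the base are ignored.

```python
def blended_hourly_cost(instances: int, base_percent: int,
                        on_demand_price: float, spot_price: float) -> dict:
    """Cost of running `base_percent` of the fleet on-demand, the rest on spot."""
    # Round the base up so the guaranteed minimum is never undersized.
    base = (instances * base_percent + 99) // 100
    spot = instances - base
    cost = base * on_demand_price + spot * spot_price
    all_on_demand = instances * on_demand_price
    return {
        "base_instances": base,
        "spot_instances": spot,
        "hourly_cost": cost,
        "savings_pct": (1 - cost / all_on_demand) * 100,
    }
```

Using the m7i.xlarge prices from the savings table, `blended_hourly_cost(100, 30, 0.2016, 0.0605)` puts 30 instances on-demand and 70 on spot, for roughly 49% savings over an all-on-demand fleet.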

Suitable Workload Patterns

Workload | Spot Suitability | Notes
CI/CD pipelines | Excellent | Short-lived, idempotent, easy to retry
Batch data processing | Excellent | Partition work, checkpoint, resume on new instance
ML training | Good | Checkpoint model weights every N epochs to S3/GCS
Stateless web tier | Good (with base) | Mix spot with on-demand base; use health checks
Databases | Not recommended | Stateful, interruption causes data loss or corruption risk
Long-running single-threaded | Not recommended | Interruption loses all progress without checkpointing