Spot and Preemptible Instances
Spot instances use spare cloud capacity at steep discounts (typically 60-90% off on-demand). The tradeoff: the provider can reclaim them on short notice. They are best suited to fault-tolerant and stateless workloads.
Provider Comparison
| Feature | AWS Spot | GCP Spot VMs | Azure Spot |
|---|---|---|---|
| Discount range | Up to 90% (variable by instance type) | 60-91% (fixed discount per type) | Up to 90% (variable, set max price) |
| Termination notice | 2 minutes | 30 seconds | 30 seconds |
| Max duration | No limit (can run indefinitely) | No limit (legacy preemptible VMs were capped at 24h) | No limit |
| Pricing model | Market-based (fluctuates) | Fixed discount per machine type | Market-based (set max price) |
| Fleet / group support | Spot Fleet, EC2 Fleet | Managed Instance Groups | VM Scale Sets |
Savings Comparison: On-Demand vs Spot
Representative hourly pricing for general-purpose instances. Spot prices are approximate and vary by region and time.
| Instance | vCPU / RAM | On-Demand $/hr | Spot $/hr | Savings |
|---|---|---|---|---|
| m7i.xlarge (AWS) | 4 vCPU / 16 GB | $0.2016 | $0.0605 | 70% |
| m7i.4xlarge (AWS) | 16 vCPU / 64 GB | $0.8064 | $0.2419 | 70% |
| c7i.2xlarge (AWS) | 8 vCPU / 16 GB | $0.3570 | $0.1071 | 70% |
| n2-standard-8 (GCP) | 8 vCPU / 32 GB | $0.3885 | $0.0972 | 75% |
| D8_v5 (Azure) | 8 vCPU / 32 GB | $0.3840 | $0.0768 | 80% |
GCP Spot VMs have a fixed discount per machine type, while AWS and Azure spot prices vary with supply and demand.
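The Savings column can be reproduced from the two price columns. The sketch below shows the arithmetic using the representative prices from the table; the function name `spot_savings_pct` and the `PRICES` mapping are illustrative, not part of any provider SDK.

```python
def spot_savings_pct(on_demand_hourly: float, spot_hourly: float) -> float:
    """Percentage saved by paying the spot rate instead of the on-demand rate."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return (1 - spot_hourly / on_demand_hourly) * 100


# (on-demand $/hr, spot $/hr), copied from the table above.
PRICES = {
    "m7i.xlarge (AWS)": (0.2016, 0.0605),
    "m7i.4xlarge (AWS)": (0.8064, 0.2419),
    "c7i.2xlarge (AWS)": (0.3570, 0.1071),
    "n2-standard-8 (GCP)": (0.3885, 0.0972),
    "D8_v5 (Azure)": (0.3840, 0.0768),
}

if __name__ == "__main__":
    for name, (on_demand, spot) in PRICES.items():
        pct = spot_savings_pct(on_demand, spot)
        print(f"{name}: {pct:.0f}% saved, ${on_demand - spot:.4f} less per hour")
```

Because spot prices move, the same function is more useful when fed current prices than the snapshot values shown here.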
Interruption Handling Strategies
Checkpointing
Periodically save progress to durable storage (S3, GCS). On interruption, new instances resume from the last checkpoint. Essential for long-running batch jobs. Choose the checkpoint interval by weighing the cost of writing a checkpoint against the cost of redoing the work lost since the last one.
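A minimal checkpoint-and-resume sketch follows. It makes several assumptions: a local file stands in for durable storage such as S3 or GCS, the "job" is just summing a list, and `stop_after` is a hypothetical hook that simulates a reclaim. The checkpoint is written to a temporary file and renamed, so an interruption in the middle of a write cannot leave a corrupt checkpoint behind.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic on POSIX filesystems


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def run_job(items, path, checkpoint_every=100, stop_after=None):
    """Process items from the last checkpoint onward.

    stop_after simulates an interruption: the function returns None at that
    index and any progress since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        if stop_after is not None and i >= stop_after:
            return None
        state["total"] += items[i]
        state["next_item"] = i + 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

With `checkpoint_every=100`, an interruption at item 250 costs at most the 50 items processed since the checkpoint at 200; a replacement instance calling `run_job` with the same path picks up from there.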
Diversification
Spread capacity across multiple instance types and availability zones so that a single reclaim event affects only part of the fleet. AWS Spot Fleet and GCP MIGs with multiple instance templates spread interruption risk this way. As a rule of thumb, a fleet of 10 different instance types across 3 AZs rarely loses more than 10-20% of capacity at once.
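The effect of diversification can be estimated with simple arithmetic. The sketch below assumes each (instance type, zone) pair is an independent capacity pool and that instances are spread evenly across pools; both are simplifications, and the names `even_allocation` and `fraction_lost` are illustrative.

```python
from itertools import product


def even_allocation(instance_types, zones, total_instances):
    """Spread instances as evenly as possible across (type, zone) pools."""
    pools = list(product(instance_types, zones))
    base, extra = divmod(total_instances, len(pools))
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def fraction_lost(allocation, reclaimed_pools):
    """Fraction of the fleet lost if the given pools are reclaimed together."""
    total = sum(allocation.values())
    lost = sum(allocation.get(pool, 0) for pool in reclaimed_pools)
    return lost / total


if __name__ == "__main__":
    types = [f"type-{n}" for n in range(10)]  # placeholder type names
    zones = ["az-a", "az-b", "az-c"]          # placeholder zone names
    fleet = even_allocation(types, zones, 300)

    one_pool = [("type-0", "az-a")]
    whole_type = [("type-0", z) for z in zones]
    print(f"one pool reclaimed: {fraction_lost(fleet, one_pool):.1%}")
    print(f"one type reclaimed in every zone: {fraction_lost(fleet, whole_type):.1%}")
```

Under these assumptions, losing one instance type in all three zones removes a tenth of the fleet, which is in line with the 10-20% rule of thumb; a fleet with a single type in a single zone would lose everything in the same event.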
Graceful Shutdown Handlers
Listen for termination notices via instance metadata (AWS: 2 min, GCP: 30s, Azure: 30s). On notice: deregister from the load balancer, drain connections, finish the current work item, and save state. In containers, handle SIGTERM so the process can run the same shutdown steps before it is killed.
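A SIGTERM handler for a worker process might look like the sketch below. It assumes the orchestrator or a node-level agent translates the provider's termination notice into SIGTERM; `handle` and `save_state` are hypothetical callbacks supplied by the application. The handler only sets a flag, and the loop checks it between work items so the current item always finishes.

```python
import signal

_shutdown_requested = False


def _on_sigterm(signum, frame):
    """Record the request; the real work happens in the main loop."""
    global _shutdown_requested
    _shutdown_requested = True


def install_sigterm_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, _on_sigterm)


def worker_loop(work_items, handle, save_state):
    """Process items until done or until shutdown is requested.

    The in-flight item is always completed; state is saved before returning.
    Returns the number of items processed.
    """
    done = 0
    for item in work_items:
        if _shutdown_requested:
            break
        handle(item)
        done += 1
    save_state(done)
    return done
```

Call `install_sigterm_handler()` once at startup, then run `worker_loop`. Keeping the handler to a single flag assignment avoids doing I/O or taking locks inside a signal handler; the shutdown steps themselves run in normal control flow, and they need to fit inside the provider's notice window.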
Mixed On-Demand + Spot
Run a baseline on on-demand or reserved instances (30-40% of capacity) and scale with spot for the remainder. This guarantees minimum capacity while capturing spot savings for elastic demand.
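The split and its cost impact can be worked out as follows. This is a sketch under stated assumptions: the baseline is given as a whole-number percentage and rounded up so the guaranteed floor is never undershot, and the example prices are the m7i.xlarge figures from the savings table. The function names are illustrative.

```python
def split_capacity(total_instances: int, baseline_percent: int):
    """Return (on_demand_count, spot_count) for a given baseline percentage."""
    if not 0 <= baseline_percent <= 100:
        raise ValueError("baseline_percent must be between 0 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    on_demand = -(-total_instances * baseline_percent // 100)
    return on_demand, total_instances - on_demand


def blended_hourly_cost(on_demand_count, spot_count, on_demand_price, spot_price):
    """Total hourly cost of a mixed fleet."""
    return on_demand_count * on_demand_price + spot_count * spot_price


if __name__ == "__main__":
    on_demand, spot = split_capacity(20, 30)
    mixed = blended_hourly_cost(on_demand, spot, 0.2016, 0.0605)
    all_on_demand = blended_hourly_cost(20, 0, 0.2016, 0.0605)
    print(f"{on_demand} on-demand + {spot} spot: ${mixed:.4f}/hr")
    print(f"all on-demand: ${all_on_demand:.4f}/hr")
```

The blended saving is smaller than the headline spot discount because the baseline still pays the full on-demand rate; raising the baseline trades savings for guaranteed capacity.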
Suitable Workload Patterns
| Workload | Spot Suitability | Notes |
|---|---|---|
| CI/CD pipelines | Excellent | Short-lived, idempotent, easy to retry |
| Batch data processing | Excellent | Partition work, checkpoint, resume on new instance |
| ML training | Good | Checkpoint model weights every N epochs to S3/GCS |
| Stateless web tier | Good (with base) | Mix spot with on-demand base; use health checks |
| Databases | Not recommended | Stateful; an interruption risks data loss or corruption |
| Long-running single-threaded | Not recommended | Without checkpointing, an interruption loses all progress |