Incident response in cloud environments is fundamentally different from traditional on-premises investigations. Cloud systems are API-driven, distributed, and highly automated—meaning evidence lives in logs, managed services, and virtual infrastructure rather than physical hardware. Effective cloud incident response requires understanding shared responsibility, cloud-native tooling, and the unique challenges of ephemeral resources and identity-based attacks.
This chapter explains how incident response works in AWS, Azure, and GCP, the stages of cloud-focused IR, and the techniques investigators use to contain, analyze, and eradicate threats in cloud systems.
Why Cloud Incident Response Is Different
Cloud environments introduce unique challenges:
- No physical access to servers or disks
- Ephemeral resources (VMs, containers, functions) vanish quickly
- Attacks often target identities, not machines
- Logging must be pre-enabled or evidence may be missing
- Cross-region and cross-account compromise is common
- Data exfiltration happens via APIs, not OS-level tools
Incident responders must rely heavily on cloud-native evidence sources.
Cloud Shared Responsibility Model (Critical for IR)
Cloud Provider (AWS/Azure/GCP) responsibility:
- Physical security
- Hardware
- Hypervisors
- Core networking
Customer responsibility:
- IAM configuration
- Logging
- Encryption
- Application security
- Data protection
- OS-level hardening
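Because these controls sit on the customer's side of the model, customer misconfigurations (for example over-permissive IAM roles or publicly readable storage) are a leading cause of cloud breaches. The customer, not the provider, is also responsible for enabling the logs an investigation depends on.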
Key Phases of Cloud Incident Response
Cloud IR follows the standard IR lifecycle but uses cloud-specific techniques.
1. Detection & Identification
Identify unusual activity, such as:
- Suspicious API calls
- Strange login locations
- Data access anomalies
- Cloud workload execution spikes
- New IAM users/keys
- Unusual outbound traffic (C2)
- Storage downloads
Primary detection sources:
- Amazon GuardDuty
- Microsoft Defender for Cloud (formerly Azure Security Center)
- GCP Security Command Center
- SIEM alerts (Splunk, Sentinel, Elastic)
- CloudTrail / Azure Activity Log / GCP Cloud Audit Logs
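As a rough illustration of the detection step, the sketch below scans CloudTrail-shaped records for a small watchlist of event names and for source IPs that are not on a known list. The field names (`eventName`, `sourceIPAddress`, `userIdentity.arn`, `eventTime`) follow the CloudTrail record format; the watchlist, the `flag_suspicious` helper, and the sample data are assumptions made for this example, not a complete detection rule set.

```python
# Hypothetical watchlist of CloudTrail event names; extend it for your environment.
WATCHLIST = {
    "CreateUser": "new IAM user",
    "CreateAccessKey": "new access key",
    "StopLogging": "CloudTrail logging stopped",
    "DeleteTrail": "CloudTrail trail deleted",
}


def flag_suspicious(records, known_ips):
    """Return one finding per record that hits the watchlist or an unknown IP."""
    findings = []
    for rec in records:
        reasons = []
        name = rec.get("eventName", "")
        ip = rec.get("sourceIPAddress", "")
        if name in WATCHLIST:
            reasons.append(WATCHLIST[name])
        if ip and ip not in known_ips:
            reasons.append("unfamiliar source IP " + ip)
        if reasons:
            findings.append({
                "time": rec.get("eventTime"),
                "actor": rec.get("userIdentity", {}).get("arn", "unknown"),
                "event": name,
                "reasons": reasons,
            })
    return findings


# Illustrative records only; real ones would come from exported CloudTrail logs.
sample = [
    {"eventTime": "2024-05-01T12:00:00Z", "eventName": "DescribeInstances",
     "sourceIPAddress": "198.51.100.10",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventTime": "2024-05-01T12:05:00Z", "eventName": "CreateAccessKey",
     "sourceIPAddress": "203.0.113.77",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
]

for finding in flag_suspicious(sample, known_ips={"198.51.100.10"}):
    print(finding["time"], finding["event"], finding["reasons"])
```

In practice a managed detector or SIEM rule would do this work; the point is only that detection in the cloud reduces to filtering API records by action, actor, and origin.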
2. Investigation & Evidence Collection
Cloud investigations require gathering:
API logs
- CloudTrail (AWS)
- Azure Activity Log
- GCP Cloud Audit Logs
Identity logs and findings
- AWS IAM Access Analyzer findings
- Microsoft Entra ID (formerly Azure AD) sign-in logs
- GCP IAM Recommender insights
Storage logs
- S3/Blob/Cloud Storage access logs
Network logs
- VPC, NSG, and firewall flow logs
Instance evidence
- Disk snapshots
- Memory (if the VM is still running)
- Container logs
- Lambda/Function logs
Time is critical—cloud resources may auto-terminate.
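Once logs from several sources are collected, investigators usually merge them into one timeline. The following is a minimal sketch of that idea, assuming CloudTrail records and GCP audit log entries have already been exported as dictionaries. The field paths (`eventTime`, `userIdentity.arn`, `timestamp`, `protoPayload.methodName`, `protoPayload.authenticationInfo.principalEmail`) follow those log formats; the helper names and the normalized output shape are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    # Replace a trailing "Z" because older Python versions reject it in
    # fromisoformat(). Sub-microsecond precision may need trimming first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def from_cloudtrail(rec):
    return {
        "time": parse_ts(rec["eventTime"]),
        "source": "aws-cloudtrail",
        "actor": rec.get("userIdentity", {}).get("arn", "unknown"),
        "action": rec.get("eventName", "unknown"),
    }


def from_gcp_audit(entry):
    payload = entry.get("protoPayload", {})
    return {
        "time": parse_ts(entry["timestamp"]),
        "source": "gcp-audit",
        "actor": payload.get("authenticationInfo", {}).get("principalEmail", "unknown"),
        "action": payload.get("methodName", "unknown"),
    }


def build_timeline(cloudtrail_records, gcp_entries):
    """Normalize both sources and sort every event by timestamp."""
    events = [from_cloudtrail(r) for r in cloudtrail_records]
    events += [from_gcp_audit(e) for e in gcp_entries]
    return sorted(events, key=lambda e: e["time"])


# Illustrative inputs only.
aws_sample = [{"eventTime": "2024-05-01T12:05:00Z", "eventName": "GetObject",
               "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}}]
gcp_sample = [{"timestamp": "2024-05-01T12:01:00Z",
               "protoPayload": {"methodName": "storage.objects.get",
                                "authenticationInfo": {"principalEmail": "svc@example.com"}}}]

for event in build_timeline(aws_sample, gcp_sample):
    print(event["time"].isoformat(), event["source"], event["actor"], event["action"])
```

Azure sources could be added with one more adapter function; they are left out here because the exported field names vary by export path.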
3. Containment
Containment techniques in cloud environments include:
Identity Containment
- Disable compromised access keys
- Rotate credentials
- Remove newly created users
- Block suspicious IPs
- Revoke OAuth tokens
- Enforce MFA
Network Containment
- Update security groups
- Block outbound connections
- Restrict VPC peering
- Disable open ports
Resource Containment
- Isolate the compromised VM by:
  - Removing it from load balancers
  - Changing SGs to “deny all” (in AWS, a security group with no allow rules)
  - Capturing snapshots before shutdown
Storage Containment
- Lock down public buckets
- Disable SAS tokens (Azure)
- Block cross-account access
Where possible, choose containment actions that are reversible and that preserve evidence: disable a key rather than delete it, and isolate a VM rather than terminate it.
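For identity containment on AWS, one documented way to cut off temporary role sessions issued before a given moment is to attach a deny policy conditioned on `aws:TokenIssueTime`. The sketch below only builds that policy document; the function name is hypothetical, and attaching the policy to the affected role is left out. It applies to temporary credentials, so long-term access keys still need to be disabled separately.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff):
    """Build a policy denying all actions to sessions issued before `cutoff`."""
    if cutoff.tzinfo is None:
        # A naive datetime would be interpreted as local time, which is
        # an easy way to pick the wrong cutoff during an incident.
        raise ValueError("cutoff must be timezone-aware")
    ts = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": ts}},
        }],
    }


# Example cutoff chosen for illustration.
policy = revoke_older_sessions_policy(datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
print(json.dumps(policy, indent=2))
```

This action is reversible, since removing the policy restores access, which makes it a reasonable first move while the scope of the compromise is still unclear.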
4. Eradication
Remove attacker presence:
- Delete malicious IAM policies
- Remove rogue service accounts
- Stop unauthorized tasks/functions
- Clean up malware in VMs or containers
- Remove public access from storage
- Reset misconfigured firewall/security group rules
- Delete unauthorized snapshots or images
Ensure no persistence remains in:
- IAM
- Serverless functions
- EventBridge (CloudWatch Events) rules
- Cron jobs (inside VMs)
- Launch templates
- Instance startup scripts (AWS user data, GCP metadata startup-script)
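One way to turn the persistence checklist above into a concrete cleanup list is to pull every successful resource-changing call made by the compromised principal. The sketch below assumes CloudTrail-shaped records; the prefix list and the `persistence_candidates` helper are assumptions for illustration and will miss calls that do not follow these naming patterns.

```python
from collections import defaultdict

# Assumed prefixes for API calls that create or change resources.
MUTATING_PREFIXES = ("Create", "Put", "Attach", "Update", "Add")


def persistence_candidates(records, principal_arn):
    """Group successful mutating calls made by one principal, keyed by service."""
    by_service = defaultdict(list)
    for rec in records:
        if rec.get("userIdentity", {}).get("arn") != principal_arn:
            continue
        if rec.get("errorCode"):  # the call failed, so nothing was changed
            continue
        name = rec.get("eventName", "")
        if name.startswith(MUTATING_PREFIXES):
            by_service[rec.get("eventSource", "unknown")].append(name)
    return dict(by_service)


# Illustrative records only.
ATTACKER = "arn:aws:iam::111122223333:user/alice"
OTHER = "arn:aws:iam::111122223333:user/bob"
sample_events = [
    {"userIdentity": {"arn": ATTACKER}, "eventSource": "iam.amazonaws.com", "eventName": "CreateUser"},
    {"userIdentity": {"arn": ATTACKER}, "eventSource": "iam.amazonaws.com", "eventName": "AttachUserPolicy"},
    {"userIdentity": {"arn": ATTACKER}, "eventSource": "lambda.amazonaws.com", "eventName": "CreateFunction20150331"},
    {"userIdentity": {"arn": ATTACKER}, "eventSource": "events.amazonaws.com", "eventName": "PutRule"},
    {"userIdentity": {"arn": ATTACKER}, "eventSource": "iam.amazonaws.com", "eventName": "CreateRole",
     "errorCode": "AccessDenied"},
    {"userIdentity": {"arn": ATTACKER}, "eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
    {"userIdentity": {"arn": OTHER}, "eventSource": "iam.amazonaws.com", "eventName": "CreateUser"},
]

for service, calls in persistence_candidates(sample_events, ATTACKER).items():
    print(service, calls)
```

Each entry in the output is a lead to verify by hand, not proof of a backdoor. Activity performed through assumed roles appears under a different ARN and would need its own pass.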
5. Recovery
Restore systems to secure state:
- Redeploy workloads from clean AMIs/Images
- Regenerate IAM keys
- Validate security group rules
- Re-enable logging
- Patch vulnerabilities
- Rebuild containers from source
Before returning systems to service, confirm that no attacker backdoors remain.
6. Post-Incident Review
Perform a full cloud-focused lessons-learned analysis:
- What IAM roles were abused?
- What misconfigurations allowed the attack?
- Which logs were missing?
- How could automation improve detection?
- What guardrails should be added?
This step helps strengthen the architecture.
Cloud-Specific Incident Response Techniques
1. Auto-Snapshotting & Evidence Preservation
Before shutting down a compromised VM:
- Snapshot EBS (AWS) / Managed Disk (Azure) / Persistent Disk (GCP)
- Export instance logs
- Preserve cloud function logs
- Archive API logs
Snapshots allow forensic imaging later.
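Preserved evidence is more useful if its integrity can be shown later. As a sketch of one common practice (not something specific to any cloud provider), the code below records a SHA-256 hash for each exported artifact and can re-check those hashes afterwards. The function names, the manifest fields, and the in-memory byte strings are assumptions for illustration.

```python
import hashlib
from datetime import datetime, timezone


def build_manifest(artifacts, collected_by):
    """`artifacts` maps a label to the raw bytes of an exported log or file."""
    collected_at = datetime.now(timezone.utc).isoformat()
    manifest = []
    for label in sorted(artifacts):
        data = artifacts[label]
        manifest.append({
            "label": label,
            "sha256": hashlib.sha256(data).hexdigest(),
            "size_bytes": len(data),
            "collected_by": collected_by,
            "collected_at": collected_at,
        })
    return manifest


def verify(manifest, artifacts):
    """Return labels whose current contents no longer match the recorded hash."""
    return [m["label"] for m in manifest
            if hashlib.sha256(artifacts[m["label"]]).hexdigest() != m["sha256"]]


# Illustrative artifacts; real ones would be exported log files.
evidence = {
    "cloudtrail-export.json": b'{"Records": []}',
    "instance-syslog.txt": b"May  1 12:00:00 host sshd[1]: session opened\n",
}
manifest = build_manifest(evidence, collected_by="responder-1")
print(verify(manifest, evidence))  # empty list while nothing has changed
```

Large disk images would be hashed in chunks from disk rather than held in memory; the manifest idea stays the same.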
2. Serverless / Function IR
Investigate:
- CloudWatch Logs (AWS Lambda)
- Azure Functions logs
- GCP Cloud Functions logs
- IAM execution role permissions
- Trigger events (S3, Pub/Sub, EventBridge)
Attackers often deploy malicious serverless functions for persistence.
3. Container / Kubernetes IR
Inspect:
- Pod logs
- Kubernetes audit logs
- Node snapshots
- Container registry logs
- Unexpected deployments or images
Compromised containers spread quickly across clusters.
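Kubernetes audit logs are often the quickest way to spot unexpected deployments. The sketch below triages audit events that have already been parsed into dictionaries, using the `verb`, `user.username`, and `objectRef` fields from the Kubernetes audit event format. The three rules, the trusted-user list, and the helper name are assumptions chosen for this example.

```python
# Resource types treated as workloads for this example.
WORKLOAD_RESOURCES = {"deployments", "daemonsets", "cronjobs", "pods"}


def triage_audit_events(events, trusted_users):
    """Return (user, reason, namespace, name) tuples for events worth a look."""
    findings = []
    for ev in events:
        ref = ev.get("objectRef") or {}
        user = ev.get("user", {}).get("username", "unknown")
        resource = ref.get("resource", "")
        sub = ref.get("subresource", "")
        verb = ev.get("verb", "")
        untrusted = user not in trusted_users
        if sub == "exec":
            # Flag exec regardless of verb; clients may use different verbs.
            reason = "exec into pod"
        elif verb == "create" and resource in WORKLOAD_RESOURCES and not sub and untrusted:
            reason = "workload created by untrusted user"
        elif resource == "secrets" and verb in {"get", "list"} and untrusted:
            reason = "secret read by untrusted user"
        else:
            continue
        findings.append((user, reason, ref.get("namespace", ""), ref.get("name", "")))
    return findings


# Illustrative events only.
audit_sample = [
    {"verb": "create", "user": {"username": "ci-deployer"},
     "objectRef": {"resource": "deployments", "namespace": "prod", "name": "web"}},
    {"verb": "create", "user": {"username": "dev-temp"},
     "objectRef": {"resource": "pods", "subresource": "exec", "namespace": "prod", "name": "web-abc"}},
    {"verb": "create", "user": {"username": "dev-temp"},
     "objectRef": {"resource": "deployments", "namespace": "default", "name": "miner"}},
    {"verb": "list", "user": {"username": "dev-temp"},
     "objectRef": {"resource": "secrets", "namespace": "prod"}},
    {"verb": "get", "user": {"username": "ci-deployer"},
     "objectRef": {"resource": "pods", "namespace": "prod", "name": "web-abc"}},
]

for finding in triage_audit_events(audit_sample, trusted_users={"ci-deployer"}):
    print(finding)
```

Which events reach the audit log at all depends on the cluster's audit policy, so missing findings here do not prove the absence of activity.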
4. IAM-Centric Investigation
Most cloud breaches involve:
- Stolen access keys
- Over-permissive IAM roles
- Misconfigured token access
- Account takeover
Analyze:
- Key usage
- Role switching
- OAuth token issuance
- MFA bypass attempts
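To make the key usage and role switching analysis concrete, the sketch below summarizes CloudTrail-shaped records per access key: how many calls were made, from which source IPs, and which roles were assumed. The fields `userIdentity.accessKeyId`, `sourceIPAddress`, and `requestParameters.roleArn` (on `AssumeRole` events) follow the CloudTrail format; the helper names and the "more than one IP" heuristic are assumptions.

```python
from collections import defaultdict


def summarize_key_usage(records):
    """Per access key: event count, distinct source IPs, and assumed role ARNs."""
    summary = defaultdict(lambda: {"events": 0, "ips": set(), "assumed_roles": set()})
    for rec in records:
        key_id = rec.get("userIdentity", {}).get("accessKeyId")
        if not key_id:
            continue  # e.g. activity that was not made with an access key
        entry = summary[key_id]
        entry["events"] += 1
        if rec.get("sourceIPAddress"):
            entry["ips"].add(rec["sourceIPAddress"])
        if rec.get("eventName") == "AssumeRole":
            role = (rec.get("requestParameters") or {}).get("roleArn")
            if role:
                entry["assumed_roles"].add(role)
    return dict(summary)


def keys_seen_from_multiple_ips(summary):
    return sorted(k for k, v in summary.items() if len(v["ips"]) > 1)


# Illustrative records; the key IDs are AWS documentation placeholders.
key_sample = [
    {"userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"},
     "sourceIPAddress": "198.51.100.10", "eventName": "ListBuckets"},
    {"userIdentity": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE"},
     "sourceIPAddress": "203.0.113.77", "eventName": "AssumeRole",
     "requestParameters": {"roleArn": "arn:aws:iam::111122223333:role/Admin"}},
    {"userIdentity": {"accessKeyId": "AKIAI44QH8DHBEXAMPLE"},
     "sourceIPAddress": "198.51.100.10", "eventName": "DescribeInstances"},
    {"userIdentity": {}, "sourceIPAddress": "198.51.100.10", "eventName": "ConsoleLogin"},
]

usage = summarize_key_usage(key_sample)
print(keys_seen_from_multiple_ips(usage))
```

A key used from several IPs is not necessarily stolen, for instance behind NAT or CI runners, so treat the result as a shortlist for review.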
5. Cross-Region & Cross-Account Attacks
Attackers may hide activity in:
- Non-default regions
- Separate AWS accounts
- Additional subscriptions/projects
Investigators must check all regions and all accounts.
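A quick way to check all regions and accounts is to count activity by location and compare it with where the organization normally operates. The sketch below assumes CloudTrail-shaped records aggregated from every account and uses the `recipientAccountId` and `awsRegion` fields; the expected-location set and helper names are assumptions for illustration.

```python
from collections import Counter


def activity_by_account_region(records):
    """Count events per (account ID, region) pair."""
    return Counter(
        (r.get("recipientAccountId", "unknown"), r.get("awsRegion", "unknown"))
        for r in records
    )


def unexpected_locations(counts, expected):
    """Keep only the (account, region) pairs that are not normally used."""
    return {loc: n for loc, n in counts.items() if loc not in expected}


# Illustrative records only.
region_sample = [
    {"recipientAccountId": "111122223333", "awsRegion": "us-east-1", "eventName": "RunInstances"},
    {"recipientAccountId": "111122223333", "awsRegion": "us-east-1", "eventName": "DescribeInstances"},
    {"recipientAccountId": "111122223333", "awsRegion": "ap-southeast-2", "eventName": "RunInstances"},
]

counts = activity_by_account_region(region_sample)
print(unexpected_locations(counts, expected={("111122223333", "us-east-1")}))
```

The same grouping works for Azure subscriptions or GCP projects once their logs are normalized to an account-and-location pair.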
Common Cloud Attack Patterns (for IR)
- S3 bucket enumeration → mass downloads
- Privilege escalation via IAM misconfiguration
- Deploying crypto-mining instances
- Creating persistence using IAM users or functions
- Exfiltration using CloudFront or signed URLs
- Deleting or modifying CloudTrail logs
- Access key theft via public GitHub repos
These patterns guide response strategy.
Best Practices for Cloud Incident Response
- Enable logging everywhere (CloudTrail, Flow Logs, Storage Logs)
- Use MFA for all high-privilege accounts
- Rotate and disable idle access keys
- Implement least privilege IAM
- Create separate production & investigation accounts
- Use SIEM integrations (Chronicle, Sentinel, Splunk)
- Pre-build IR playbooks specific to cloud environments
- Monitor for unusual API actions
- Use guardrails: SCPs, Azure Policies, GCP Organization Policies
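As an example of a guardrail, the sketch below builds an AWS service control policy (SCP) document that denies the CloudTrail actions attackers use to stop or tamper with logging. The action names are real IAM actions; the function name and `Sid` are hypothetical, and a production policy would likely add a condition exempting a designated administrative role.

```python
import json


def cloudtrail_protection_scp():
    """Build an SCP document that blocks common CloudTrail tampering calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ProtectCloudTrail",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
                "cloudtrail:PutEventSelectors",
            ],
            "Resource": "*",
        }],
    }


print(json.dumps(cloudtrail_protection_scp(), indent=2))
```

Azure Policy and GCP Organization Policy use different document formats, but the intent is the same: make the evidence sources hard to switch off even for a compromised identity.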
Intel Dump
- Cloud incident response relies on API logs, identity logs, storage logs, and network flow logs, because there is no physical evidence.
- Key activities include detection, evidence collection, containment, eradication, recovery, and post-mortem analysis.
- Cloud attacks commonly target IAM for privilege escalation and storage for data theft.
- Responders must quickly snapshot VMs, isolate resources, revoke access keys, restrict security groups, and review cross-region activity.
- Effective IR requires proper logging, MFA enforcement, least-privilege IAM, and continual monitoring with cloud-native security tools.