Root Cause Analysis

Root Cause Analysis (RCA) is the process of identifying how an incident started, what enabled it, and what allowed the attacker or malware to succeed.
RCA determines the initial point of failure, not just the symptoms.
A SOC analyst must find the exact weakness—misconfiguration, user action, vulnerability, or missing detection—that allowed the incident to occur.

This chapter covers RCA at the depth expected in a SOC, focusing on log evidence, investigative workflow, attacker timelines, and real SOC case studies.


What Root Cause Analysis Really Means

RCA answers the most important questions in incident response:

  • How did the attacker get in?

  • What allowed the attack to succeed?

  • What vulnerability or mistake was exploited?

  • Could this have been prevented?

  • What detection or control failed?

  • Is the root cause fixed so it won’t happen again?

RCA goes beyond identifying malware or suspicious behavior.
It digs into the origin, not the symptoms.


When RCA Happens in the Incident Lifecycle

RCA occurs after containment and eradication, during the deeper investigation led by L2/L3 analysts.

It relies on:

  • Host logs

  • Network logs

  • EDR telemetry

  • Forensics

  • User activity history

  • Configuration audits

The goal is a complete, evidence-backed account of how the incident began.


Core Components of Root Cause Analysis

1. Trigger Event Identification

Identify the exact moment where malicious activity started.

Example:

User clicked phishing URL at 10:32 UTC → dropper.ps1 downloaded

This event starts the entire attack timeline.


2. Initial Access Vector

Determine how the attacker first entered.

Common vectors:

  • Phishing

  • Vulnerable public-facing apps

  • Weak passwords

  • Exposed RDP

  • Stolen credentials

  • Misconfigured cloud resources

Example:

Brute force → successful SSH login
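A minimal Python sketch of how the "brute force → successful SSH login" pattern could be confirmed from sshd log lines. It assumes OpenSSH-style "Failed password" and "Accepted password" messages. The threshold of 5, the function name, and the sample lines are illustrative assumptions, not data from a real incident.

```python
import re
from collections import defaultdict

# Patterns for OpenSSH password-auth messages (other auth methods are ignored here).
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
ACCEPTED = re.compile(r"Accepted password for (\S+) from (\S+)")


def brute_force_successes(lines, threshold=5):
    """Return (ip, user) pairs where a login succeeded after >= threshold failures from that IP."""
    failures = defaultdict(int)
    hits = []
    for line in lines:
        m = FAILED.search(line)
        if m:
            failures[m.group(2)] += 1
            continue
        m = ACCEPTED.search(line)
        if m and failures[m.group(2)] >= threshold:
            hits.append((m.group(2), m.group(1)))
    return hits


# Hypothetical log lines for illustration only.
sample = (
    ["Jan 10 03:14:07 web01 sshd[1001]: Failed password for root from 203.0.113.7 port 40022 ssh2"] * 6
    + [
        "Jan 10 03:14:30 web01 sshd[1001]: Accepted password for root from 203.0.113.7 port 40022 ssh2",
        "Jan 10 09:00:00 web01 sshd[1002]: Accepted password for alice from 198.51.100.4 port 51000 ssh2",
    ]
)
print(brute_force_successes(sample))
```

Counting per source IP keeps the sketch short. A distributed attack spreads failures across many addresses, so a real check would probably also count per target account.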

3. Execution Path

Identify what code executed and how.

Examples:

  • Encoded PowerShell

  • MS Office macro execution

  • Linux shell script in /tmp

  • DLL side-loading

Example:

WINWORD.exe → powershell.exe → payload.exe

4. Privilege Escalation Cause

Identify how the attacker obtained elevated rights.

Examples:

  • Sudo misconfiguration

  • Token theft

  • Kerberoasting

  • Exploits

Example:

sudoers file allowed user to execute bash without password
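As a rough illustration, passwordless sudo rules like this one can be surfaced with a short Python scan of sudoers content. The sketch assumes the file text has already been collected. The function name and the sample content are hypothetical, and the scan does not follow include directives or expand aliases.

```python
def nopasswd_rules(sudoers_text):
    """Return non-comment sudoers lines that contain a NOPASSWD tag."""
    rules = []
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if "NOPASSWD:" in line:
            rules.append(line)
    return rules


# Hypothetical sudoers content for illustration only.
sample = """# hypothetical sudoers content
root ALL=(ALL:ALL) ALL
webapp ALL=(ALL) NOPASSWD: /bin/bash
"""
print(nopasswd_rules(sample))
```

On a live host, running sudo -l as the affected user shows the rules sudo actually applies, which is more reliable than text matching.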

5. Persistence Mechanism

Determine how the attacker maintained access.

Examples:

  • Registry Run keys

  • Cron job

  • Scheduled task

  • Malicious service

Example:

schtasks /create /tn "Updater" /tr C:\Users\Public\bd.exe /sc onlogon

6. Detection Failure Analysis

Find out why detection didn’t fire earlier.

Reasons:

  • Logging disabled

  • Rule too broad or too narrow

  • Telemetry missing

  • Threat was unknown

  • Tool malfunction

Example:

Sysmon not installed and process-creation auditing (Event ID 4688) not enabled → no process creation logs
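One way to make this check repeatable is to compare the log sources each host actually sends against a required baseline. The Python sketch below does that under stated assumptions: the source names, host names, and baseline set are all hypothetical and would come from your own SIEM inventory.

```python
# Hypothetical baseline of telemetry every endpoint is expected to send.
REQUIRED_SOURCES = {"process_creation", "powershell_logging", "edr"}


def telemetry_gaps(host_sources):
    """Map each host to the sorted list of required sources it is not sending."""
    gaps = {}
    for host, sources in host_sources.items():
        missing = REQUIRED_SOURCES - set(sources)
        if missing:
            gaps[host] = sorted(missing)
    return gaps


# Hypothetical inventory: which sources were seen per host during the incident window.
observed = {
    "WS-042": ["edr"],
    "SRV-01": ["process_creation", "powershell_logging", "edr"],
}
print(telemetry_gaps(observed))
```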

7. Control Weakness Identification

Identify which security control failed.

Examples:

  • Missing patches

  • Weak firewall rules

  • No MFA

  • Unrestricted outbound connections

  • Unmonitored DNS traffic

Example:

Public-facing Tomcat server unpatched for 8 months

8. Final Root Cause Statement

Summarize the true cause in a single sentence.

Example:

Root Cause: User executed a malicious macro from a phishing email, which downloaded a payload due to lack of attachment filtering and insufficient PowerShell restrictions.

Practical RCA Workflow (SOC-Level)

Below is a typical workflow that L2/L3 analysts follow.


Step 1 — Validate Timeline Start

Identify earliest suspicious action:

10:32 – User clicked phishing link
10:33 – payload.ps1 downloaded
10:34 – C2 communication established

The first suspicious event becomes the starting point.
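A small Python sketch of this step, assuming events from different log sources have already been normalized into one list with ISO 8601 timestamps and a "suspicious" flag. The field names, the date, and the benign login event are assumptions added for illustration. The three suspicious entries mirror the timeline above.

```python
from datetime import datetime

# Hypothetical normalized events, deliberately out of order.
events = [
    {"time": "2024-01-01T10:34:00+00:00", "action": "C2 communication established", "suspicious": True},
    {"time": "2024-01-01T10:20:00+00:00", "action": "User logged in", "suspicious": False},
    {"time": "2024-01-01T10:32:00+00:00", "action": "User clicked phishing link", "suspicious": True},
    {"time": "2024-01-01T10:33:00+00:00", "action": "payload.ps1 downloaded", "suspicious": True},
]


def timeline_start(events):
    """Return the earliest event flagged suspicious, or None if there is none."""
    suspicious = [e for e in events if e["suspicious"]]
    return min(suspicious, key=lambda e: datetime.fromisoformat(e["time"]), default=None)


print(timeline_start(events)["action"])
```

Parsing the timestamps before comparing matters once sources log in different time zones, because comparing the raw strings would then give the wrong order.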


Step 2 — Identify Attack Vector

Using logs:

  • Proxy logs → malicious URL

  • Email logs → phishing email

  • Firewall logs → inbound traffic

  • Authentication logs → brute force success

Example:

Email attachment triggered macro → malicious script executed

Step 3 — Reconstruct Execution Chain

Using Sysmon and Linux logs:

WINWORD.exe → powershell.exe → curl → payload.exe

OR

/tmp/bd.sh executed → created miner binary

The reconstructed path shows how the malware ran.
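A chain like this can be rebuilt by following parent process IDs backwards from the final payload. The Python sketch below assumes process-creation events have been reduced to "pid", "ppid", and "image" fields. Those field names and the PID values are hypothetical.

```python
def process_chain(events, leaf_pid):
    """Walk parent links from leaf_pid back to the oldest known ancestor."""
    by_pid = {e["pid"]: e for e in events}
    chain, seen = [], set()
    pid = leaf_pid
    # Stop when the parent is unknown or a loop is detected.
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        chain.append(by_pid[pid]["image"])
        pid = by_pid[pid]["ppid"]
    return " → ".join(reversed(chain))


# Hypothetical process-creation events for illustration only.
events = [
    {"pid": 100, "ppid": 1, "image": "WINWORD.exe"},
    {"pid": 200, "ppid": 100, "image": "powershell.exe"},
    {"pid": 300, "ppid": 200, "image": "payload.exe"},
]
print(process_chain(events, 300))
```

PIDs are reused over time, so on real Sysmon data it is safer to key on ProcessGuid and ParentProcessGuid than on raw PIDs.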


Step 4 — Determine Privilege Escalation

Check for:

  • sudo

  • exploitation

  • credential dumping

  • AD misconfigurations

Example:

4672 — Special privileges assigned to compromised user

Step 5 — Identify Lateral Movement

Firewall + Windows auth logs:

4624 LogonType 3 from infected host

Network logs:

SMB connection to file server
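A sketch of the authentication side of this step in Python: list the hosts that received a network logon from the infected machine. It assumes Windows Security events have been normalized into dictionaries. The field names, IP addresses, and host names are hypothetical.

```python
def lateral_logons(events, infected_ip):
    """Return hosts that received a successful network logon (4624, LogonType 3) from infected_ip."""
    targets = []
    for e in events:
        if e["event_id"] == 4624 and e["logon_type"] == 3 and e["source_ip"] == infected_ip:
            if e["host"] not in targets:
                targets.append(e["host"])
    return targets


# Hypothetical normalized events for illustration only.
events = [
    {"event_id": 4624, "logon_type": 3, "source_ip": "10.0.0.15", "host": "FILESRV01"},
    {"event_id": 4624, "logon_type": 2, "source_ip": "10.0.0.15", "host": "WS-015"},
    {"event_id": 4624, "logon_type": 3, "source_ip": "10.0.0.99", "host": "DC01"},
    {"event_id": 4625, "logon_type": 3, "source_ip": "10.0.0.15", "host": "DC01"},
    {"event_id": 4624, "logon_type": 3, "source_ip": "10.0.0.15", "host": "FILESRV01"},
]
print(lateral_logons(events, "10.0.0.15"))
```

Type 3 logons are also produced by ordinary file-share access, so the result is a list of leads to compare against the host's normal behavior, not proof of lateral movement.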

Step 6 — Persistence Review

Check:

  • Registry Run keys

  • Cron jobs

  • Services

  • Scheduled tasks

Example:

HKCU\Software\Microsoft\Windows\CurrentVersion\Run → updater.exe
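Once autorun entries have been collected from these locations, a simple heuristic is to flag targets that live in user-writable directories. The Python sketch below assumes the entries are already exported into a list. The directory list, field names, and the OneDrive entry are illustrative assumptions, and the heuristic will miss payloads placed elsewhere.

```python
# Commonly abused, user-writable locations (an illustrative, incomplete list).
SUSPICIOUS_DIRS = (r"c:\users\public", r"c:\windows\temp", "/tmp", "/dev/shm")


def suspicious_autoruns(entries):
    """Return entries whose target path starts with one of SUSPICIOUS_DIRS."""
    flagged = []
    for entry in entries:
        # Normalize quoting and case before comparing path prefixes.
        target = entry["target"].strip('"').lower()
        if target.startswith(SUSPICIOUS_DIRS):
            flagged.append(entry)
    return flagged


# Hypothetical collected persistence entries for illustration only.
entries = [
    {"mechanism": "scheduled task", "name": "Updater", "target": r"C:\Users\Public\bd.exe"},
    {"mechanism": "run key", "name": "OneDrive", "target": r"C:\Program Files\Microsoft OneDrive\OneDrive.exe"},
    {"mechanism": "cron", "name": "root crontab", "target": "/tmp/bd.sh"},
]
for e in suspicious_autoruns(entries):
    print(e["mechanism"], e["target"])
```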

Step 7 — Identify Control Failures

Examples:

  • No EDR on machine

  • SIEM rule too weak

  • Lack of network segmentation

  • Unrestricted outbound traffic

  • No MFA on admin accounts


Step 8 — Deliver Root Cause Statement

Final deliverable includes:

  • Trigger event

  • Attack vector

  • Failed control

  • Weakness exploited

  • What allowed escalation

  • How to prevent recurrence
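The six elements above can be captured in a small structure that refuses to produce a report while any element is blank. This is only a sketch in Python: the class and field names are hypothetical, and the sample values are loosely borrowed from a phishing scenario for illustration.

```python
from dataclasses import dataclass


@dataclass
class RootCauseReport:
    """The six elements the final RCA deliverable should contain."""
    trigger_event: str
    attack_vector: str
    failed_control: str
    weakness_exploited: str
    escalation_enabler: str
    prevention: str

    def missing_fields(self):
        """Return the names of fields that are empty or whitespace-only."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self):
        """Render one labeled line per element; refuse to render an incomplete report."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("incomplete RCA report, missing: " + ", ".join(missing))
        return "\n".join(
            name.replace("_", " ").capitalize() + ": " + value
            for name, value in vars(self).items()
        )


report = RootCauseReport(
    trigger_event="User opened malicious Word doc",
    attack_vector="Phishing email",
    failed_control="No EDR installed; PowerShell logging disabled",
    weakness_exploited="Inadequate email filtering",
    escalation_enabler="Insufficient PowerShell restrictions",
    prevention="",  # left blank on purpose to show the completeness check
)
print(report.missing_fields())  # ['prevention']
```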


Real SOC RCA Examples

Example 1 — Malware Infection from Phishing

Findings:

  • User opened malicious Word doc

  • Macro executed PowerShell

  • Downloaded payload

  • C2 communication established

  • No EDR installed

  • PowerShell logging disabled

Root Cause:

Phishing email led to macro execution due to inadequate email filtering and insufficient PowerShell restrictions.

Example 2 — SSH Brute Force → Server Compromise

Findings:

  • Public SSH exposed

  • Password-based auth enabled

  • Weak password

  • Attacker brute-forced credentials

  • Installed crypto miner

Root Cause:

Weak SSH password and lack of brute force protection allowed unauthorized access.

Example 3 — Lateral Movement in Windows Domain

Findings:

  • User credentials stolen via LSASS dumping

  • Credential Guard not enabled

  • Attacker used valid credentials

  • Moved through SMB and WinRM

Root Cause:

LSASS memory exposure due to lack of endpoint hardening enabled credential theft and lateral movement.

Example 4 — Cloud Misconfiguration

Findings:

  • S3 bucket misconfigured as public

  • Data exposed externally

  • No IAM policy restrictions

  • No monitoring

Root Cause:

Public S3 bucket misconfiguration caused unauthorized external access.
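For this kind of finding, the bucket policy itself is usually the key evidence. The Python sketch below flags Allow statements with a wildcard principal in a policy that has already been retrieved and parsed from JSON. The bucket name, account ID, and function name are hypothetical. It ignores Condition blocks and ACL-based exposure, so it is a first-pass check only.

```python
def public_statements(policy):
    """Return Allow statements whose Principal is the wildcard '*'."""
    statements = policy.get("Statement", [])
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for s in statements:
        principal = s.get("Principal")
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if s.get("Effect") == "Allow" and is_wildcard:
            flagged.append(s)
    return flagged


# Hypothetical bucket policy for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}
for s in public_statements(policy):
    print(s["Action"], s["Resource"])
```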

Analyst Workflow for RCA

  1. Collect all logs (endpoint + network + cloud)

  2. Identify earliest malicious event

  3. Determine entry point

  4. Reconstruct execution chain

  5. Identify privilege escalation

  6. Identify lateral movement

  7. Identify persistence mechanisms

  8. Determine detection failures

  9. Identify configuration or policy gaps

  10. Finalize root cause statement

A thorough RCA reduces the chance of repeat incidents.


Intel Dump

  • RCA identifies the true origin of an attack, not just symptoms.

  • It requires reconstructing the full timeline from the earliest malicious event.

  • RCA includes initial access, execution, escalation, persistence, lateral movement, and detection failure analysis.

  • Common root causes include phishing, weak passwords, unpatched systems, misconfigurations, missing logging, and lack of segmentation.

  • RCA ends with a clear statement: what caused the incident and how to prevent it in the future.
