CodeQL

CodeQL performs deep static analysis by converting source code into a queryable database and running security queries against it. It detects vulnerabilities by analyzing data flow, control flow, and semantic relationships inside the application. Unlike simple pattern-based tools, CodeQL models how code behaves, which lets it find complex, multi-step vulnerabilities that pattern-matching SAST tools often miss.

Why CodeQL Matters

CodeQL identifies vulnerabilities by analyzing how user input moves through functions, classes, modules, and data structures. It models your application as a database and lets you write queries that search for unsafe behavior, which gives far deeper insight than pattern scanning. CodeQL is the engine behind GitHub code scanning. Its standard queries catch injection flaws, unsafe deserialization, and dangerous API usage, and custom queries can extend coverage to logic issues and authorization gaps specific to your application.

How CodeQL Works

Code Extraction

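The CodeQL CLI extracts your source code into a CodeQL database. The extractor parses each file and records its syntax tree, names, types, and other semantic information as relational data; the QL libraries then derive control flow and data flow from that data when queries run.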

Query Execution

Queries written in QL search the database for code that matches certain patterns or data flows.

Result Matching

Results give the file path and line of each match, and path queries also include the data flow trace from source to sink.

Remediation

Developers inspect and fix the vulnerabilities.

This makes CodeQL a core component of shift-left security, automated scanning, and in-depth code auditing.
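To make the "code as a database" idea concrete, here is a toy analogue in Python. It is not how CodeQL works internally: it only records simple-name calls found by Python's ast module into an in-memory SQLite table and then queries that table, mirroring the extract-then-query steps above.

```python
import ast
import sqlite3

# Toy illustration only: CodeQL's real extractor and QL libraries are far richer.
SOURCE = '''
def handler(data):
    result = eval(data)
    print(result)
    return len(result)
'''

def extract_calls(source: str) -> sqlite3.Connection:
    """'Extraction': parse the source and store (callee, line) rows for simple-name calls."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (callee TEXT, line INTEGER)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            db.execute("INSERT INTO calls VALUES (?, ?)", (node.func.id, node.lineno))
    return db

db = extract_calls(SOURCE)
# 'Query execution': search the stored facts instead of the raw text.
rows = db.execute("SELECT callee, line FROM calls WHERE callee = 'eval'").fetchall()
print(rows)  # → [('eval', 3)]
```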


CodeQL Architecture

Source → Database

CodeQL extracts your application into a database containing files, functions, variables, imports, types, and calls. Data flow is computed from these facts at query time.

Database → Query

Queries define patterns such as dangerous function usage, unsanitized input flows, or missing validation checks.

Query → Alerts

Each result is a potential vulnerability to triage and, if confirmed, remediate.

Integration Layers

• Local CLI
• CI/CD pipelines
• GitHub Actions
• GitHub Enterprise Server code scanning


Preparing Code for CodeQL

Clean Codebase

Ensure that dependencies are installed and build steps are correct.

Build Commands

For compiled languages such as Java, C++, and C#, CodeQL typically extracts code by observing a full build, so the build must succeed to produce a semantically complete database.

Structured Repository

A clear source root, with vendored and generated code kept in separate directories, makes it easy to extract the right files and exclude the rest.


Using the CodeQL CLI

Install the CodeQL CLI from its GitHub releases page (the CodeQL bundle also includes the standard query packs).

Create a database:

codeql database create codeql-db --language=python --source-root=.

Analyze using standard queries:

codeql database analyze codeql-db \
  codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest \
  --output=results.sarif

View the SARIF output in any SARIF viewer, or upload it to GitHub, where alerts appear in the repository's Security tab.


CodeQL Query Packs

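Standard queries ship in per-language query packs (for example codeql/python-queries and codeql/javascript-queries). Each pack includes query suites (.qls files) that select curated rule sets at three levels: code-scanning (the default), security-extended, and security-and-quality.

Examples:

python-code-scanning.qls
python-security-extended.qls
javascript-security-and-quality.qls
cpp-security-extended.qls
java-code-scanning.qls

Run a suite against a JavaScript database:

codeql database analyze js-db \
  codeql/javascript-queries:codeql-suites/javascript-security-and-quality.qls \
  --format=sarif-latest \
  --output=js-results.sarif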

What CodeQL Detects

Input flows into unsafe sinks

• SQL queries
• Command execution
• Template engines
• File system operations
• Deserialization functions

Missing validation

• No length checks
• Missing type checks
• Missing sanitization
• Unsafe regex patterns

Cryptographic misuse

• Weak hashing
• Weak encryption
• Hardcoded keys
• Predictable randomness

Dangerous APIs

• eval
• os.system
• exec
• subprocess shell execution
• insecure library methods

Logic flaws (usually require custom queries)

• Missing authorization check
• Broken access control
• Incorrect privilege validation
• Insecure branching logic

Data flow bugs

• Tainted input passing through functions
• Unescaped output
• Uncontrolled propagation of user input


Querying With CodeQL (QL Language)

QL is a declarative, object-oriented logic language whose queries use an SQL-like from/where/select structure. Instead of rows in a table, it queries the structure of the code.

Basic Query Example: Find all function calls

import python

from Call c
select c, "Function call found"

Find dangerous eval usage

import python

from Call c
where c.getFunc().(Name).getId() = "eval"
select c, "Avoid using eval"
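Where eval was only used to parse literal data, a common fix is ast.literal_eval. A minimal sketch, assuming the input is expected to be a Python literal; parse_config_value is a hypothetical name:

```python
import ast

def parse_config_value(text: str):
    """Parse a literal such as "[1, 2]" or "{'a': 1}" without executing code.

    ast.literal_eval only accepts Python literals (strings, numbers, tuples,
    lists, dicts, sets, booleans, None), so input like "__import__('os')"
    raises ValueError instead of running.
    """
    return ast.literal_eval(text)

print(parse_config_value("[1, 2, 3]"))  # → [1, 2, 3]
try:
    parse_config_value("__import__('os').system('id')")
except ValueError as exc:
    print("rejected:", exc)
```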

Find user input reaching SQL

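import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts

module UserInputToSqlConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node node) { node instanceof RemoteFlowSource }
  predicate isSink(DataFlow::Node node) { node = any(SqlExecution e).getSql() }
}

module UserInputToSqlFlow = TaintTracking::Global<UserInputToSqlConfig>;

from DataFlow::Node source, DataFlow::Node sink
where UserInputToSqlFlow::flow(source, sink)
select sink, "User input from $@ reaches SQL execution", source, "here"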

The global taint-tracking library follows the flow across function calls and module boundaries, so the source and sink can sit in different functions.
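The kind of flow the SQL query reports can be reproduced with the standard library's sqlite3 module. In this sketch the two lookup functions and the users table are hypothetical; in a real application name would come from a web request (a RemoteFlowSource), which is what lets CodeQL treat it as tainted.

```python
import sqlite3

def find_user_unsafe(db, name):
    # Vulnerable: user input is concatenated into the SQL text (tainted source -> SQL sink).
    return db.execute("SELECT id FROM users WHERE name = '" + name + "'").fetchall()

def find_user_safe(db, name):
    # Fixed: the value is passed as a bound parameter, never as SQL text.
    return db.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(find_user_unsafe(db, payload))  # returns every row
print(find_user_safe(db, payload))    # returns no rows
```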


Developing Custom CodeQL Queries

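Create a custom .ql file inside a query pack whose qlpack.yml depends on codeql/javascript-all. The metadata comment lets codeql database analyze turn results into SARIF alerts:

/**
 * @name Use of dangerousRun
 * @kind problem
 * @problem.severity warning
 * @id custom/js/dangerous-run
 */
import javascript

class DangerousRunCall extends CallExpr {
  DangerousRunCall() { this.getCalleeName() = "dangerousRun" }
}

from DangerousRunCall c
select c, "Custom rule: avoid dangerousRun()"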

Use custom query in CI

codeql database analyze db custom-rules.ql --format=sarif-latest --output=custom.sarif

Best Practices for CodeQL

• Use CodeQL on every pull request
• Use the security-extended query suites
• Write custom queries for internal business logic
• Treat all critical issues as merge blockers
• Store SARIF results for auditing
• Re-run scans on dependency upgrades
• Combine CodeQL with other SAST tools
• Use both taint tracking and control flow rules


Performance Optimization

• Use incremental or diff-aware PR analysis where your CI setup supports it
• Use lightweight configs for PR scanning
• Run full scans nightly
• Exclude vendor directories
• Cache databases in CI


Integration Into CI/CD

GitHub Actions

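- uses: actions/checkout@v4
- uses: github/codeql-action/init@v3
  with:
    languages: python

- uses: github/codeql-action/analyze@v3

The job needs the security-events: write permission so that results can be uploaded.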

GitLab, Jenkins, Azure DevOps

Use CLI jobs with:

codeql database create
codeql database analyze

Full-Length Practical Section

The following practicals give hands-on practice with CodeQL usage, query authoring, scanning, and automation.


Practical 1: Install CodeQL CLI and Create a Database

Download the CodeQL bundle.
Create DB:

codeql database create mydb --language=python --source-root=./src

Inspect DB structure (the directory holds codeql-database.yml metadata, the extracted relational data, and a src.zip copy of the source):

ls mydb

Practical 2: Run Official Security Query Packs

Scan with the security-extended suite:

codeql database analyze mydb \
  codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest \
  --output=scan.sarif

Examine results for:

• SQL injection
• Path traversal
• Weak hashing
• Unsafe imports


Practical 3: Find Hardcoded Secrets

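Write a custom query that flags string literals assigned to variables whose names suggest secrets:

import python

from Assign a, Name target
where target = a.getATarget() and
      target.getId().regexpMatch("(?i).*(password|secret|token|api_?key).*") and
      a.getValue() instanceof StringLiteral
select a, "Hardcoded string assigned to " + target.getId() + " may be a secret"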

Run on your codebase.
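A typical fix for a flagged assignment is to load the secret at runtime instead. A minimal sketch, assuming the secret is supplied through an environment variable; APP_API_TOKEN and get_api_token are hypothetical names:

```python
import os

# Before (flagged by the query): API_TOKEN = "hardcoded-value"
# After: read the secret from the environment at runtime.
def get_api_token(var: str = "APP_API_TOKEN") -> str:
    token = os.environ.get(var)
    if not token:
        # Fail loudly rather than falling back to a baked-in default.
        raise RuntimeError(f"{var} is not set")
    return token
```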


Practical 4: Detect Dangerous Function Usage

Query:

import python
import semmle.python.ApiGraphs

from API::CallNode call
where call = API::moduleImport("os").getMember("system").getACall()
select call, "Avoid os.system"

Fix flagged cases.
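One way to fix flagged os.system calls is subprocess.run with an argument list and no shell. A minimal sketch; run_tool is a hypothetical wrapper:

```python
import subprocess
import sys

def run_tool(args: list[str]) -> str:
    """Run a command without a shell.

    os.system(cmd) hands one string to the shell, so ';' or '$(...)' in user
    input can start extra commands. subprocess.run with a list passes each
    argument to the program as-is.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

# The argument containing ';' is printed literally instead of starting a second command.
print(run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", "hello; echo injected"]))
```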


Practical 5: Track Tainted Input Into Dangerous Sinks

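Use a built-in taint-tracking configuration. The module below is the one behind the standard command-injection query; SqlInjectionQuery, PathInjectionQuery, and similar modules follow the same pattern:

import python
import semmle.python.security.dataflow.CommandInjectionQuery

from CommandInjectionFlow::PathNode source, CommandInjectionFlow::PathNode sink
where CommandInjectionFlow::flowPath(source, sink)
select sink.getNode(), "User-controlled input from $@ reaches a command execution sink", source.getNode(), "here"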

Test with sample vulnerable code.


Practical 6: Write Custom Query for Missing Authorization Checks

Identify functions whose names contain "admin" but never call the authorization helper check_permissions:

import python

from Function f
where f.getName().matches("%admin%") and
      not exists(Call c |
          c.getScope() = f and
          (c.getFunc().(Name).getId() = "check_permissions" or
           c.getFunc().(Attribute).getName() = "check_permissions")
      )
select f, "Admin function missing permission check"

Use this for internal audits.
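For reference, an admin function that the query would not flag calls the check directly in its body. check_permissions, PermissionDenied, and the role model below are hypothetical stand-ins for your own authorization layer:

```python
class PermissionDenied(Exception):
    pass

def check_permissions(user: dict, required: str) -> None:
    # Hypothetical role check: raise unless the user holds the required role.
    if required not in user.get("roles", ()):
        raise PermissionDenied(f"{user.get('name')} lacks {required}")

def delete_user_admin(user: dict, target_id: int) -> str:
    # The name contains "admin", and the body calls check_permissions,
    # so the query above does not report it.
    check_permissions(user, "admin")
    return f"deleted {target_id}"
```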


Practical 7: Integrate CodeQL Into GitHub Actions

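Workflow steps (trigger the workflow on pull_request; the job needs the security-events: write permission):

- uses: actions/checkout@v4
- uses: github/codeql-action/init@v3
  with:
    languages: javascript
- uses: github/codeql-action/analyze@v3

Open a test pull request to confirm the scan runs and results appear.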


Practical 8: Run CodeQL Locally on Pull Requests

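Run on the checked-out PR branch:

codeql database create pr-db --language=python --source-root=.
codeql database analyze pr-db codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest --output=pr-results.sarif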

Use results in code review.


Practical 9: Add CodeQL to Pre-Merge Gates

Define rule:

• Block merge if CodeQL finds any critical issue
• Warnings allowed but monitored
• SARIF reports uploaded automatically
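The gate rule can be enforced with a small script that reads the SARIF file and exits non-zero when blocking results exist. This is a sketch under stated assumptions: it treats a result as critical when its level is error or its rule's security-severity property is at least 9.0, and it looks up rules in both tool.driver and tool.extensions, since query-pack rules may be listed in either. Adjust the threshold to your own policy.

```python
import json
import sys

CRITICAL_SCORE = 9.0  # assumption: security-severity >= 9.0 counts as critical

def load_rules(run: dict) -> dict:
    """Map rule id -> rule object from the driver and any extensions (query packs)."""
    tool = run.get("tool", {})
    components = [tool.get("driver", {})] + tool.get("extensions", [])
    return {r["id"]: r for c in components for r in c.get("rules", []) if "id" in r}

def blocking_results(sarif: dict) -> list[str]:
    """Return 'ruleId at uri:line' strings for results that should block a merge."""
    blockers = []
    for run in sarif.get("runs", []):
        rules = load_rules(run)
        for res in run.get("results", []):
            rule = rules.get(res.get("ruleId"), {})
            level = res.get("level") or rule.get("defaultConfiguration", {}).get("level", "warning")
            score = float(rule.get("properties", {}).get("security-severity", 0))
            if level == "error" or score >= CRITICAL_SCORE:
                loc = (res.get("locations") or [{}])[0].get("physicalLocation", {})
                uri = loc.get("artifactLocation", {}).get("uri", "?")
                line = loc.get("region", {}).get("startLine", "?")
                blockers.append(f"{res.get('ruleId')} at {uri}:{line}")
    return blockers

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        found = blocking_results(json.load(fh))
    for item in found:
        print("BLOCKING:", item)
    sys.exit(1 if found else 0)
```

Run it after analysis with the SARIF path as its only argument and let the non-zero exit code fail the job; the script's file name is up to you.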


Practical 10: Detect Unsafe Regex Patterns

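The standard security suites already include a precise ReDoS check (py/redos). A rough custom heuristic that flags nested quantifiers such as (a+)+ looks like this:

import python
import semmle.python.ApiGraphs

from API::CallNode call, StringLiteral pattern
where call = API::moduleImport("re").getMember("compile").getACall() and
      pattern = call.getArg(0).asExpr() and
      // crude heuristic: a quantified group that is itself quantified, e.g. (a+)+
      pattern.getText().regexpMatch(".*\\([^()]*[+*]\\)[+*].*")
select call, "Potentially catastrophic backtracking in regex"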

Fix unsafe regex patterns.
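A typical fix for a flagged pattern is to remove the redundant nesting. A minimal sketch using Python's re module; the patterns are illustrative, and real cases often need a more careful rewrite:

```python
import re

# Nested quantifiers like (\w+)+ can backtrack exponentially on a near-miss input.
# Here the outer group adds nothing, so both patterns accept exactly the same
# strings (one or more word characters).
VULNERABLE = re.compile(r"^(\w+)+$")
SAFE = re.compile(r"^\w+$")

for sample in ["alice", "bob_42", "", "has space"]:
    # Short inputs are fine for both; check they agree.
    assert bool(VULNERABLE.match(sample)) == bool(SAFE.match(sample))

# On a long near-miss input the safe pattern fails fast.
# Do not run this input against VULNERABLE.
print(SAFE.match("a" * 50 + "!"))  # → None
```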


Practical 11: Detect Unsafe Deserialization

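The standard suites include a taint-based check (py/unsafe-deserialization); this simpler query flags every pickle load:

import python
import semmle.python.ApiGraphs

from API::CallNode call
where call = API::moduleImport("pickle").getMember(["load", "loads"]).getACall()
select call, "Unsafe deserialization: pickle can execute code while loading"

Replace pickle on untrusted input with a data-only format such as JSON.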
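For data that crosses a trust boundary, a data-only format avoids the problem. A minimal sketch, assuming the payload only needs plain JSON types; save_session and load_session are hypothetical helpers:

```python
import json

# json.loads only builds dicts, lists, strings, numbers, booleans and None,
# whereas pickle.loads can be made to call arbitrary callables while unpickling.
def save_session(session: dict) -> str:
    return json.dumps(session)

def load_session(raw: str) -> dict:
    data = json.loads(raw)
    if not isinstance(data, dict):
        # Reject well-formed JSON that has the wrong shape.
        raise ValueError("session must be a JSON object")
    return data

token = save_session({"user": "alice", "roles": ["viewer"]})
print(load_session(token))  # → {'user': 'alice', 'roles': ['viewer']}
```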


Practical 12: Track Sensitive Data Leaks

Find sensitive variables passed to print (the standard py/clear-text-logging-sensitive-data query also covers logging calls):

import python

class SensitiveVar extends Variable {
  SensitiveVar() { this.getId().regexpMatch("(?i).*(password|token|secret).*") }
}

from Call c, Name arg
where c.getFunc().(Name).getId() = "print" and
      arg = c.getAnArg() and
      arg.getVariable() instanceof SensitiveVar
select c, "Sensitive data printed to logs"
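Besides fixing the flagged calls, a logging filter can act as a safety net. This is a hypothetical sketch using the standard logging module; the key names it masks mirror the query's pattern and will need tuning:

```python
import logging
import re

# Hypothetical pattern: mask values that follow sensitive keys, e.g. "password=...".
SENSITIVE = re.compile(r"(?i)\b(password|token|secret)=\S+")

class RedactSecrets(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message, mask it, then drop the original args
        # so the unmasked values cannot be re-rendered.
        record.msg = SENSITIVE.sub(lambda m: m.group(1) + "=***", record.getMessage())
        record.args = None
        return True  # keep the record, just redacted

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(RedactSecrets())
logger.addHandler(handler)
logger.warning("login failed user=%s password=%s", "alice", "hunter2")
# emits: login failed user=alice password=***
```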

Practical 13: Build Internal Rules for Your Organization

Define rules for:

• No eval
• No os.system
• No insecure crypto
• All SQL must be parameterized
• Only approved HTTP clients

Store the queries in a query pack under security/codeql-rules/ in the repository.


Practical 14: Use CodeQL for Secure Code Review

Scan:

codeql database analyze db custom-rules.ql --format=sarif-latest --output=review.sarif

Review findings manually for accuracy.


Practical 15: Create a CodeQL Rule to Detect Missing Input Validation

import python

from Function f
where f.getName().regexpMatch("(create|update|process).*") and
      not exists(Call c |
          c.getScope() = f and
          c.getFunc().(Name).getId() = "validate"
      )
select f, "Function missing validation"
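A function that satisfies this rule calls a validation helper before using its input. In this sketch validate and create_user are hypothetical, and the checks (required non-empty string fields, a maximum length) are only examples:

```python
def validate(payload: dict, required: tuple[str, ...], max_len: int = 100) -> dict:
    # Reject missing, non-string, empty, or overly long required fields.
    for key in required:
        value = payload.get(key)
        if not isinstance(value, str) or not value or len(value) > max_len:
            raise ValueError(f"invalid or missing field: {key}")
    return payload

def create_user(payload: dict) -> dict:
    # The name starts with "create" and the body calls validate, so the query stays quiet.
    data = validate(payload, required=("name", "email"))
    return {"name": data["name"], "email": data["email"]}
```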

Practical 16: Detect Functions That Return Sensitive Data Without Authorization

import python

class SensitiveField extends Variable {
  SensitiveField() { this.getId().regexpMatch("(?i).*(email|token|password).*") }
}

from Return r, Name n
where n = r.getValue().getASubExpression*() and
      n.getVariable() instanceof SensitiveField and
      not exists(Call c |
          c.getScope() = r.getScope() and
          c.getFunc().(Name).getId() = "check_permissions"
      )
select r, "Sensitive data returned without authorization check"

Practical 17: Build CodeQL Dashboards

Store SARIF for each run and visualize:

• Severity trends
• Issues per module
• Time to fix
• Rule violations
• Team performance
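Each stored SARIF file can be reduced to per-rule and per-module counts that a dashboard plots over time. A minimal sketch; it treats the first segment of each result's artifact URI as the "module", which is an assumption about your repository layout:

```python
import json
from collections import Counter
from pathlib import Path

def load_sarif(path: str) -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))

def summarize(sarif: dict) -> dict:
    """Count results per rule and per first path segment for one SARIF document."""
    per_rule, per_module = Counter(), Counter()
    for run in sarif.get("runs", []):
        for res in run.get("results", []):
            per_rule[res.get("ruleId", "unknown")] += 1
            locs = res.get("locations") or [{}]
            uri = locs[0].get("physicalLocation", {}).get("artifactLocation", {}).get("uri", "")
            per_module[uri.split("/")[0] or "(root)"] += 1
    return {"per_rule": dict(per_rule), "per_module": dict(per_module)}
```

Storing one summary per run, keyed by date or commit, gives the time series that severity-trend and time-to-fix charts need.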


Practical 18: Detect LFI and Path Traversal Issues

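Rather than matching literal "../" strings, track user input into file-system paths with the library behind the standard py/path-injection query:

import python
import semmle.python.security.dataflow.PathInjectionQuery

from PathInjectionFlow::PathNode source, PathInjectionFlow::PathNode sink
where PathInjectionFlow::flowPath(source, sink)
select sink.getNode(), "Potential path traversal: path depends on $@", source.getNode(), "user-provided value"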
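A typical fix for a path-injection finding is to resolve the user-supplied path and reject anything outside an allowed base directory. A minimal sketch with pathlib (Path.is_relative_to needs Python 3.9 or later); safe_open_path is a hypothetical helper:

```python
from pathlib import Path

def safe_open_path(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir and refuse anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining an absolute user_path replaces base entirely; resolve() collapses "..".
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):
        raise PermissionError(f"path escapes {base}: {user_path}")
    return target
```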

Practical 19: Scheduled Deep Scans

Create nightly pipeline:

• Full CodeQL scan
• Extended security pack
• Custom rules pack
• Store SARIF and notify team


Practical 20: Build a Complete CodeQL Security Architecture

Include:

• Developer IDE scanning
• CLI local scans
• PR scanning
• CI/CD full scan
• Custom rule packs
• Severity-based gates
• Audit and reporting systems

Use this architecture across all projects.


Intel Dump

• CodeQL converts code into a database for deep static analysis
• It detects vulnerabilities through semantic, control flow, and data flow analysis
• QL queries catch complex taint flows, injection patterns, missing validations, unsafe APIs, and logic flaws
• CodeQL integrates into local workflows, CI/CD, GitHub Actions, PR scans, and enterprise pipelines
• Practicals include database creation, official packs, taint tracking, custom rules, CI integration, secret detection, unsafe regex checks, insecure deserialization, sensitive data leaks, validation enforcement, dashboards, and complete architecture design
