CodeQL performs deep static analysis by converting source code into a queryable database and running security queries against it. It detects vulnerabilities by analyzing data flows, control flows, and semantic relationships inside the application. Unlike purely pattern-based tools, CodeQL models how values move through the program, which lets it detect multi-step vulnerabilities that pattern-matching scanners often miss.
Why CodeQL Matters
CodeQL identifies vulnerabilities by analyzing how user input moves through functions, classes, modules, and data structures. It models your application as a database and lets you write queries to search for unsafe behavior, which gives deeper insight than pattern scanning alone. CodeQL is the analysis engine behind GitHub code scanning. Its standard queries cover injection flaws, unsafe deserialization, and dangerous API usage; logic issues and authorization gaps usually need custom queries that encode your application's own rules.
How CodeQL Works
Code Extraction
The CodeQL CLI converts your source code into a CodeQL database. This includes parsing, AST generation, semantic indexing, control flow modeling, and data flow mapping.
Query Execution
Queries written in QL search the database for code that matches certain patterns or data flows.
Result Matching
Results show vulnerable code, line locations, file paths, and data flow traces.
Remediation
Developers inspect and fix the vulnerabilities.
Used this way, CodeQL becomes a core component of shift-left security, automated scanning, and advanced code auditing.
CodeQL Architecture
Source → Database
CodeQL extracts your application into a database containing files, functions, variables, imports, data flows, types, and method calls.
Database → Query
Queries define patterns such as dangerous function usage, unsanitized input flows, or missing validation checks.
Query → Alerts
Each result is a potential vulnerability requiring remediation.
Integration Layers
• Local CLI
• CI/CD pipelines
• GitHub Actions
• GitHub Enterprise code scanning
Preparing Code for CodeQL
Clean Codebase
Ensure that dependencies are installed and build steps are correct.
Build Commands
Compiled languages such as Java, C++, or C# are normally extracted by observing a real build, so the build command must succeed for the database to be semantically complete.
Structured Repository
Proper folder layout ensures correct extraction.
Using the CodeQL CLI
Install CodeQL CLI from GitHub releases.
Initialize a database:
codeql database create codeql-db --language=python --source-root=.
Analyze using standard queries:
codeql database analyze codeql-db \
  codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest \
  --output=results.sarif
View results with SARIF viewers, GitHub Security tab, or dedicated interfaces.
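When no viewer is at hand, the SARIF file can also be read directly, since it is plain JSON. The following is a minimal sketch, assuming the usual SARIF 2.1.0 layout (`runs`, `results`, `locations`); the function name `summarize_sarif` is hypothetical and the inline sample stands in for a real results.sarif.

```python
from typing import Any


def summarize_sarif(sarif: dict[str, Any]) -> list[tuple[str, str, int, str]]:
    """Return (rule_id, file_uri, start_line, message) for each result."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Use the first reported location; some results carry none.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "<unknown>")
            line = physical.get("region", {}).get("startLine", 0)
            message = result.get("message", {}).get("text", "")
            findings.append((result.get("ruleId", "<no rule>"), uri, line, message))
    return findings


# Hand-written sample in the shape of a SARIF log (not real scan output).
sample = {
    "runs": [{
        "results": [{
            "ruleId": "py/sql-injection",
            "message": {"text": "Query depends on user input."},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "app/db.py"},
                "region": {"startLine": 12},
            }}],
        }],
    }],
}

for rule_id, uri, line, message in summarize_sarif(sample):
    print(f"{uri}:{line}: [{rule_id}] {message}")
```

To use it on a real scan, load results.sarif with `json.load` and pass the resulting dictionary to the function.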
CodeQL Query Packs
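Query packs such as codeql/python-queries bundle queries together with curated query suites (.qls files) that select which queries to run.
Example suites:
• python-security-and-quality
• python-security-extended
• javascript-security-and-quality
• cpp-security-extended
• java-code-scanning
Run a suite from a pack against a database created for the matching language (js-db is a placeholder name):
codeql database analyze js-db codeql/javascript-queries:codeql-suites/javascript-security-and-quality.qls --format=sarif-latest --output=js-results.sarif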
What CodeQL Detects
Input flows into unsafe sinks
• SQL queries
• Command execution
• Template engines
• File system operations
• Deserialization functions
Missing validation
• No length checks
• Missing type checks
• Missing sanitization
• Unsafe regex patterns
Cryptographic misuse
• Weak hashing
• Weak encryption
• Hardcoded keys
• Predictable randomness
Dangerous APIs
• eval
• os.system
• exec
• subprocess shell execution
• insecure library methods
Logic flaws
• Missing authorization check
• Broken access control
• Incorrect privilege validation
• Insecure branching logic
Data flow bugs
• Tainted input passing through functions
• Unescaped output
• Uncontrolled propagation of user input
Querying With CodeQL (QL Language)
QL is a declarative, object-oriented query language. Its from/where/select syntax resembles SQL, but queries run over code structure instead of rows in a table.
Basic Query Example: Find all function calls
import python
from Call c
select c, "Function call found"
Find dangerous eval usage
import python
from Call c
where c.getFunc().(Name).getId() = "eval"
select c, "Avoid using eval"
Find user input reaching SQL
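import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts
module SqlFlowConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }
  predicate isSink(DataFlow::Node sink) { sink = any(SqlExecution e).getSql() }
}
module SqlFlow = TaintTracking::Global<SqlFlowConfig>;
from DataFlow::Node source, DataFlow::Node sink
where SqlFlow::flow(source, sink)
select sink, "User input reaches SQL execution"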
CodeQL automatically traces tainted flow across multiple functions.
Developing Custom CodeQL Queries
Create custom .ql file
import javascript
class DangerousFunction extends Function {
DangerousFunction() { this.getName() = "dangerousRun" }
}
from DangerousFunction f
select f, "Custom rule: avoid dangerousRun()"
Use custom query in CI
codeql database analyze db custom-rules.ql --format=sarif-latest --output=custom.sarif
Best Practices for CodeQL
• Use CodeQL on every pull request
• Use extended security packs
• Write custom queries for internal business logic
• Treat all critical issues as merge blockers
• Store SARIF results for auditing
• Re-run scans on dependency upgrades
• Combine CodeQL with other SAST tools
• Use both taint tracking and control flow rules
Performance Optimization
• Build incremental databases
• Use lightweight configs for PR scanning
• Run full scans nightly
• Exclude vendor directories
• Cache databases in CI
Integration Into CI/CD
GitHub Actions
- uses: actions/checkout@v4
- uses: github/codeql-action/init@v3
  with:
    languages: python
- uses: github/codeql-action/analyze@v3
GitLab, Jenkins, Azure DevOps
Use CLI jobs with:
codeql database create
codeql database analyze
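On CI systems without a dedicated integration, a small wrapper script can keep the two CLI steps consistent across jobs. The sketch below only builds and prints the commands. The function name `codeql_commands` and the database and output names are illustrative assumptions, and the suite is given in pack:path form.

```python
import shlex


def codeql_commands(db: str, language: str, suite: str, output: str,
                    source_root: str = ".") -> list[list[str]]:
    """Build the create and analyze invocations as argument lists."""
    create = ["codeql", "database", "create", db,
              f"--language={language}", f"--source-root={source_root}"]
    analyze = ["codeql", "database", "analyze", db, suite,
               "--format=sarif-latest", f"--output={output}"]
    return [create, analyze]


# In a CI job, each list would be passed to subprocess.run(cmd, check=True)
# so that a failing step stops the pipeline. Here the commands are only printed.
for cmd in codeql_commands(
    "codeql-db",
    "python",
    "codeql/python-queries:codeql-suites/python-security-extended.qls",
    "results.sarif",
):
    print(shlex.join(cmd))
```

Building argument lists rather than one shell string avoids quoting problems when paths contain spaces.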
Full-Length Practical Section
The following practicals provide hands-on mastery of CodeQL usage, query authoring, scanning, and automation.
Practical 1: Install CodeQL CLI and Create a Database
Download CodeQL bundle.
Create DB:
codeql database create mydb --language=python --source-root=./src
Inspect the database metadata:
codeql resolve database mydb
Practical 2: Run Official Security Query Packs
Scan with extended pack:
codeql database analyze mydb \
  codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest \
  --output=scan.sarif
Examine results for:
• SQL injection
• Path traversal
• Weak hashing
• Unsafe imports
Practical 3: Find Hardcoded Secrets
Write custom query:
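import python
// On older CodeQL releases the string literal class is StrConst.
from Assign a, Name target
where a.getValue() instanceof StringLiteral and
  target = a.getATarget() and
  target.getId().regexpMatch("(?i).*(password|secret|token|api_?key).*")
select a, "Hardcoded string may be a secret"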
Run on your codebase.
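Once a hardcoded secret is flagged, one common remedy is to read the value from the environment at runtime. This is a minimal sketch under that assumption; the function name `load_api_key` and the variable name `SERVICE_API_KEY` are hypothetical.

```python
import os


def load_api_key(name: str = "SERVICE_API_KEY") -> str:
    """Read a secret from the environment instead of a string literal."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly so a missing secret is noticed at startup.
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

A secrets manager is an equally valid choice; the point is only that the literal leaves the source tree.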
Practical 4: Detect Dangerous Function Usage
Query:
import python
import semmle.python.ApiGraphs
from API::CallNode call
where call = API::moduleImport("os").getMember("system").getACall()
select call, "Avoid os.system"
Fix flagged cases.
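A typical fix replaces the shell string with an argument list passed to `subprocess.run`, so no shell ever parses the input. The sketch below assumes nothing beyond the standard library. The helper `echo_argument` is hypothetical, and it launches the current Python interpreter as the child process only so that the example runs on any platform.

```python
import subprocess
import sys


def echo_argument(text: str) -> str:
    """Run a child process with an argument list instead of a shell string."""
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", text],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


# Shell metacharacters arrive as literal text because no shell interprets them.
print(echo_argument("report.txt; echo injected"))
```

With `os.system`, the same input would have been handed to a shell, which would treat the text after the semicolon as a second command.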
Practical 5: Track Tainted Input Into Dangerous Sinks
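Use the built-in taint tracking libraries. This version reuses the SQL injection configuration that ships with the standard Python pack:
import python
import semmle.python.security.dataflow.SqlInjectionQuery
from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), "User-controlled input reaches a dangerous sink"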
Test with sample vulnerable code.
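A small target helps confirm that a taint query fires. The sketch below is a stand-in under stated assumptions: `find_user_unsafe`, `find_user_safe`, and the `users` table are hypothetical, and the `name` parameter plays the role of user input. For taint tracking to report it in a real test, the value would need to come from a source that CodeQL models, such as a web framework request object.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # VULNERABLE: the input is concatenated into the SQL text.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized query: the driver treats the input as data only.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


conn = make_demo_db()
payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # the injected condition matches every row
print(len(find_user_safe(conn, payload)))    # the payload is compared as a plain string
```

Keeping the vulnerable and the fixed variant side by side makes it easy to check that a query flags the first and stays silent on the second.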
Practical 6: Write Custom Query for Missing Authorization Checks
Identify functions whose name contains "admin" and that never call an authorization check directly:
import python
from Function f
where f.getName().matches("%admin%") and
  not exists(Call c |
    c.getScope() = f and
    c.getFunc().(Name).getId() = "check_permissions"
  )
select f, "Admin function missing permission check"
Use this for internal audits.
Practical 7: Integrate CodeQL Into GitHub Actions
Workflow:
- uses: actions/checkout@v4
- uses: github/codeql-action/init@v3
  with:
    languages: javascript
- uses: github/codeql-action/analyze@v3
Test PR scans.
Practical 8: Run CodeQL Locally on Pull Requests
Run:
codeql database create pr-db --language=python --source-root=.
codeql database analyze pr-db codeql/python-queries:codeql-suites/python-security-extended.qls --format=sarif-latest --output=pr-results.sarif
Use results in code review.
Practical 9: Add CodeQL to Pre-Merge Gates
Define rule:
• Block merge if CodeQL finds any critical issue
• Warnings allowed but monitored
• SARIF reports uploaded automatically
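The gate rule above can be sketched as a script that reads the SARIF file and reports which findings should block the merge. This assumes that each rule carries a `security-severity` property, as CodeQL security queries usually do, and that 9.0 and above counts as critical. The function name `blocking_results` and the threshold are illustrative choices, not fixed conventions.

```python
from typing import Any


def blocking_results(sarif: dict[str, Any], threshold: float = 9.0) -> list[str]:
    """Return rule ids of results whose security-severity meets the threshold."""
    blocked = []
    for run in sarif.get("runs", []):
        tool = run.get("tool", {})
        # Rules can be listed under the driver or under tool extensions.
        components = [tool.get("driver", {})] + tool.get("extensions", [])
        severity = {}
        for component in components:
            for rule in component.get("rules", []):
                score = rule.get("properties", {}).get("security-severity")
                if score is not None:
                    severity[rule.get("id")] = float(score)
        for result in run.get("results", []):
            # Results without a known score are treated as non-blocking.
            if severity.get(result.get("ruleId"), 0.0) >= threshold:
                blocked.append(result["ruleId"])
    return blocked
```

In a pipeline, the job would load the SARIF file with `json.load` and exit with a non-zero status when the returned list is non-empty, which fails the merge check while still allowing lower-severity warnings through.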
Practical 10: Detect Unsafe Regex Patterns
Query:
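import python
import semmle.python.ApiGraphs
// Heuristic: flags a quantified group that is itself quantified, e.g. (a+)+
from API::CallNode call, StringLiteral pattern
where call = API::moduleImport("re").getMember("compile").getACall() and
  pattern = call.getArg(0).asExpr() and
  pattern.getText().regexpMatch(".*\\([^)]*[+*]\\)[+*].*")
select call, "Potentially catastrophic backtracking in regex"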
Fix unsafe regex patterns.
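Fixing such a pattern usually means removing the nested quantifier while accepting the same inputs. The two patterns below are an illustrative pair, not taken from any real codebase, and the rewrite is intended to accept the same strings as the original.

```python
import re

# Nested quantifier: a run of word characters can be split in many ways, so a
# long non-matching input can trigger heavy backtracking.
UNSAFE = re.compile(r"^(\w+\s?)+$")

# Rewritten without the nested quantifier: each character has a single role.
SAFE = re.compile(r"^\w+(\s\w+)*\s?$")

for text in ["hello world", "hello  world", "trailing ", ""]:
    # Inputs are kept short so the unsafe pattern also finishes quickly.
    print(repr(text), bool(UNSAFE.match(text)), bool(SAFE.match(text)))
```

Comparing both patterns on a set of known inputs, as the loop does, is a cheap regression check before swapping the pattern in production code.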
Practical 11: Detect Unsafe Deserialization
import python
import semmle.python.ApiGraphs
from API::CallNode call
where call = API::moduleImport("pickle").getMember("loads").getACall()
select call, "Unsafe deserialization"
Rewrite affected code.
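One possible rewrite, assuming the payload is plain data such as dictionaries, lists, strings, and numbers, swaps pickle for JSON. The function names `dump_profile` and `load_profile` are hypothetical.

```python
import json


def dump_profile(profile: dict) -> bytes:
    """Serialize plain data as JSON instead of pickle."""
    return json.dumps(profile).encode("utf-8")


def load_profile(raw: bytes) -> dict:
    """Parse untrusted bytes; JSON decoding only builds plain data types."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


print(load_profile(dump_profile({"user": "alice", "roles": ["admin"]})))
```

If the data contains custom classes, JSON alone is not enough and an explicit schema or conversion layer is needed; the sketch does not cover that case.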
Practical 12: Track Sensitive Data Leaks
Find sensitive variables passed to print:
import python
class SensitiveVar extends Variable {
  SensitiveVar() { this.getId().regexpMatch("(?i).*(password|token|secret).*") }
}
from Call c, SensitiveVar v
where c.getFunc().(Name).getId() = "print" and
  c.getArg(_) = v.getAnAccess()
select c, "Sensitive data printed to logs"
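On the remediation side, flagged call sites can route values through a redaction helper before anything is printed or logged. This is a minimal sketch, with a hypothetical `redact` function and the same keyword list as the query.

```python
import re

# Keys containing any of these words are treated as sensitive (assumed list).
SENSITIVE_KEY = re.compile(r"password|token|secret", re.IGNORECASE)


def redact(fields: dict[str, object]) -> dict[str, object]:
    """Return a copy that is safe to log: sensitive values are masked."""
    return {key: "***" if SENSITIVE_KEY.search(key) else value
            for key, value in fields.items()}


print(redact({"user": "alice", "api_token": "abc123"}))
```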
Practical 13: Build Internal Rules for Your Organization
Define rules for:
• No eval
• No os.system
• No insecure crypto
• All SQL must be parameterized
• Only approved HTTP clients
Store queries under /security/codeql-rules/.
Practical 14: Use CodeQL for Secure Code Review
Scan:
codeql database analyze db custom-rules.ql --format=sarif-latest --output=review.sarif
Review findings manually for accuracy.
Practical 15: Create a CodeQL Rule to Detect Missing Input Validation
import python
from Function f
where f.getName().regexpMatch(".*(create|update|process).*") and
  not exists(Call c | c.getScope() = f and c.getFunc().(Name).getId() = "validate")
select f, "Function missing validation"
Practical 16: Detect Functions That Return Sensitive Data Without Authorization
import python
class SensitiveField extends Variable {
  SensitiveField() { this.getId().regexpMatch("(?i).*(email|token|password).*") }
}
// Flags the return only; whether an authorization check precedes it needs manual review.
from Return r, SensitiveField s
where r.getValue() = s.getAnAccess()
select r, "Sensitive data returned; confirm an authorization check precedes it"
Practical 17: Build CodeQL Dashboards
Store SARIF for each run and visualize:
• Severity trends
• Issues per module
• Time to fix
• Rule violations
• Team performance
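As a starting point for the "issues per module" view, stored SARIF files can be aggregated with a few lines of code. The sketch assumes that a module corresponds to the top-level directory of each result's file path. The function name `issues_per_module` is hypothetical.

```python
from collections import Counter
from typing import Any


def issues_per_module(sarif: dict[str, Any]) -> Counter:
    """Count results by the top-level directory of their first location."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            uri = (locations[0].get("physicalLocation", {})
                   .get("artifactLocation", {}).get("uri", ""))
            # Files at the repository root have no directory component.
            module = uri.split("/", 1)[0] if "/" in uri else "<root>"
            counts[module] += 1
    return counts
```

Running this over the SARIF file from each stored scan and plotting the counts over time gives a simple trend line per module.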
Practical 18: Detect LFI and Path Traversal Issues
import python
from Call c
where c.getTarget().getName() = "open" and
c.getArgument(0).toString().matches("%/../%")
select c, "Potential path traversal"
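A common fix for a flagged call resolves the requested path and checks that it stays inside an allowed base directory before opening it. This is a minimal sketch, assuming Python 3.9 or newer for `Path.is_relative_to`; the function name `resolve_inside` is hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting escapes such as '../'."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path}")
    return candidate
```

The caller then opens the returned path instead of the raw input. Absolute inputs are rejected as well, because joining an absolute path discards the base.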
Practical 19: Scheduled Deep Scans
Create nightly pipeline:
• Full CodeQL scan
• Extended security pack
• Custom rules pack
• Store SARIF and notify team
Practical 20: Build a Complete CodeQL Security Architecture
Include:
• Developer IDE scanning
• CLI local scans
• PR scanning
• CI/CD full scan
• Custom rule packs
• Severity-based gates
• Audit and reporting systems
Use this architecture across all projects.
Intel Dump
• CodeQL converts code into a database for deep static analysis
• It detects vulnerabilities through semantic, control flow, and data flow analysis
• QL queries catch complex taint flows, injection patterns, missing validations, unsafe APIs, and logic flaws
• CodeQL integrates into local workflows, CI/CD, GitHub Actions, PR scans, and enterprise pipelines
• Practicals include database creation, official packs, taint tracking, custom rules, CI integration, secret detection, unsafe regex checks, insecure deserialization, sensitive data leaks, validation enforcement, dashboards, and complete architecture design