OSINT (Open-Source Intelligence) is the practice of gathering publicly available information about a target without directly interacting with its infrastructure. OSINT supports passive reconnaissance and helps build an intelligence profile before any active testing begins. It reveals digital footprints, infrastructure details, employee data, exposed credentials, third-party associations, and hidden systems that would otherwise remain unseen.
OSINT is one of the most important skills in web pentesting because it surfaces information, such as leaked credentials and forgotten hosts, that automated scanners do not look for. This chapter covers the underlying theory and then walks through practical, step-by-step techniques for applying OSINT efficiently.
Purpose of OSINT in Pentesting
OSINT helps identify external exposure, data leaks, forgotten systems, and behavioral patterns. Attackers heavily depend on OSINT for planning intrusion paths, and pentesters use the same techniques to assess risk.
OSINT supports:
- Domain profiling
- Employee intelligence
- Email pattern discovery
- Subdomain mapping
- Technology fingerprinting
- Credential leak identification
- Third-party dependency analysis
- Cloud infrastructure mapping
- Discovery of undocumented services
OSINT outputs feed directly into active recon, making it a critical early-stage component.
OSINT Categories
OSINT data sources fall into several categories, each revealing a different type of information.
Domain and Infrastructure Intelligence
This includes everything related to the target’s online technical footprint:
- Domain registrations
- DNS records
- IP allocations
- CDN and cloud providers
- Historic domain changes
- Certificate transparency logs
These sources reveal how the organization hosts and manages its web presence.
Employee Intelligence (People-Focused OSINT)
This focuses on information about employees:
- Job titles
- Roles and departments
- Social media patterns
- Internal tools mentioned
- Emails or usernames
- Leaked credentials from past breaches
Employee intelligence often reveals internal structure and weak entry points.
Technical Metadata Intelligence
Public files contain hidden metadata:
- Usernames
- Device names
- Software versions
- Internal folder paths
- Document creation history
Metadata provides internal exposure without touching the target’s systems.
Third-Party Intelligence
Organizations use multiple external services:
- Payment processors
- Cloud storage
- Email platforms
- CRM tools
- Helpdesk systems
- Analytics systems
OSINT helps uncover these third-party dependencies and their potential weaknesses.
Practical OSINT Techniques
OSINT relies on systematic mapping of available information. Below are essential techniques with exact practical steps.
Search Engine Enumeration
Search engines index publicly accessible data. Using advanced search operators, you can uncover information not visible on the main website.
Google Dorks
Identify exposed directories:
site:example.com intitle:"index of"
Find login portals:
site:example.com inurl:login
Discover files:
site:example.com filetype:pdf
site:example.com filetype:docx
site:example.com filetype:xls
Find subdomains other than www, which often include staging environments:
site:example.com -site:www.example.com
These queries often reveal endpoints and files that are not linked from the main site.
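The queries above follow a fixed shape, so they can be generated per target. The following is a minimal sketch, not a definitive tool: the template labels, `build_dorks`, and `to_search_url` are hypothetical names introduced here, and the subdomain template excludes the www host with `-site:`. It only builds query strings and search URLs for manual review in a browser.

```python
from urllib.parse import quote_plus

# Dork templates modeled on the queries above; {domain} is filled in per target.
# The labels are hypothetical and exist only for this sketch.
DORK_TEMPLATES = {
    "exposed_directories": 'site:{domain} intitle:"index of"',
    "login_portals": "site:{domain} inurl:login",
    "pdf_files": "site:{domain} filetype:pdf",
    "docx_files": "site:{domain} filetype:docx",
    "xls_files": "site:{domain} filetype:xls",
    # Excludes the www host so that other subdomains stand out.
    "other_subdomains": "site:{domain} -site:www.{domain}",
}


def build_dorks(domain: str) -> dict[str, str]:
    """Return a mapping of dork label -> query string for one domain."""
    domain = domain.strip().lower()
    return {label: tpl.format(domain=domain) for label, tpl in DORK_TEMPLATES.items()}


def to_search_url(query: str) -> str:
    """Build a Google search URL that can be opened manually in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for label, query in build_dorks("example.com").items():
    print(f"{label}: {query}")
```

Keeping the templates in one dictionary makes it easy to add further file types or operators without touching the rest of the code.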
Certificate Transparency OSINT
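Certificate Transparency (CT) logs record TLS certificates issued by publicly trusted certificate authorities. Because each certificate lists the hostnames it covers, the logs often expose subdomains that were meant to stay internal.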
Search:
https://crt.sh/?q=example.com
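or use a passive enumeration tool such as subfinder, which queries CT logs alongside other passive sources: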
subfinder -d example.com
CT logs commonly uncover:
- Development subdomains
- Staging portals
- Internal API endpoints
These discoveries form the basis for deeper enumeration.
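CT search results usually need cleaning before they are useful. The sketch below assumes the results were saved as JSON (crt.sh can return JSON when `output=json` is added to the query) and that each record carries a `name_value` field holding one or more hostnames separated by newlines. The function name `extract_subdomains` and the sample records are hypothetical.

```python
def extract_subdomains(ct_records: list[dict], domain: str) -> list[str]:
    """Collect unique hostnames under `domain` from CT log records.

    Assumes each record has a 'name_value' string with newline-separated names.
    """
    domain = domain.lower().rstrip(".")
    found = set()
    for record in ct_records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            # Wildcard entries such as *.staging.example.com still name a real zone.
            if name.startswith("*."):
                name = name[2:]
            # Keep only the domain itself and true subdomains of it.
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


# Hypothetical sample records for illustration only.
sample = [
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "dev-api.example.com"},
    {"name_value": "example.com.attacker.test"},  # look-alike, filtered out
]
print(extract_subdomains(sample, "example.com"))
```

The suffix check matters: a plain substring match would also accept look-alike names that merely contain the target domain.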
Passive DNS OSINT
Passive DNS platforms archive DNS resolutions observed over time, so they retain records that no longer exist in live DNS.
Useful services include:
- SecurityTrails
- DNSDB
- VirusTotal DNS
- PassiveTotal
Search for:
- Past subdomains
- Old IP addresses
- Retired infrastructure
These records reveal what the company used in the past and may still have exposed.
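Once historical records are exported, separating retired entries from current ones is a simple date comparison. Export formats differ between passive DNS services, so the sketch below assumes a hypothetical row layout with `hostname`, `ip`, and an ISO-formatted `last_seen` date; the function names are also illustrative.

```python
from datetime import date


def split_history(records: list[dict], cutoff: date) -> tuple[list[dict], list[dict]]:
    """Split passive DNS rows into (recent, retired) by their last_seen date."""
    recent, retired = [], []
    for row in records:
        last_seen = date.fromisoformat(row["last_seen"])
        (recent if last_seen >= cutoff else retired).append(row)
    return recent, retired


def retired_only_hosts(records: list[dict], cutoff: date) -> list[str]:
    """Hostnames that appear only in rows older than the cutoff."""
    recent, retired = split_history(records, cutoff)
    recent_hosts = {row["hostname"] for row in recent}
    return sorted({row["hostname"] for row in retired} - recent_hosts)


# Hypothetical export rows; the field names are an assumption of this sketch.
history = [
    {"hostname": "old.example.com", "ip": "203.0.113.10", "last_seen": "2019-03-01"},
    {"hostname": "www.example.com", "ip": "203.0.113.20", "last_seen": "2018-01-01"},
    {"hostname": "www.example.com", "ip": "198.51.100.5", "last_seen": "2024-06-01"},
]
print(retired_only_hosts(history, date(2023, 1, 1)))
```

Hosts that show up only in old rows are candidates for retired infrastructure, while old IP addresses of still-active hosts point to previous hosting locations.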
Public Document OSINT
Public files often leak internal information.
Download a PDF and inspect metadata:
exiftool document.pdf
Metadata reveals:
- Author name
- Device name
- Software version
- Timestamp
- Internal directories
Office documents often expose internal file paths used during creation.
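When many documents are collected, the metadata is easier to review in summarized form. The sketch below assumes the metadata was exported with exiftool's JSON output (`exiftool -json`), which produces a list of objects keyed by tag name. Which tags appear depends on the file type, so the tag list here is an assumption, and `summarize_metadata` and `usernames_from_paths` are hypothetical helper names.

```python
import json
import re

# Tags that commonly carry author or software details; availability varies by file type.
INTERESTING_TAGS = ("Author", "Creator", "Producer", "CreateDate", "ModifyDate")

# Matches Windows profile paths such as C:\Users\jsmith\... and captures the username.
USER_PATH = re.compile(r"[A-Za-z]:\\Users\\([^\\]+)")


def summarize_metadata(exiftool_json: str) -> list[dict]:
    """Reduce exiftool JSON output to the tags of interest, one dict per file."""
    rows = []
    for entry in json.loads(exiftool_json):
        row = {"file": entry.get("SourceFile", "unknown")}
        for tag in INTERESTING_TAGS:
            if tag in entry:
                row[tag] = entry[tag]
        rows.append(row)
    return rows


def usernames_from_paths(values) -> list[str]:
    """Pull usernames out of any Windows profile paths found in metadata values."""
    names = set()
    for value in values:
        names.update(USER_PATH.findall(str(value)))
    return sorted(names)
```

For example, passing every value of one exiftool entry to `usernames_from_paths` returns any account names embedded in file paths, which can then be compared against the email patterns found elsewhere.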
GitHub and Public Repo OSINT
Public repositories are one of the most sensitive OSINT sources. Companies often push internal code accidentally.
Search GitHub:
org:example
or:
"example.com" filename:config
or:
"password" org:example
Look for:
- API keys
- Credentials
- Environment variables
- Internal comments
- Deprecated scripts
If a company does not have an official GitHub organization, employees may still push internal code to personal accounts.
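After cloning or downloading a public repository, a pattern scan helps triage files for the items listed above. This is a minimal sketch with three illustrative regular expressions; real secret scanners use far larger rule sets, and the names `SECRET_PATTERNS` and `scan_text` are hypothetical.

```python
import re

# Illustrative patterns only; they will miss many secret formats and can
# produce false positives.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(
        r"(?:password|passwd|pwd)\s*[:=]\s*['\"]?[^\s'\"]+", re.IGNORECASE
    ),
}


def scan_text(text: str, source: str = "<memory>") -> list[tuple[str, int, str]]:
    """Return (source, line number, pattern label) for every matching line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((source, lineno, label))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample_env = "DB_HOST=db.internal\nDB_PASSWORD=hunter2\naws_key = AKIAIOSFODNN7EXAMPLE\n"
for finding in scan_text(sample_env, source=".env"):
    print(finding)
```

The function reports only the location and the pattern label, not the matched value, so the findings file does not itself become a store of secrets.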
Employee Enumeration
Use LinkedIn to enumerate employees:
Search for:
site:linkedin.com "example.com"
Collect:
- Full names
- Departments
- Job roles
- Email patterns
Typical email pattern discovery:
firstname.lastname@example.com
This helps during username enumeration and password spraying simulations.
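Once one real address is known, the naming pattern can be inferred and applied to the rest of the employee list. The sketch below assumes simple two-part Western names and a small set of common patterns; the pattern labels and function names are hypothetical, and middle names, accents, and collisions are not handled.

```python
# Common corporate address layouts; the labels are hypothetical.
EMAIL_PATTERNS = {
    "first.last": "{first}.{last}",
    "firstlast": "{first}{last}",
    "flast": "{f}{last}",
    "firstl": "{first}{l}",
    "first": "{first}",
}


def candidate_emails(full_name: str, domain: str) -> dict[str, str]:
    """Return pattern label -> candidate address for one person."""
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    fields = {"first": first, "last": last, "f": first[0], "l": last[0]}
    return {
        label: template.format(**fields) + "@" + domain.lower()
        for label, template in EMAIL_PATTERNS.items()
    }


def infer_pattern(known_email: str, full_name: str):
    """Return the label of the pattern that reproduces a known address, or None."""
    known_email = known_email.lower()
    domain = known_email.partition("@")[2]
    for label, address in candidate_emails(full_name, domain).items():
        if address == known_email:
            return label
    return None


pattern = infer_pattern("jane.doe@example.com", "Jane Doe")
print(pattern, candidate_emails("John Smith", "example.com")[pattern])
```

Inferring the pattern from a confirmed address first avoids generating every variant for every employee.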
Email Breach OSINT
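Breach lookup services include:
- HaveIBeenPwned
- Dehashed
- LeakCheck
- Snusbase
These services show which company email addresses appear in known breaches. HaveIBeenPwned reports breach membership only, while the others can also return the leaked credentials themselves.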
Example search:
user@example.com
Look for:
- Password reuse patterns
- Historical passwords
- Email presence in multiple breaches
These leaks guide authentication attacks in later chapters.
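Breach results from several services can be condensed into a per-address summary that answers the questions above. Each service returns data in its own format, so the sketch below assumes the results were already normalized into hypothetical records with `email`, `breach`, and an optional `password_hash`; `summarize_breaches` is an illustrative name.

```python
from collections import defaultdict


def summarize_breaches(records: list[dict]) -> dict[str, dict]:
    """Summarize breach exposure per email address.

    Each record is assumed to hold 'email', 'breach', and optionally 'password_hash'.
    """
    breaches = defaultdict(set)
    hashes = defaultdict(lambda: defaultdict(set))
    for record in records:
        email = record["email"].lower()
        breaches[email].add(record["breach"])
        if record.get("password_hash"):
            hashes[email][record["password_hash"]].add(record["breach"])

    summary = {}
    for email, names in breaches.items():
        # The same hash appearing in more than one breach suggests password reuse.
        reused = any(len(seen_in) > 1 for seen_in in hashes[email].values())
        summary[email] = {
            "breaches": sorted(names),
            "multiple_breaches": len(names) > 1,
            "reused_credential": reused,
        }
    return summary
```

Comparing hashes rather than plaintext values is enough to spot reuse and keeps cleartext passwords out of the working notes.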
Social Media OSINT
Employees often unintentionally reveal internal information.
Look for:
- Screenshots containing dashboards
- Mentions of software used internally
- Technology announcements
- Job posting requirements
Example:
A job listing mentioning “Docker, Kubernetes, and Django” strongly suggests a containerized Python backend.
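Job postings can be screened for technology names with a simple keyword match. The sketch below uses a short, hypothetical keyword table; a real list would be much longer, and a match is only a hint about the stack, not proof of it.

```python
import re

# Hypothetical keyword table: technology name -> rough category.
TECH_KEYWORDS = {
    "docker": "containers",
    "kubernetes": "orchestration",
    "django": "web framework (Python)",
    "nginx": "web server",
    "postgresql": "database",
    "jenkins": "CI/CD",
}


def detect_stack(posting_text: str) -> dict[str, str]:
    """Return the known technologies mentioned in a job posting."""
    tokens = re.findall(r"[a-z0-9+#.]+", posting_text.lower())
    # Strip sentence punctuation that the tokenizer keeps for names like "node.js".
    words = {token.strip(".") for token in tokens}
    return {name: category for name, category in TECH_KEYWORDS.items() if name in words}


print(detect_stack("Experience with Docker, Kubernetes, and Django."))
```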
Public Source Code OSINT
Use Google dorks to find GitHub content that mentions the target:
site:github.com "example.com"
Look for:
- Old repositories
- Internal scripts
- Environment files
Developers frequently leak infrastructure data unintentionally.
Business and Legal Document OSINT
Public company filings may expose:
- Internal addresses
- Administrative contact names
- Legal representatives
- Email patterns
Government portals and compliance sites often host PDFs with metadata.
Cloud Storage OSINT
Identify misconfigured cloud storage buckets through naming conventions.
Common cloud bucket patterns:
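- example.s3.amazonaws.com (Amazon S3)
- storage.googleapis.com/example (Google Cloud Storage)
- example.blob.core.windows.net (Azure Blob Storage)
- static.example.com or example.azureedge.net (custom or CDN hostnames that often front a bucket)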
Check bucket accessibility:
curl http://example.s3.amazonaws.com
Misconfigured buckets can expose:
- Private files
- Backups
- Logs
- API keys
- Source code
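Bucket names tend to follow the company name plus a predictable suffix, so candidates can be generated and the HTTP responses interpreted consistently. In the sketch below the suffix list and function names are hypothetical, and the status mapping reflects how Amazon S3 typically answers anonymous requests (200 for a publicly listable bucket, 403 for an existing but private one, 404 when the bucket does not exist). The request itself is left out; the curl command above can be pointed at each candidate URL.

```python
# Hypothetical suffixes; adjust to the naming habits seen during recon.
BUCKET_SUFFIXES = ["", "-backup", "-static", "-assets", "-dev"]


def candidate_bucket_urls(company: str) -> list[str]:
    """Build candidate S3 and Google Cloud Storage URLs from a company name."""
    name = company.strip().lower()
    urls = []
    for suffix in BUCKET_SUFFIXES:
        bucket = name + suffix
        urls.append(f"https://{bucket}.s3.amazonaws.com")
        urls.append(f"https://storage.googleapis.com/{bucket}")
    return urls


def classify_s3_status(status_code: int) -> str:
    """Interpret the HTTP status of an anonymous request to an S3 bucket URL."""
    meanings = {
        200: "exists, publicly listable",
        403: "exists, access denied",
        404: "no such bucket",
    }
    return meanings.get(status_code, "inconclusive")


for url in candidate_bucket_urls("example")[:4]:
    print(url)
```

A 403 is still a useful finding: it confirms the bucket name is in use even though the contents are not readable.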
OSINT Automation Tools
OSINT can also be automated to streamline collection.
Useful tools:
theHarvester
theHarvester -d example.com -b all
Collects:
- Emails
- Subdomains
- Hosts
- Public records
Amass (Intelligence Mode)
amass intel -d example.com
Maltego
Visual mapping for:
- Employees
- Domains
- DNS
- Infrastructure
- Social media
Recon-ng
A modular recon framework for:
- Credential breaches
- Subdomain discovery
- Info scraping
OSINT automation reduces manual work and consolidates results.
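Consolidation is mostly normalization and deduplication across tool outputs. The sketch below assumes each tool's hostnames have already been read into a list; `merge_hosts` is a hypothetical helper that also records which tools reported each host.

```python
def merge_hosts(results_by_tool: dict[str, list[str]]) -> dict[str, list[str]]:
    """Merge hostname lists from several tools into host -> sorted tool names."""
    merged = {}
    for tool, hosts in results_by_tool.items():
        for host in hosts:
            # Normalize case, whitespace, and trailing dots so duplicates collapse.
            host = host.strip().lower().rstrip(".")
            if host:
                merged.setdefault(host, set()).add(tool)
    return {host: sorted(tools) for host, tools in sorted(merged.items())}


combined = merge_hosts({
    "subfinder": ["api.example.com", "WWW.example.com."],
    "theHarvester": ["www.example.com", " "],
})
for host, tools in combined.items():
    print(host, tools)
```

Hosts reported by only one source are worth a second look, since they are either unusual finds or stale data.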
Organizing OSINT Data
Organize collected data to support later phases.
Create folders:
- employees
- domains
- subdomains
- leaks
- documents
- infrastructure
Store findings in separate files:
- emails.txt
- subdomains_passive.txt
- leaks.txt
- github_results.txt
- metadata.txt
Organized OSINT becomes the foundation for active recon and exploitation.
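The folder layout above can be created with a short script so that every engagement starts from the same structure. In this sketch the folder names come from the list above, while `init_workspace`, `append_findings`, and the placement of files inside folders are assumptions. The demonstration writes to a temporary directory that is removed afterwards.

```python
import tempfile
from pathlib import Path

FOLDERS = ["employees", "domains", "subdomains", "leaks", "documents", "infrastructure"]


def init_workspace(root) -> Path:
    """Create the OSINT folder layout under `root` and return it as a Path."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    return root


def append_findings(path, items) -> int:
    """Append new, non-empty items to a findings file and return how many were added."""
    path = Path(path)
    seen = set(path.read_text(encoding="utf-8").splitlines()) if path.exists() else set()
    added = 0
    with path.open("a", encoding="utf-8") as handle:
        for item in items:
            item = item.strip()
            if item and item not in seen:
                handle.write(item + "\n")
                seen.add(item)
                added += 1
    return added


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    workspace = init_workspace(tmp)
    target = workspace / "subdomains" / "subdomains_passive.txt"
    append_findings(target, ["dev.example.com", "dev.example.com", "api.example.com"])
    print(target.read_text(encoding="utf-8"))
```

Because `append_findings` skips entries already in the file, the same script can be rerun after each collection pass without creating duplicates.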
Integrating OSINT Into Pentesting
OSINT data directly feeds into multiple stages:
- Subdomain enumeration
- DNS mapping
- Email attack surface
- Cloud resource enumeration
- API discovery
- Authentication testing
- Technology fingerprinting
A well-executed OSINT phase often reveals attack surface that automated scanners miss.
Intel Dump
- OSINT gathers public information without touching target systems.
- Use search engines, CT logs, passive DNS, archives, and metadata.
- Analyze GitHub and public repositories for leaks.
- Enumerate employees through social networks and job listings.
- Search breach databases for leaked credentials.
- Inspect public documents using metadata extraction tools.
- Identify cloud storage buckets and third-party dependencies.
- Use OSINT automation tools like theHarvester, Amass, and Recon-ng.
- Organize results to support deeper recon and exploitation.