Input validation and sanitization ensure that external data entering a system is safe, expected, and correctly formatted. Together they help prevent injection attacks, logic manipulation, broken authorization paths, and unexpected behavior caused by malicious or malformed input. Validation protects your system before untrusted data interacts with code, queries, files, infrastructure, or business logic.
Why Input Validation Matters
Many attacks begin with unsafe input. Attackers manipulate parameters, headers, payloads, URLs, cookies, and file uploads to exploit flaws. Without strong validation, harmful input reaches backend logic or underlying systems. Input validation acts as the first line of defense by rejecting invalid or malicious data before any processing occurs.
Systems that lack proper validation become vulnerable to:
• SQL injection
• Command injection
• LDAP injection
• XSS
• File path traversal
• Overflow attacks
• Broken object-level authorization
• Denial of service through oversized input
• Logic bypass attacks
Input validation reduces these risks significantly.
Core Principles of Input Validation
Never Trust User Input
All external data should be considered hostile until validated. This includes input from clients, APIs, mobile devices, IoT, webhooks, and third-party integrations.
Validate at the Server Side
Client-side validation improves usability but cannot be trusted. Attackers bypass it easily. All validation must happen server-side.
Whitelist Instead of Blacklist
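Define what is allowed, not what is disallowed. A whitelist (allowlist) rejects everything it does not recognize, so a harmful pattern that nobody anticipated is still blocked, whereas a blacklist only stops the patterns someone thought to list.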
Validate Before Processing
Validate before interacting with databases, files, logic, or configurations.
Reject Early
Reject malformed input as soon as it enters the system. Do not pass it deeper into backend workflows.
Encode or Sanitize Output Separately
Validation and output encoding are different tasks. Validation checks data correctness; encoding prevents interpretation as code.
Limit Input Size
Set explicit maximum sizes for request bodies, individual fields, and uploads, and reject anything larger. This prevents denial of service and memory exhaustion.
Type Check Everything
Ensure numbers are numbers, booleans are booleans, emails are emails, and objects follow expected schemas.
Validate Structurally Complex Data
Validate structured data such as JSON, XML, multipart forms, and file uploads.
Types of Input Validation
Format Validation
Check if input matches patterns such as email, username, or phone number.
re.fullmatch(r"[a-zA-Z0-9_]{3,20}", username)
Type Validation
Ensure integers, floats, and booleans are truly of the correct type.
Length Validation
Strings must fall within defined min-max limits.
Range Validation
Numeric inputs must stay within secure boundaries.
Allowed Characters Validation
Reject characters outside the safe set.
Schema Validation
Use JSON schema validators to enforce structure and types.
File Validation
Check file type, extension, MIME type, size, and content signatures.
Deep Dive on Sanitization
Sanitization prepares input for safe processing. It modifies, escapes, or transforms data into a safe form.
Output Sanitization
Used for HTML, XML, JSON, and UI output to prevent XSS or injection.
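As a minimal sketch of output encoding for HTML, the standard library's html.escape can turn markup characters into entities so that a comment is shown as text rather than interpreted. The function name render_comment and the surrounding <p> tag are assumptions made for illustration.

```python
import html

def render_comment(comment):
    """Return an HTML fragment in which the comment is inert text."""
    # html.escape replaces &, < and >, and with quote=True also " and '.
    return "<p>" + html.escape(comment, quote=True) + "</p>"

print(render_comment("<script>alert(1)</script>"))
```

Encoding is context specific: this covers HTML element content and quoted attribute values, not JavaScript, CSS, or URL contexts.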
Command Sanitization
Avoid passing raw data into system commands. Use safe APIs.
File Path Sanitization
Normalize file paths and enforce safe directories.
# basename() drops directory components; still reject an empty result or ".." before joining
safe_path = os.path.join(BASE_DIR, os.path.basename(user_path))
Validation + Sanitization Pipeline
A secure pipeline for all input:
1. Accept input
2. Check type
3. Check length
4. Check allowed characters
5. Match pattern or schema
6. Reject if invalid
7. Sanitize if needed
8. Process business logic safely
9. Encode before output
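The steps above might be sketched for a single field as follows. This is an illustration under assumptions, not a complete framework: the names accept_username and greeting_html and the 64-character input cap are hypothetical, and the username pattern is the one used elsewhere in this guide.

```python
import html
import re

MAX_USERNAME_INPUT = 64  # assumed cap, checked before any regex work
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")  # allowed characters and length

def accept_username(raw):
    """Validate one untrusted field; return it unchanged or raise ValueError."""
    if not isinstance(raw, str):            # check type
        raise ValueError("username must be a string")
    if len(raw) > MAX_USERNAME_INPUT:       # check length
        raise ValueError("username is too long")
    if USERNAME_RE.fullmatch(raw) is None:  # check characters and pattern
        raise ValueError("username has an invalid format")
    return raw                              # valid: safe to hand to business logic

def greeting_html(username):
    """Encode at the point of output, separately from validation."""
    return "<p>Hello, " + html.escape(username) + "</p>"

print(greeting_html(accept_username("alice_01")))
```

Invalid input raises immediately, so nothing downstream ever sees a value that failed a check.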
High-Risk Inputs That Require Strict Validation
• Usernames
• Passwords
• Email addresses
• File uploads
• Query parameters
• Headers
• JSON bodies
• URL path parameters
• Search fields
• Pagination parameters
• Base64 inputs
• Redirect URLs
• Payment data
• Admin actions
Every one of these must be validated aggressively.
Secure Validation Techniques
Regex Validation
Use strict regex patterns to enforce format correctness.
Schema Validation
Use libraries like:
• pydantic
• marshmallow
• jsonschema
These enforce strict structure.
Enum Validation
Restrict input to predefined values.
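One way to restrict a field to predefined values is Python's enum module. In this sketch the SortOrder enum and parse_sort_order function are hypothetical names chosen for illustration.

```python
from enum import Enum

class SortOrder(Enum):
    ASC = "asc"
    DESC = "desc"

def parse_sort_order(raw):
    """Map untrusted text onto a known enum member or raise ValueError."""
    try:
        # Looking a member up by value fails for anything not defined above.
        return SortOrder(raw)
    except ValueError:
        raise ValueError("sort order must be 'asc' or 'desc'") from None

print(parse_sort_order("desc"))
```

Downstream code then works with SortOrder members rather than raw strings, so an unexpected value cannot reach a query or a branch.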
Server-Side Filtering
Reject input containing escape characters, unsafe separators, or directory traversal sequences.
Safe Type Conversion
Use safe parse functions and catch errors.
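A safe conversion can be sketched as a strict parser that accepts only plain ASCII decimal text and then enforces a range. The name parse_bounded_int and the nine-digit cap are assumptions for this example.

```python
import re

# ASCII digits only, at most nine of them, so absurdly long numbers are
# rejected before int() is ever called.
_INT_RE = re.compile(r"[0-9]{1,9}")

def parse_bounded_int(raw, minimum, maximum):
    """Parse untrusted text as an integer within [minimum, maximum]."""
    if not isinstance(raw, str) or _INT_RE.fullmatch(raw) is None:
        raise ValueError("expected a plain decimal integer")
    value = int(raw)
    if not minimum <= value <= maximum:
        raise ValueError("value out of range")
    return value

print(parse_bounded_int("42", 1, 500))
```

Matching first avoids surprises from int() itself, which tolerates surrounding whitespace, a leading sign, underscores, and non-ASCII digits.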
Common Mistakes to Avoid
• Relying only on frontend validation
• Trusting hidden inputs
• Using permissive regex patterns
• Allowing optional unsafe characters
• Not checking nested JSON fields
• Passing validated input into unsafe API functions
• Sanitizing input that should have been rejected outright
• Trimming harmful characters instead of rejecting them
Reject invalid input instead of trying to “fix” it.
Extensive Practical Section
Below are full-length, hands-on practical exercises covering real-world validation and sanitization cases.
Practical 1: Build a Full Input Validation Module
Create:
secure_input/
    validator.py
Add functions:
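import re

def validate_username(u):
    # Allow only ASCII letters, digits, and underscores, 3 to 20 characters long
    if not re.fullmatch(r"[A-Za-z0-9_]{3,20}", u):
        raise ValueError("Invalid username")

def validate_email(e):
    # Loose structural check only: one "@", no whitespace, and a dot in the domain part
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", e):
        raise ValueError("Invalid email")

def validate_id(i):
    # str.isdigit() also accepts non-ASCII digits such as "²", so match ASCII digits explicitly
    if not re.fullmatch(r"[0-9]+", str(i)):
        raise ValueError("ID must be numeric")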
Test with valid and malicious inputs.
Practical 2: Validate JSON Bodies Using Schema Validation
Install jsonschema:
pip install jsonschema
Create schema:
user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 3, "maxLength": 50},
        "age": {"type": "integer", "minimum": 13, "maximum": 100},
        "email": {"type": "string", "format": "email"}
    },
    "required": ["name", "email"]
}
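Validate with (jsonschema only enforces "format" keywords such as "email" when a format checker is supplied):
from jsonschema import FormatChecker, validate
validate(instance=data, schema=user_schema, format_checker=FormatChecker())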
Test with invalid JSON.
Practical 3: Build a Safe SQL Query Function
Bad code:
query = f"SELECT * FROM users WHERE id = {uid}"
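Secure (parameterized; the placeholder syntax depends on the driver, for example %s for psycopg2 and ? for sqlite3):
cursor.execute("SELECT * FROM users WHERE id = %s", (uid,))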
Test with payloads:
1 OR 1=1
'; DROP TABLE users; --
" OR "a"="a
Confirm that each payload is treated as plain data: no extra rows are returned and the users table is still intact.
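The test can be tried without a database server by using the standard library's sqlite3 module with an in-memory database. This is a sketch under assumptions: the two-column users table and the find_user helper are made up for the example, and sqlite3 uses ? as its placeholder rather than %s.

```python
import sqlite3

def find_user(conn, uid):
    """Look up a user by id using a bound parameter, never string formatting."""
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (uid,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

payloads = ["1 OR 1=1", "'; DROP TABLE users; --", '" OR "a"="a']
for payload in payloads:
    # Each payload is compared to the id column as a value, not parsed as SQL.
    print(repr(payload), "->", find_user(conn, payload))

# The table must still exist and still hold both rows.
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Parameter binding removes the injection path, but the id should still be validated as an integer so that malformed values are rejected before they reach the query.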
Practical 4: Validate and Sanitize File Uploads
Perform checks:
• File size
• File extension
• MIME type
• Magic bytes
• Allowed directories
Python example:
# file.mimetype is supplied by the client, so treat it as a hint and check magic bytes as well
if file.mimetype not in ["image/png", "image/jpeg"]:
    raise ValueError("Invalid file type")
if file.size > 5 * 1024 * 1024:
    raise ValueError("File too large")
Test with:
• .php disguised as .jpg
• Malware-laced files
• Ultra large files
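The magic-bytes check could look like the sketch below, which inspects the first bytes of the upload instead of trusting the extension or the client-supplied MIME type. The function name sniff_image_type is hypothetical, and the 5 MB limit mirrors the example above.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"      # start-of-image marker plus next marker byte
MAX_UPLOAD_BYTES = 5 * 1024 * 1024

def sniff_image_type(data):
    """Return the MIME type implied by the file's leading bytes."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("File too large")
    if data.startswith(PNG_MAGIC):
        return "image/png"
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    raise ValueError("Invalid file type")

# A PHP script renamed to .jpg fails because its content does not start
# with an image signature.
try:
    sniff_image_type(b"<?php system($_GET['c']); ?>")
except ValueError as exc:
    print("rejected:", exc)
```

A matching signature does not prove the rest of the file is harmless, so this check complements the others in the list rather than replacing them.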
Practical 5: Prevent Path Traversal
Unsafe:
open("/var/www/uploads/" + filename)
Secure:
from pathlib import Path

base = Path("/var/www/uploads/").resolve()  # resolve symlinks in the base directory too
target = (base / filename).resolve()        # collapses ".." and follows symlinks
if base not in target.parents:
    raise PermissionError("Invalid path")
Test:
../../etc/passwd
../../../root/.ssh/id_rsa
uploads/../../../secrets.env
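To run these payloads without touching a real upload directory, the check can be wrapped in a function and pointed at a temporary directory. The helper name resolve_upload_path is an assumption for this sketch.

```python
import tempfile
from pathlib import Path

def resolve_upload_path(base_dir, filename):
    """Return the resolved path only if it lies strictly inside base_dir."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    # An absolute filename replaces base entirely, and ".." segments climb
    # out of it; both end up with base missing from target.parents.
    if base not in target.parents:
        raise PermissionError("Invalid path")
    return target

with tempfile.TemporaryDirectory() as tmp:
    uploads = Path(tmp) / "uploads"
    uploads.mkdir()
    print(resolve_upload_path(uploads, "report.txt").name)
    for payload in ["../../etc/passwd", "/etc/passwd",
                    "uploads/../../../secrets.env"]:
        try:
            resolve_upload_path(uploads, payload)
        except PermissionError:
            print("rejected:", payload)
```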
Practical 6: Validate Pagination and Numeric Inputs
Example:
def validate_page(p):
    p = int(p)  # raises ValueError for strings such as "abc" or "1.5"
    if p < 1 or p > 500:
        raise ValueError("Invalid page")
    return p
Test with:
• Very large numbers
• Negative numbers
• Floating points
• Non-numeric input
Practical 7: Sanitize HTML Output to Prevent XSS
Install bleach:
pip install bleach
Clean input:
import bleach
safe_html = bleach.clean(user_comment)
Test with payloads:
<script>alert(1)</script>
<img src=x onerror=alert(1)>
<svg><script>...</script></svg>
Practical 8: Secure Redirect URL Validation
Do not allow open redirects:
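from urllib.parse import urlsplit
parts = urlsplit(redirect_url)  # a startswith() check would accept https://trustedsite.com.evil.com
if parts.scheme != "https" or parts.hostname != "trustedsite.com":
    raise ValueError("Invalid redirect")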
Test redirect payloads:
https://evil.com
//evil.com
/\\evil.com
javascript:alert(1)
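A reusable form of the check might parse the URL and compare its scheme and host against an allowlist. In this sketch the names is_safe_redirect and ALLOWED_REDIRECT_HOSTS are hypothetical; trustedsite.com is the example host used above.

```python
from urllib.parse import urlsplit

ALLOWED_REDIRECT_HOSTS = {"trustedsite.com"}

def is_safe_redirect(url):
    """Accept only absolute https URLs whose host is on the allowlist."""
    # Backslashes and control characters are handled inconsistently by
    # browsers, so refuse them outright.
    if "\\" in url or any(ord(ch) < 0x20 for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # for example, a malformed IPv6 host
        return False
    return parts.scheme == "https" and parts.hostname in ALLOWED_REDIRECT_HOSTS

for candidate in ["https://trustedsite.com/account", "https://evil.com",
                  "//evil.com", "javascript:alert(1)",
                  "https://trustedsite.com.evil.com",
                  "https://trustedsite.com@evil.com"]:
    print(candidate, "->", is_safe_redirect(candidate))
```

Comparing the parsed hostname, rather than a string prefix, is what defeats lookalike hosts and userinfo tricks such as the last two candidates.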
Practical 9: Implement Centralized Validation Middleware
Build a middleware:
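from functools import wraps
from jsonschema import validate

def validate_request(schema):
    def wrapper(func):
        @wraps(func)
        def inner(req):
            # Raises jsonschema.ValidationError before the handler runs
            validate(instance=req.json, schema=schema)
            return func(req)
        return inner
    return wrapper
Applied as a decorator, this enforces validation on every API endpoint it wraps.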
Practical 10: Build a Safe Command Execution Wrapper
Unsafe:
os.system("ping " + ip)
Secure:
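# No shell is involved, so ; && and | are not interpreted; still validate ip so it cannot be read as an option
subprocess.run(["ping", "-c", "4", ip])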
Test with harmful payloads:
127.0.0.1 ; ls
127.0.0.1 && whoami
127.0.0.1 | cat /etc/passwd
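A wrapper could validate the address with the standard library's ipaddress module before building the argument list. This sketch assumes IPv4 only and the Linux/macOS ping flag -c; the names build_ping_command and ping are hypothetical.

```python
import ipaddress
import subprocess

def build_ping_command(ip):
    """Return an argument list for ping, or raise ValueError for a bad address."""
    if not isinstance(ip, str):
        raise ValueError("address must be a string")
    # IPv4Address rejects anything that is not a dotted quad, including
    # shell metacharacters and values that start with "-".
    address = ipaddress.IPv4Address(ip)
    return ["ping", "-c", "4", str(address)]

def ping(ip):
    """Run ping without a shell and return its exit code."""
    result = subprocess.run(build_ping_command(ip), capture_output=True,
                            timeout=15, check=False)
    return result.returncode

for payload in ["127.0.0.1 ; ls", "127.0.0.1 && whoami", "-f"]:
    try:
        build_ping_command(payload)
    except ValueError:
        print("rejected:", payload)
```

Validation and the list form of subprocess.run cover different risks: the list form prevents shell injection, while validation prevents the value from being treated as a ping option.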
Practical 11: Validate Base64 Input
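import base64

def validate_b64(data):
    try:
        # validate=True rejects characters outside the Base64 alphabet
        return base64.b64decode(data, validate=True)
    except (ValueError, TypeError):
        # binascii.Error is a subclass of ValueError
        raise ValueError("Invalid Base64 input") from None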
Test with malformed Base64 strings.
Practical 12: Validate Headers and Cookies
Create rules to reject:
• Invalid user-agent formats
• Oversized cookies
• Suspicious header values
• Encoding anomalies
Check all incoming headers for length and structure.
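As a rough sketch, such rules can be expressed as one function that walks the header mapping. The 4096-character limits and the name validate_headers are assumptions, and rejecting all non-ASCII values is a deliberately strict choice that may need relaxing for some applications.

```python
MAX_HEADER_VALUE_LENGTH = 4096  # assumed limit for ordinary headers
MAX_COOKIE_LENGTH = 4096        # assumed limit for the Cookie header

def validate_headers(headers):
    """Raise ValueError if any header value is oversized or malformed."""
    for name, value in headers.items():
        is_cookie = name.lower() == "cookie"
        limit = MAX_COOKIE_LENGTH if is_cookie else MAX_HEADER_VALUE_LENGTH
        if len(value) > limit:
            raise ValueError("header %r is too long" % name)
        # CR, LF, NUL and other control characters enable header injection
        # or parser confusion; a horizontal tab is the only one tolerated.
        if any((ord(ch) < 0x20 and ch != "\t") or ord(ch) == 0x7F for ch in value):
            raise ValueError("header %r contains control characters" % name)
        if not value.isascii():
            raise ValueError("header %r contains non-ASCII characters" % name)

validate_headers({"User-Agent": "Mozilla/5.0", "Cookie": "session=abc123"})
print("headers accepted")
```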
Practical 13: Schema Validation for Nested JSON
Example schema:
order_schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"},
                    "quantity": {"type": "integer", "minimum": 1}
                },
                "required": ["product_id", "quantity"]
            }
        }
    },
    "required": ["items"]
}
Test with malformed nested objects.
Practical 14: Create a Central Rejection System
Any invalid input returns:
{"error": "Invalid input"}
Log rejection reason internally.
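One possible shape for this is a single helper that every validator failure goes through. The function name reject, the logger name, and the 400 status code are assumptions for the sketch; the response body is the one shown above.

```python
import json
import logging

logger = logging.getLogger("validation")

def reject(reason, field=""):
    """Log the detailed reason internally; return only a generic error."""
    # %r keeps untrusted field names from injecting newlines into the log.
    logger.warning("input rejected: field=%r reason=%r", field, reason)
    # The client learns nothing about which rule failed.
    return 400, json.dumps({"error": "Invalid input"})

status, body = reject("page out of range", field="page")
print(status, body)
```

Keeping the detail in the log and out of the response avoids telling an attacker which check to work around.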
Practical 15: Integrate Validation into CI
Add validation tests:
pytest tests/validation/
Include:
• JSON schema tests
• Injection tests
• Path traversal tests
• File upload tests
• HTML sanitization tests
Make CI fail on unsafe patterns.
Intel Dump
• Input validation rejects unsafe or malformed data
• Sanitization and output encoding transform data so it cannot be interpreted as code
• Validation must be strict, server-side, and whitelist-based
• Types, lengths, formats, and boundaries must be enforced
• File uploads, redirects, JSON bodies, and queries need strong rules
• Practical work includes schema validation, SQL safety, file hardening, command safety, XSS prevention, numeric limits, base64 validation, nested data checks, middleware, and CI integration