Input Validation & Sanitization

Input validation and sanitization ensure that all external data entering a system is safe, expected, and correctly formatted. These two practices prevent injection attacks, logic manipulation, broken authorization paths, and unexpected behavior caused by malicious or malformed input. Input validation protects your system before untrusted data interacts with code, queries, files, infrastructure, or business logic.

Why Input Validation Matters

Most attacks begin with unsafe input. Attackers manipulate parameters, headers, payloads, URLs, cookies, and file uploads to exploit flaws. Without strong validation, harmful input reaches backend logic or underlying systems. Input validation acts as the first line of defense by rejecting invalid or malicious data before any processing occurs.

Systems that lack proper validation become vulnerable to:

• SQL injection
• Command injection
• LDAP injection
• XSS
• File path traversal
• Overflow attacks
• Broken object-level authorization
• Denial of service through oversized input
• Logic bypass attacks

Input validation reduces these risks significantly, but it does not replace parameterized queries, output encoding, or authorization checks.

Core Principles of Input Validation

Never Trust User Input

All external data should be considered hostile until validated. This includes input from clients, APIs, mobile devices, IoT, webhooks, and third-party integrations.

Validate at the Server Side

Client-side validation improves usability but cannot be trusted, because attackers can bypass it by sending requests directly to the server. Every security-relevant check must be enforced server-side.

Whitelist Instead of Blacklist

Define what is allowed, not what is disallowed. Whitelisting reduces the chance of missing a harmful pattern, because anything not explicitly allowed is rejected.

Validate Before Processing

Validate before interacting with databases, files, logic, or configurations.

Reject Early

Reject malformed input as soon as it enters the system. Do not pass it deeper into backend workflows.

Encode or Sanitize Output Separately

Validation and output encoding are different tasks. Validation checks data correctness; encoding prevents interpretation as code.
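The difference can be illustrated with a minimal sketch. The function names and the 200-character limit are assumptions made for this example, not part of any framework. The input below is perfectly valid as a comment, yet it still has to be encoded before it is placed into HTML.

```python
import html

COMMENT_MAX_LEN = 200  # hypothetical limit chosen for this sketch


def validate_comment(text):
    """Validation: check that the data is of the expected type and length."""
    if not isinstance(text, str) or not 1 <= len(text) <= COMMENT_MAX_LEN:
        raise ValueError("Invalid comment")
    return text


def render_comment(text):
    """Encoding: escape HTML metacharacters at the point of output."""
    return "<p>" + html.escape(text) + "</p>"


comment = validate_comment("I <3 math & 1 < 2")  # valid data, unsafe as raw HTML
rendered = render_comment(comment)
print(rendered)
```

Validation decides whether the data is accepted at all; encoding is applied later and depends on where the data is written (HTML, SQL, a shell, a log file).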

Limit Input Size

Set explicit maximum sizes for request bodies, individual fields, and uploads to prevent DoS and memory exhaustion.

Type Check Everything

Ensure numbers are numbers, booleans are booleans, emails are emails, and objects follow expected schemas.

Validate Structurally Complex Data

Validate structured data such as JSON, XML, multipart forms, and file uploads.

Types of Input Validation

Format Validation

Check if input matches patterns such as email, username, or phone number.

re.fullmatch(r"[a-zA-Z0-9_]{3,20}", username)

Type Validation

Ensure integers, floats, and booleans are truly of the correct type.

Length Validation

Strings must fall within defined min-max limits.

Range Validation

Numeric inputs must stay within secure boundaries.

Allowed Characters Validation

Reject characters outside the safe set.

Schema Validation

Use JSON schema validators to enforce structure and types.

File Validation

Check file type, extension, MIME type, size, and content signatures.


Deep Dive on Sanitization

Sanitization prepares input for safe processing. It modifies, escapes, or transforms data into a safe form.

Output Sanitization

Used for HTML, XML, JSON, and UI output to prevent XSS or injection.

Command Sanitization

Avoid passing raw data into system commands. Use safe APIs.

File Path Sanitization

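Normalize file paths and enforce safe directories. Taking the base name strips directory components, but an empty name or a bare ".." passes through unchanged and must still be rejected.

name = os.path.basename(user_path)  # may still be "", "." or ".."; reject those first
safe_path = os.path.join(BASE_DIR, name)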

Validation + Sanitization Pipeline

A secure pipeline for all input:

  1. Accept input

  2. Check type

  3. Check length

  4. Check allowed characters

  5. Match pattern or schema

  6. Reject if invalid

  7. Sanitize if needed

  8. Process business logic safely

  9. Encode before output
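The steps above can be sketched for a single field as follows. This is one possible arrangement under stated assumptions, not a definitive implementation: accept_username and greet are hypothetical names, and the pattern is the username pattern used elsewhere in this lesson.

```python
import html
import re

USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def accept_username(raw):
    """Run one field through the type, length, character, and pattern checks."""
    if not isinstance(raw, str):                 # check type
        raise ValueError("username must be a string")
    if not 3 <= len(raw) <= 20:                  # check length
        raise ValueError("username length out of range")
    if not raw.isascii():                        # coarse allowed-character check
        raise ValueError("username contains non-ASCII characters")
    if not USERNAME_RE.fullmatch(raw):           # match pattern
        raise ValueError("username has an invalid format")
    return raw                                   # valid: hand on to business logic


def greet(raw):
    username = accept_username(raw)              # raises before any processing
    # Encode at the point of output, even though the value already passed validation
    return "<p>Hello, " + html.escape(username) + "</p>"
```

Calling greet("alice_01") returns the greeting markup, while greet("<script>") raises ValueError before any output is built.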


High-Risk Inputs That Require Strict Validation

• Usernames
• Passwords
• Email addresses
• File uploads
• Query parameters
• Headers
• JSON bodies
• URL path parameters
• Search fields
• Pagination parameters
• Base64 inputs
• Redirect URLs
• Payment data
• Admin actions

Every one of these must be validated strictly against an explicit type, length, and format.


Secure Validation Techniques

Regex Validation

Use strict regex patterns to enforce format correctness.

Schema Validation

Use libraries like:

pydantic
marshmallow
jsonschema

These enforce strict structure.

Enum Validation

Restrict input to predefined values.
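As a small illustration, Python's standard enum module can express a closed set of values. SortOrder and parse_sort_order are hypothetical names used only for this sketch.

```python
from enum import Enum


class SortOrder(Enum):
    ASC = "asc"
    DESC = "desc"


def parse_sort_order(raw):
    """Return the matching SortOrder member or raise ValueError."""
    if not isinstance(raw, str):
        raise ValueError("Invalid sort order")
    try:
        return SortOrder(raw)  # lookup by value; unknown values raise ValueError
    except ValueError:
        raise ValueError("Invalid sort order") from None
```

Because the lookup is exact, a value such as "asc; DROP TABLE users" is rejected rather than partially accepted.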

Server-Side Filtering

Reject input containing escape characters, unsafe separators, or directory traversal sequences.

Safe Type Conversion

Use safe parse functions and catch errors.
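A minimal sketch of a safe integer parser follows. parse_int and the ten-digit cap are assumptions for this example. The explicit pattern is used because int() on its own accepts surrounding whitespace, underscores, and non-ASCII digits.

```python
import re

# Optional minus sign followed by 1 to 10 ASCII digits (the cap is a sketch-level assumption)
INT_RE = re.compile(r"-?[0-9]{1,10}")


def parse_int(raw, minimum, maximum):
    """Convert raw to an int within [minimum, maximum] or raise ValueError."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(raw, bool) or not isinstance(raw, (str, int)):
        raise ValueError("Expected an integer")
    if isinstance(raw, str) and not INT_RE.fullmatch(raw):
        raise ValueError("Expected an integer")
    value = int(raw)
    if not minimum <= value <= maximum:
        raise ValueError("Integer out of range")
    return value
```

For example, parse_int("42", 1, 500) returns 42, while "1.5", "1e3", " 7", and an empty string all raise ValueError.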


Common Mistakes to Avoid

• Relying only on frontend validation
• Trusting hidden inputs
• Using permissive regex patterns
• Allowing optional unsafe characters
• Not checking nested JSON fields
• Passing validated input into unsafe API functions
• Sanitizing instead of validating when inappropriate
• Trimming harmful characters instead of rejecting them

Reject invalid input instead of trying to “fix” it.


Extensive Practical Section

Below are full-length, hands-on practical exercises covering real-world validation and sanitization cases.


Practical 1: Build a Full Input Validation Module

Create:

secure_input/
  validator.py

Add functions:

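import re

def validate_username(u):
    # Allow only ASCII letters, digits, and underscore, 3 to 20 characters
    if not isinstance(u, str) or not re.fullmatch(r"[A-Za-z0-9_]{3,20}", u):
        raise ValueError("Invalid username")

def validate_email(e):
    # Coarse shape check only: one "@", no whitespace, a dot in the domain
    if not isinstance(e, str) or not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", e):
        raise ValueError("Invalid email")

def validate_id(i):
    # str.isdigit() also accepts characters such as "²", so match ASCII digits explicitly
    if not re.fullmatch(r"[0-9]+", str(i)):
        raise ValueError("ID must be numeric")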

Test with valid and malicious inputs.


Practical 2: Validate JSON Bodies Using Schema Validation

Install jsonschema:

pip install jsonschema

Create schema:

user_schema = {
  "type": "object",
  "properties": {
    "name": {"type": "string", "minLength": 3, "maxLength": 50},
    "age": {"type": "integer", "minimum": 13, "maximum": 100},
    "email": {"type": "string", "format": "email"}
  },
  "required": ["name", "email"]
}

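Validate with (the format_checker argument is needed, because jsonschema does not enforce "format" keywords such as "email" by default):

from jsonschema import validate, FormatChecker
validate(instance=data, schema=user_schema, format_checker=FormatChecker())

Test with invalid documents, for example a missing email, a two-character name, or an age of 12.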


Practical 3: Build a Safe SQL Query Function

Bad code:

query = f"SELECT * FROM users WHERE id = {uid}"

Secure (%s is the placeholder for drivers such as psycopg2; sqlite3 uses ? instead):

cursor.execute("SELECT * FROM users WHERE id = %s", (uid,))

Test with payloads:

1 OR 1=1
'; DROP TABLE users; --
" OR "a"="a

Confirm that each payload is treated as a literal value, that no extra rows are returned, and that no table is modified.
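One way to run this test without a database server is an in-memory SQLite database from the standard library. This is a sketch under that assumption: the table layout and get_user are made up for the example, and sqlite3 uses "?" as its placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)


def get_user(uid):
    # The value is bound as data; it is never spliced into the SQL text
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (uid,)
    ).fetchall()


payloads = ["1 OR 1=1", "'; DROP TABLE users; --", '" OR "a"="a']
results = {payload: get_user(payload) for payload in payloads}

# The table must still exist and still hold both rows
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Each payload is compared against the id column as an ordinary string, so it matches no row and the users table is left untouched.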


Practical 4: Validate and Sanitize File Uploads

Perform checks:

  1. File size

  2. File extension

  3. MIME type

  4. Magic bytes

  5. Allowed directories

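Python example (attribute names such as file.mimetype and file.size depend on the web framework):

# The MIME type is supplied by the client, so treat it as a hint and also check magic bytes
if file.mimetype not in ["image/png", "image/jpeg"]:
    raise ValueError("Invalid file type")

# Enforce a 5 MiB limit
if file.size > 5 * 1024 * 1024:
    raise ValueError("File too large")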
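The magic-bytes check from the list above can be sketched on the raw file content. sniff_image_type is a hypothetical helper; the two signatures are the standard leading bytes of PNG and JPEG files.

```python
# Leading byte signatures of the two allowed image types
SIGNATURES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # same 5 MiB limit as the size check


def sniff_image_type(data):
    """Return the MIME type implied by the leading bytes, or raise ValueError."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("File too large")
    for mime, signature in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    raise ValueError("Invalid file type")
```

A PHP script renamed to .jpg fails this check because its first bytes are not an image signature. A matching signature only shows how the file starts; it does not prove the rest of the file is harmless.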

Test with:

• .php disguised as .jpg
• Malware-laced files
• Very large files


Practical 5: Prevent Path Traversal

Unsafe:

open("/var/www/uploads/" + filename)

Secure:

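from pathlib import Path

# Resolve the base as well, so symlinks in it do not break the comparison
base = Path("/var/www/uploads/").resolve()
# resolve() collapses ".." segments and follows symlinks
target = (base / filename).resolve()

if base not in target.parents:
    raise PermissionError("Invalid path")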

Test:

../../etc/passwd
../../../root/.ssh/id_rsa
uploads/../../../secrets.env
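These payloads can be exercised against a temporary directory. The sketch below assumes a helper named resolve_upload (hypothetical) that resolves both the base directory and the target before comparing them.

```python
import tempfile
from pathlib import Path


def resolve_upload(base_dir, filename):
    """Return the resolved path inside base_dir or raise PermissionError."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    if base not in target.parents:
        raise PermissionError("Invalid path")
    return target


payloads = [
    "../../etc/passwd",
    "../../../root/.ssh/id_rsa",
    "uploads/../../../secrets.env",
    "/etc/passwd",  # an absolute path replaces the base entirely when joined
]

with tempfile.TemporaryDirectory() as tmp:
    blocked = []
    for payload in payloads:
        try:
            resolve_upload(tmp, payload)
        except PermissionError:
            blocked.append(payload)
    allowed_name = resolve_upload(tmp, "photo.png").name
```

Every payload ends up in blocked, while an ordinary file name such as photo.png resolves to a path inside the base directory.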

Practical 6: Validate Pagination and Numeric Inputs

Example:

def validate_page(p):
    # int() raises ValueError for non-numeric strings such as "abc" or "1.5"
    p = int(p)
    if p < 1 or p > 500:
        raise ValueError("Invalid page")
    return p

Test with:

• Very large numbers
• Negative numbers
• Floating points
• Non-numeric input


Practical 7: Sanitize HTML Output to Prevent XSS

Install bleach:

pip install bleach

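Clean input (by default, bleach.clean escapes any tag that is not on its allowlist rather than removing it):

import bleach
safe_html = bleach.clean(user_comment)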

Test with payloads:

<script>alert(1)</script>
<img src=x onerror=alert(1)>
<svg><script>...</script></svg>

Practical 8: Secure Redirect URL Validation

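Do not allow open redirects. A prefix check such as redirect_url.startswith("https://trustedsite.com") also accepts https://trustedsite.com.evil.com, so parse the URL and compare the scheme and host exactly:

from urllib.parse import urlparse
parsed = urlparse(redirect_url)
if parsed.scheme != "https" or parsed.hostname != "trustedsite.com":
    raise ValueError("Invalid redirect")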

Test redirect payloads:

https://evil.com
//evil.com
/\\evil.com
javascript:alert(1)
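A sketch of a reusable check is shown below. is_safe_redirect and ALLOWED_REDIRECT_HOSTS are hypothetical names, and the policy (absolute https URLs on an exact host allowlist only) is an assumption for this example. Comparing the parsed hostname matters because lookalike URLs such as https://trustedsite.com.evil.com and https://trustedsite.com@evil.com begin with the trusted prefix but point elsewhere.

```python
from urllib.parse import urlparse

ALLOWED_REDIRECT_HOSTS = {"trustedsite.com"}  # exact host names only


def is_safe_redirect(url):
    """Return True only for https URLs whose host is on the allowlist."""
    # Backslashes are treated as slashes by some browsers, so reject them outright
    if not isinstance(url, str) or "\\" in url:
        return False
    try:
        parsed = urlparse(url)
    except ValueError:  # raised for malformed input such as a broken IPv6 literal
        return False
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_REDIRECT_HOSTS
```

With this policy, relative paths are rejected as well; an application that needs same-site relative redirects would have to add a separate, equally strict rule for them.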

Practical 9: Implement Centralized Validation Middleware

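Build a middleware:

from functools import wraps
from jsonschema import validate

def validate_request(schema):
    def wrapper(func):
        @wraps(func)  # keep the handler's name and docstring
        def inner(req):
            # Raises jsonschema.ValidationError before the handler runs
            validate(instance=req.json, schema=schema)
            return func(req)
        return inner
    return wrapper

Apply the decorator to every API endpoint so that no handler runs on unvalidated input.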


Practical 10: Build a Safe Command Execution Wrapper

Unsafe:

os.system("ping " + ip)

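Secure (no shell is involved, but ip must still be validated, because a value beginning with "-" would be read as a ping option):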

subprocess.run(["ping", "-c", "4", ip])

Test with harmful payloads:

127.0.0.1 ; ls
127.0.0.1 && whoami
127.0.0.1 | cat /etc/passwd
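Passing an argument list keeps the shell out of the picture, and validating the address first also rules out values that ping could read as options. The sketch below uses the standard ipaddress module; build_ping_command is a hypothetical helper that only builds the argument list.

```python
import ipaddress


def build_ping_command(raw_ip):
    """Return an argv list for ping, or raise ValueError for a bad address."""
    if not isinstance(raw_ip, str):
        raise ValueError("Invalid IP address")
    try:
        address = ipaddress.ip_address(raw_ip)  # accepts IPv4 and IPv6 literals only
    except ValueError:
        raise ValueError("Invalid IP address") from None
    # Use the normalized form, not the raw string
    return ["ping", "-c", "4", str(address)]
```

The result can then be handed to subprocess.run(build_ping_command(ip), check=True, timeout=10). All three payloads above fail at the ip_address step and never reach the command.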

Practical 11: Validate Base64 Input

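import base64
import binascii

def validate_b64(data):
    try:
        # validate=True rejects characters outside the Base64 alphabet
        return base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError, TypeError):
        raise ValueError("Invalid Base64 input") from None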

Test with malformed Base64 strings.


Practical 12: Validate Headers and Cookies

Create rules to reject:

• Invalid user-agent formats
• Oversized cookies
• Suspicious header values
• Encoding anomalies

Check all incoming headers for length and structure.
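A minimal sketch of such rules, operating on a plain dictionary of header names and values, might look like this. The limits and the printable-ASCII rule are assumptions chosen for the example (real traffic may legitimately contain tabs or longer values), and validate_headers is a hypothetical name.

```python
import re

MAX_HEADER_VALUE_LEN = 1024  # hypothetical limits for this sketch
MAX_COOKIE_LEN = 4096
# Printable ASCII only: rejects CR, LF, NUL, and other control characters
PRINTABLE_ASCII = re.compile(r"[\x20-\x7e]*")


def validate_headers(headers):
    """Raise ValueError if any header value is too long or contains odd bytes."""
    for name, value in headers.items():
        limit = MAX_COOKIE_LEN if name.lower() == "cookie" else MAX_HEADER_VALUE_LEN
        if len(value) > limit:
            raise ValueError("Header too long: " + name)
        if not PRINTABLE_ASCII.fullmatch(value):
            raise ValueError("Header contains disallowed characters: " + name)
```

Rejecting CR and LF in particular blocks header-splitting attempts such as a value containing "\r\nSet-Cookie: ...".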


Practical 13: Schema Validation for Nested JSON

Example schema:

order_schema = {
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "product_id": {"type": "string"},
          "quantity": {"type": "integer", "minimum": 1}
        },
        "required": ["product_id", "quantity"]
      }
    }
  },
  "required": ["items"]
}

Test with malformed nested objects.


Practical 14: Create a Central Rejection System

Any invalid input returns:

{"error": "Invalid input"}

Log rejection reason internally.
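One way to arrange this is a single helper that logs the detailed reason and returns the generic body. The sketch below is framework-neutral and uses hypothetical names (reject, handle); it returns a status code and JSON string instead of a real response object.

```python
import json
import logging

logger = logging.getLogger("validation")


def reject(reason, status=400):
    """Log the detailed reason internally; reveal nothing specific to the client."""
    # %r escapes newlines, so attacker-controlled text cannot forge log lines
    logger.warning("Rejected input: %r", reason)
    return status, json.dumps({"error": "Invalid input"})


def handle(raw_page):
    try:
        page = int(raw_page)
        if not 1 <= page <= 500:
            raise ValueError("page out of range")
    except (TypeError, ValueError) as exc:
        return reject(str(exc))
    return 200, json.dumps({"page": page})
```

Keeping the client-facing message generic avoids telling an attacker which rule their input tripped, while the log retains enough detail for debugging.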


Practical 15: Integrate Validation into CI

Add validation tests:

pytest tests/validation/

Include:

• JSON schema tests
• Injection tests
• Path traversal tests
• File upload tests
• HTML sanitization tests

Configure CI to fail the build whenever any of these tests fail.


Intel Dump

• Input validation rejects unsafe or malformed data
• Sanitization and output encoding transform data so it cannot be interpreted as code
• Validation must be strict, server-side, and whitelist-based
• Types, lengths, formats, and boundaries must be enforced
• File uploads, redirects, JSON bodies, and queries need strong rules
• Practical work includes schema validation, SQL safety, file hardening, command safety, XSS prevention, numeric limits, base64 validation, nested data checks, middleware, and CI integration
