Privacy by Design

Privacy by Design#

Here are some design patterns that show how Privacy by Design works:

Minimizing Data#

Principle: Collect only what you need. Don’t keep what you don’t need.

For example:

Store hashed emails instead of plaintext emails
Aggregate data before storing it (no raw events)
Don’t collect IP addresses if you only need country-level location
Delete data after 90 days unless there is a business need to keep it
Minimize or eliminate PII relentlessly

Code example (bad):

# Storing everything "just in case"
user_event = {
    "user_id": "12345",
    "email": "user@example.com",
    "ip_address": "192.168.1.1",
    "user_agent": "Mozilla/5.0...",
    "location": {"lat": 47.6062, "lon": -122.3321},
    "timestamp": "2025-01-15T17:32:11Z",
    "page": "/products/guitar",
    "referrer": "https://google.com/search?q=guitar",
    "session_duration": 342,
    "previous_pages": ["/", "/products", "/products/guitar"]
}

Code example (better):

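# Store only what you need for analytics
import hashlib
import hmac

SECRET_KEY = b"load-from-a-secrets-manager"  # Placeholder; never hard-code a real key
user_id = "12345"

page_view = {
    # Keyed one-way hash; Python's built-in hash() is not stable across runs
    "anonymous_id": hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest(),
    "country": "US",  # No precise location
    "device_type": "mobile",  # Simplified user agent
    "page": "/products/guitar",
    "referrer_domain": "google.com",  # Drop query params
    "timestamp": "2025-01-15T17:00:00Z"  # Truncated to the hour
}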

You can still do analytics. The stored events just no longer identify individuals directly or reveal their precise location.
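As a rough sketch of how the raw event could be reduced to the smaller record at ingestion time, the function below drops the email and IP address, keeps only the referrer domain, and truncates the timestamp to the hour. The names `minimize_event`, `device_type`, and `ID_KEY` are illustrative assumptions, and the country is assumed to be resolved upstream so the IP address never needs to be stored.

```python
import hashlib
import hmac
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical secret; in practice load it from a secrets manager.
ID_KEY = b"example-key-do-not-use"


def device_type(user_agent: str) -> str:
    """Collapse a full user agent string into a coarse device class."""
    ua = user_agent.lower()
    if "mobile" in ua or "iphone" in ua or "android" in ua:
        return "mobile"
    return "desktop"


def minimize_event(raw: dict, country: str) -> dict:
    """Reduce a raw event to the fields needed for page-view analytics."""
    ts = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
    truncated = ts.replace(minute=0, second=0, microsecond=0)
    return {
        "anonymous_id": hmac.new(ID_KEY, raw["user_id"].encode(), hashlib.sha256).hexdigest(),
        "country": country,  # Resolved upstream; the IP itself is not stored
        "device_type": device_type(raw["user_agent"]),
        "page": raw["page"],
        "referrer_domain": urlparse(raw["referrer"]).netloc,  # Drops path and query
        "timestamp": truncated.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Fields that are not copied into the returned dictionary (email, IP address, coordinates, session history) are simply never written to the analytics store.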

Pseudonymization#

Principle: Replace identifying information with pseudonyms.

For example:

Hash email addresses before saving them. Plain hashes have drawbacks when you share data with third parties: a recipient with a larger database of emails can match a hash back to the original address.
Use random UUIDs instead of sequential IDs
Separate behavioral data from PII

Architecture:

User Service (stores PII)
├── user_id: "uuid-abc-123"
├── email: "zakk@example.com"
├── name: "Zakk Mild"
└── address: "123 Rock St"

Analytics Service (stores behavior)
├── anonymous_id: "hash-xyz-789"   # Can't be reversed
├── last_visit: "2025-01-15"
├── page_views: 47
└── favorite_category: "guitar"

If attackers breach the analytics database, they get behavioral data but cannot easily link it to real people.
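The drawback of plain hashing noted above can be illustrated with a short sketch. It compares an unkeyed SHA-256 hash with a keyed HMAC, one common way to make lookups harder; the key value and the function names `plain_hash` and `pseudonymize` are assumptions for illustration, not part of any particular system.

```python
import hashlib
import hmac


def plain_hash(email: str) -> str:
    """Unkeyed SHA-256: anyone with a list of emails can recompute and match it."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()


def pseudonymize(email: str, key: bytes) -> str:
    """Keyed HMAC-SHA256: matching requires the secret key as well as the email."""
    return hmac.new(key, email.strip().lower().encode(), hashlib.sha256).hexdigest()


# Hypothetical key for illustration; keep the real one outside the analytics system.
key = b"example-pseudonym-key"

# A third party holding its own email list can reverse the plain hash by lookup
their_emails = ["zakk@example.com", "someone@example.org"]
lookup = {plain_hash(e): e for e in their_emails}
recovered = lookup.get(plain_hash("zakk@example.com"))

# The same lookup fails against the keyed pseudonym
missed = lookup.get(pseudonymize("zakk@example.com", key))
```

Here `recovered` holds the original address while `missed` is `None`. The keyed version only helps as long as the key is stored separately from the pseudonymized data.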

Data Encryption#

Principle: Protect data at rest and in transit.

In practice:

HTTPS for all traffic (not just login pages)
Database encryption (PostgreSQL, MySQL and other databases support this)
Encrypted backups
Application-level encryption for sensitive fields

Example:

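# Encrypting sensitive data before storage
from cryptography.fernet import Fernet

# Generate the key once and keep it in a secrets manager; a key that is
# regenerated on every run cannot decrypt previously stored data
key = Fernet.generate_key()
cipher = Fernet(key)

# Store encrypted (db and user_id are placeholders for your storage layer)
email_encrypted = cipher.encrypt(b"zakk@example.com")
db.store(user_id, email_encrypted)

# Retrieve and decrypt only when needed
email_encrypted = db.retrieve(user_id)
email = cipher.decrypt(email_encrypted).decode()  # decrypt() returns bytes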

Access Controls#

Principle: Only authorized people should be able to see user data.

In practice:

Role-based access control (RBAC)
Principle of least privilege (grant the minimum access needed)
Audit logs (keep track of who accessed what)
Multi-factor authentication for access to sensitive data

Example:

Engineers: Can see anonymized analytics but not PII
Support team: Can see user emails and order history (with audit logging)
Finance team: Can see payment information but not browsing history
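A minimal sketch of how these roles might be enforced in code, assuming a simple role-to-permission map and an in-memory audit log. The role names, permission strings, and `check_access` function are hypothetical; a real system would usually rely on its framework's or database's access control and an append-only log store.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission map mirroring the example roles above
ROLE_PERMISSIONS = {
    "engineer": {"analytics:read"},
    "support": {"email:read", "orders:read"},
    "finance": {"payments:read"},
}

# Permissions whose use is written to the audit log
AUDITED = {"email:read", "orders:read", "payments:read"}

audit_log = []  # In-memory stand-in for an append-only audit store


def check_access(actor: str, role: str, permission: str) -> bool:
    """Return True if the role grants the permission; deny by default."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())
    if permission in AUDITED:
        audit_log.append({
            "actor": actor,
            "permission": permission,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return allowed
```

Unknown roles and unlisted permissions fall through to a denial, which is the least-privilege default. Denied attempts on audited permissions are logged as well, since failed access to PII is often worth reviewing.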

Policies for Keeping Data#

Principle: Don’t keep data forever. If you don’t need it anymore, delete it.

In practice:

Set retention periods for each type of data
Set up automated deletion jobs
Follow legal hold rules (don’t delete data that is subject to an investigation)

Policy example:

User account data: Kept while the account is active, deleted 90 days after it is closed
Analytics data: Kept for 13 months, then deleted, including the raw data
Customer support tickets: Kept for 3 years, then archived
Financial records: Kept for 7 years (a legal requirement)

Implementation:

-- Automated cleanup job
DELETE FROM events 
WHERE timestamp < NOW() - INTERVAL '13 months';

DELETE FROM user_sessions 
WHERE last_activity < NOW() - INTERVAL '90 days';
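The SQL above hard-codes each interval. As a hedged sketch, the same policy could live in one table that a cleanup job consults, with a legal hold check before anything is deleted. The `RETENTION_DAYS` keys, the day counts (13 months is approximated as 390 days), and the `is_expired` function are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention table (in days) based on the example policy
RETENTION_DAYS = {
    "closed_account": 90,
    "analytics_event": 13 * 30,   # Roughly 13 months; an approximation
    "support_ticket": 3 * 365,
    "financial_record": 7 * 365,
}


def is_expired(data_type: str, created_at: datetime, now: datetime,
               legal_hold: bool = False) -> bool:
    """Return True when a record is past its retention period and may be deleted."""
    if legal_hold:
        return False  # Records under legal hold are never deleted automatically
    cutoff = now - timedelta(days=RETENTION_DAYS[data_type])
    return created_at < cutoff


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
old_event = datetime(2023, 11, 1, tzinfo=timezone.utc)
print(is_expired("analytics_event", old_event, now))                    # Past retention
print(is_expired("analytics_event", old_event, now, legal_hold=True))   # Held
```

Keeping the periods in one place makes it easier to review them against the written policy than scattering intervals across several queries.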

Right to Access (Data Export)#

Principle: Users can request all data you have about them.

In practice:

Build data export tools
Package data in readable format (JSON, CSV)
Respond within the legal timeframe (one month under GDPR, 45 days under CCPA; both allow extensions in some cases)

Example implementation:

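import json

def export_user_data(user_id):
    """Generate complete data export for user"""
    profile = get_profile(user_id)
    data = {
        "profile": profile,
        "purchase_history": get_purchases(user_id),
        "browsing_history": get_page_views(user_id),
        "preferences": get_settings(user_id),
        "consent_records": get_consent_history(user_id)
    }

    # Create downloadable file (default=str serializes dates and similar types)
    filename = f"user_data_{user_id}.json"
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2, default=str)

    # Send download link to user
    # create_download_link is a placeholder helper; profile is assumed to be a dict with an "email" key
    download_link = create_download_link(filename)
    send_email(profile["email"], download_link)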

Right to Deletion (Right to Be Forgotten)#

Principle: Users can ask for their data to be deleted.

In practice:

Find every place the data is stored (databases, backups, logs, caches)
Create workflows for deleting data
Use pseudonymization instead of hard deletion when records must be kept for legal or audit reasons
Notify third parties you shared the data with about the deletion

Example implementation:

def delete_user_data(user_id):
    """Delete all user data across systems"""
    
    # Delete from primary databases
    users_db.delete(user_id)
    analytics_db.delete_where(user_id=user_id)
    
    # Purge from caches
    redis.delete(f"user:{user_id}")
    
    # Anonymize in logs (can't delete logs, but remove PII)
    logs_db.update(
        {"user_id": user_id},
        {"user_id": "deleted_user", "email": "[deleted]"}
    )
    
    # Notify third parties
    notify_deletion_to_partners(user_id)
    
    # Log deletion for audit trail
    audit_log.record(f"Deleted user {user_id} per GDPR request")

Privacy Impact Assessments (PIAs)#

A Privacy Impact Assessment is a way to find and lower privacy risks before a feature goes live.

When to do a PIA:

A new product or feature that handles personal data
Changes to how data is processed
The use of new technology (AI, biometrics, etc.)
Whenever you try something new or risky

Template for PIA:

What data will be collected? (List everything)
Why is it needed? (Justify each data point)
Who will have access? (Internal teams and external parties)
How will it be protected? (Encryption and access controls)
How long will it be kept? (Retention period)
What are the risks? (Breaches, misuse, and bias)
How will you lower the risks? (Specific controls)
What rights do users have? (Access, deletion, or opt-out)
What is the legal basis? (Consent, contract, or legitimate interest)
Are third-party processors involved? (Vendors, data sharing)

Output: Document the decisions, get sign-off from the privacy/legal team, and implement the controls.

GDPR requires a Data Protection Impact Assessment (DPIA), its term for a PIA, for high-risk processing. A PIA is good practice even when the law doesn’t require one.
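One way to keep the template from being skipped is to store each assessment as a structured record and block review until every question has an answer. The sketch below is only an illustration; the class name, field names, and methods are assumptions that mirror the ten template questions.

```python
from dataclasses import dataclass, fields


@dataclass
class PrivacyImpactAssessment:
    """One field per template question; an empty string means unanswered."""
    data_collected: str = ""
    purpose: str = ""
    access: str = ""
    protection: str = ""
    retention: str = ""
    risks: str = ""
    mitigations: str = ""
    user_rights: str = ""
    legal_basis: str = ""
    third_party_processors: str = ""

    def unanswered(self) -> list[str]:
        """Return the names of questions that still need an answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_for_review(self) -> bool:
        """True only when every question has a non-empty answer."""
        return not self.unanswered()


pia = PrivacyImpactAssessment(data_collected="email, country", purpose="account login")
print(pia.unanswered())
```

A check like `ready_for_review()` could gate a launch checklist or a pull request template, though the privacy/legal sign-off itself remains a human decision.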
