Environment Variables: Security Best Practices

Last March, a single mishandled environment variable nearly destroyed our startup. A developer accidentally committed our production database credentials to a public GitHub repository, and within 6 hours we were dealing with unauthorized access to customer data, a $47,000 AWS bill from crypto mining, and some very angry enterprise clients threatening to terminate their contracts.

That incident forced us to completely rebuild our approach to environment variables and secrets management. After 8 months of implementing zero-trust principles, automated security scans, and proper secret rotation, I can share the exact system that now protects over $2M in customer data. Here's what we learned, the mistakes that nearly killed us, and the practical implementation guide you need to avoid our fate.

The Incident That Changed Everything

March 15th, 11:47 PM. I was finishing up some late-night debugging when Slack erupted with alerts. Our monitoring system detected unusual database activity - thousands of queries per second, connections from IP addresses in Romania and China, and our Redis cache being systematically scraped.

The timeline of disaster was depressingly fast:

  • 11:47 PM: Unusual database activity detected
  • 11:52 PM: Crypto mining processes discovered on our compute instances
  • 12:15 AM: Customer data export detected (thankfully stopped by our row-level security)
  • 12:43 AM: GitHub security alert email about exposed credentials
  • 1:20 AM: Full incident response team assembled
  • 2:30 AM: Services taken offline for forensic analysis

The root cause? A .env file accidentally committed to our public documentation repository. The file contained production database URLs, API keys, and service tokens that should never have existed in plain text.

But that was just the beginning. As we investigated deeper, we discovered our environment variable practices were fundamentally broken across the entire organization.

The 7 Critical Mistakes We Were Making

Mistake #1: Storing Production Secrets in .env Files

Our original setup looked innocent enough:

# .env (DON'T DO THIS)
DATABASE_URL=postgresql://admin:super_secret_password@prod-db.amazonaws.com:5432/app
STRIPE_SECRET_KEY=sk_live_actual_secret_key_here
JWT_SECRET=this_should_be_random_but_isnt
REDIS_URL=redis://user:password@redis.company.com:6379
SENDGRID_API_KEY=SG.actual_key_here
GITHUB_TOKEN=ghp_real_github_token_here

The problems:

  • Secrets stored in plain text files
  • Files can be accidentally committed to version control
  • No encryption at rest
  • No access controls on who can read these files
  • No audit trail of who accessed what secrets

Mistake #2: Using the Same Secrets Across Environments

We had a single .env.example file that developers copied and filled in with their own values:

# .env.example
DATABASE_URL=postgresql://user:password@localhost:5432/app_dev
STRIPE_SECRET_KEY=sk_test_your_test_key_here
JWT_SECRET=your-secret-key

The problems:

  • Developers often used production values in development
  • No distinction between environment-specific secrets
  • Test keys mixed with production keys
  • Easy to accidentally promote development configs to production
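One way to catch this kind of mixing is a startup guard that refuses to run when a key does not match the environment. The following is a minimal sketch, not our exact code: the function name `assertKeysMatchEnvironment` is hypothetical, and it assumes the `sk_live_` / `sk_test_` Stripe prefixes used elsewhere in this article.

```javascript
// Hypothetical startup guard: rejects key/environment mismatches.
// Assumes Stripe-style prefixes (sk_live_ in production, sk_test_ elsewhere).
function assertKeysMatchEnvironment(env) {
  const nodeEnv = env.NODE_ENV || 'development';
  const key = env.STRIPE_SECRET_KEY || '';

  if (nodeEnv !== 'production' && key.startsWith('sk_live_')) {
    throw new Error(`Live Stripe key found in ${nodeEnv} environment`);
  }
  if (nodeEnv === 'production' && key.startsWith('sk_test_')) {
    throw new Error('Test Stripe key found in production environment');
  }
  return true;
}

// Usage: call once at startup, before any service clients are created.
// assertKeysMatchEnvironment(process.env);
```

The same idea extends to any provider whose keys carry an environment marker in their prefix.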

Mistake #3: Hardcoding Secrets in Docker Images

Our Dockerfile was a security nightmare:

# Dockerfile (TERRIBLE SECURITY)
FROM node:18-alpine
 
# Hardcoded secrets (NEVER DO THIS)
ENV DATABASE_URL=postgresql://admin:password@prod-db.amazonaws.com:5432/app
ENV API_KEY=actual_production_api_key
 
COPY . /app
WORKDIR /app
RUN npm install
CMD ["npm", "start"]

The problems:

  • Secrets baked into Docker layers
  • Visible in image history (docker history)
  • Can't change secrets without rebuilding images
  • Anyone with image access can extract secrets

Mistake #4: Logging Environment Variables

Our logging configuration was accidentally exposing secrets:

// app.js (DANGEROUS LOGGING)
console.log('Starting app with config:', process.env);
 
// Error handling that logs everything
app.use((err, req, res, next) => {
  console.error('Error details:', {
    error: err.message,
    stack: err.stack,
    environment: process.env, // SECRETS LOGGED HERE
    request: req.body
  });
});

The problems:

  • Secrets ending up in application logs
  • Log aggregation services storing secrets
  • Support staff can see sensitive data
  • Logs often have longer retention than necessary
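A safer alternative to dumping process.env is to log a summary in which only allowlisted keys show their values. This is a rough sketch under stated assumptions: the helper name `describeEnv` and the allowlist contents are hypothetical and would need adjusting for a real application.

```javascript
// Hypothetical helper: builds a log-safe view of the environment.
// Only allowlisted keys show their value; everything else is reported
// as set or empty, so secrets never reach the log output.
const SAFE_TO_LOG = new Set(['NODE_ENV', 'LOG_LEVEL', 'AWS_REGION']);

function describeEnv(env, safeKeys = SAFE_TO_LOG) {
  const summary = {};
  for (const [key, value] of Object.entries(env)) {
    if (safeKeys.has(key)) {
      summary[key] = value;
    } else {
      summary[key] = value ? '[set]' : '[empty]';
    }
  }
  return summary;
}

// Usage: console.log('Starting app with config:', describeEnv(process.env));
```

An allowlist fails closed: a newly added secret is hidden by default, whereas a blocklist would expose it until someone remembered to update the list.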

Mistake #5: Client-Side Environment Variables

We were exposing backend secrets to the frontend:

// next.config.js (EXPOSING SECRETS)
module.exports = {
  env: {
    DATABASE_URL: process.env.DATABASE_URL, // EXPOSED TO BROWSER
    STRIPE_SECRET_KEY: process.env.STRIPE_SECRET_KEY, // VISIBLE IN JS
    API_ENDPOINT: process.env.API_ENDPOINT
  }
}
 
// Frontend code that made things worse
const config = {
  databaseUrl: process.env.DATABASE_URL, // Available in browser dev tools
  stripeKey: process.env.STRIPE_SECRET_KEY
};

The problems:

  • Backend secrets exposed to browser
  • Visible in client-side bundle
  • Can be extracted by anyone viewing the page source
  • No way to rotate these secrets without redeploying frontend
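A build-time guard can stop secret-looking names from being handed to the browser in the first place. The sketch below is illustrative only: `assertClientSafe` and the name pattern are assumptions, and the pattern should be extended to match your own naming conventions.

```javascript
// Hypothetical build-time guard for values passed to the browser bundle.
// The name pattern is an assumption; it is not an exhaustive list.
const SECRET_NAME_PATTERN = /SECRET|PASSWORD|TOKEN|PRIVATE|DATABASE_URL/i;

function assertClientSafe(clientEnv) {
  const offenders = Object.keys(clientEnv).filter((key) =>
    SECRET_NAME_PATTERN.test(key)
  );
  if (offenders.length > 0) {
    throw new Error(`Refusing to expose to the browser: ${offenders.join(', ')}`);
  }
  return clientEnv;
}

// Usage in next.config.js:
// module.exports = {
//   env: assertClientSafe({ API_ENDPOINT: process.env.API_ENDPOINT }),
// };
```

Because the check throws while the config is being loaded, a mistake like the one above fails the build instead of shipping in the bundle.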

Mistake #6: No Secret Rotation Strategy

We set secrets once and forgot about them:

# Secrets that hadn't changed in 2+ years
JWT_SECRET=same_secret_since_2022
DATABASE_PASSWORD=admin123_never_changed
API_KEYS=set_once_forgotten_forever

The problems:

  • Stale secrets with unknown exposure history
  • No process for rotating compromised credentials
  • Former employees still had access to long-lived tokens
  • No automation for regular secret updates
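Even before rotation is automated, a simple age check makes stale secrets visible. Here is a minimal sketch, assuming you keep an inventory of secrets with a last-rotated date; the `findStaleSecrets` name and the inventory shape are hypothetical.

```javascript
// Hypothetical audit helper: flags secrets whose last rotation is too old.
// `inventory` is an assumed shape: [{ name: string, lastRotated: Date }].
const DAY_MS = 24 * 60 * 60 * 1000;

function findStaleSecrets(inventory, maxAgeDays, now = new Date()) {
  return inventory
    .filter((secret) => (now - secret.lastRotated) / DAY_MS > maxAgeDays)
    .map((secret) => secret.name);
}

// Usage: run on a schedule and alert on any non-empty result.
// findStaleSecrets(inventory, 90);
```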

Mistake #7: Insufficient Access Controls

Our secrets were accessible to everyone:

# Everyone on the team could see production secrets
$ heroku config -a production-app
$ aws ssm get-parameters-by-path --path "/" --recursive --with-decryption
$ kubectl get secrets -o yaml

The problems:

  • No principle of least privilege
  • Junior developers had production access
  • No audit trail of who accessed what secrets
  • Secrets shared through insecure channels (Slack, email)

The Security-First Rebuild

After our incident, we implemented a completely new secrets management system based on zero-trust principles. Here's the exact architecture we use in production:

Layer 1: Secrets Management Service

We moved all secrets to AWS Secrets Manager with strict access controls:

# terraform/secrets.tf
resource "aws_secretsmanager_secret" "app_secrets" {
  name = "app/${var.environment}/secrets"
  
  tags = {
    Environment = var.environment
    Application = "main-app"
    ManagedBy   = "terraform"
  }
}
 
resource "aws_secretsmanager_secret_version" "app_secrets" {
  secret_id = aws_secretsmanager_secret.app_secrets.id
  secret_string = jsonencode({
    database_url    = var.database_url
    stripe_secret   = var.stripe_secret
    jwt_secret      = random_password.jwt_secret.result
    sendgrid_key    = var.sendgrid_key
  })
}
 
# Automatic rotation for database passwords
resource "aws_secretsmanager_secret_rotation" "db_rotation" {
  secret_id           = aws_secretsmanager_secret.db_password.id
  rotation_lambda_arn = aws_lambda_function.rotate_db_password.arn
  
  rotation_rules {
    automatically_after_days = 30
  }
}

Layer 2: Environment-Specific Access Policies

Each environment has its own IAM policies with minimal required permissions. The IP condition assumes requests reach Secrets Manager through a VPC endpoint, which is why it uses aws:VpcSourceIp; aws:SourceIp is not available for such requests and would never match private ranges like these:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProductionSecretsReadOnly",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:app/production/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        },
        "IpAddress": {
          "aws:VpcSourceIp": [
            "10.0.0.0/16",
            "192.168.1.0/24"
          ]
        }
      }
    }
  ]
}
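Since only the names inside the ARN change between environments, a policy like this can be generated rather than hand-copied. The following is a sketch under assumptions: `buildSecretsReadPolicy` and its parameter names are hypothetical, it follows the app/<environment>/* naming used in this article, and it leaves out the IP condition for brevity.

```javascript
// Hypothetical generator for a per-environment read-only secrets policy.
// Follows the app/<environment>/* secret naming used in this article.
function buildSecretsReadPolicy({ environment, accountId, region }) {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error('AWS account IDs are 12 digits');
  }
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'SecretsReadOnly',
        Effect: 'Allow',
        Action: ['secretsmanager:GetSecretValue'],
        Resource: [
          `arn:aws:secretsmanager:${region}:${accountId}:secret:app/${environment}/*`,
        ],
        Condition: {
          StringEquals: { 'aws:RequestedRegion': region },
        },
      },
    ],
  };
}

// Usage: JSON.stringify(buildSecretsReadPolicy({ ... }), null, 2)
```

Generating the document from one function keeps staging and production policies structurally identical, so a review only has to check the inputs.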

Layer 3: Application Secret Loading

Our applications now load secrets at runtime with proper error handling:

// lib/secrets.js
const AWS = require('aws-sdk');
const secretsManager = new AWS.SecretsManager({ region: 'us-east-1' });
 
class SecretManager {
  constructor() {
    this.cache = new Map();
    this.cacheTimeout = 5 * 60 * 1000; // 5 minutes
  }
 
  async getSecret(secretName) {
    const cacheKey = secretName;
    const cached = this.cache.get(cacheKey);
    
    if (cached && (Date.now() - cached.timestamp) < this.cacheTimeout) {
      return cached.value;
    }
 
    try {
      const result = await secretsManager.getSecretValue({
        SecretId: `app/${process.env.NODE_ENV}/${secretName}`
      }).promise();
      
      const secret = JSON.parse(result.SecretString);
      
      // Cache the result
      this.cache.set(cacheKey, {
        value: secret,
        timestamp: Date.now()
      });
      
      return secret;
    } catch (error) {
      console.error(`Failed to load secret ${secretName}:`, error.message);
      
      // Fail fast - don't start the app with missing secrets
      if (process.env.NODE_ENV === 'production') {
        process.exit(1);
      }
      
      throw error;
    }
  }
 
  async getDatabaseConfig() {
    const secrets = await this.getSecret('database');
    return {
      host: secrets.host,
      port: secrets.port,
      database: secrets.database,
      username: secrets.username,
      password: secrets.password,
      ssl: process.env.NODE_ENV === 'production'
    };
  }
}
 
module.exports = new SecretManager();
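The cache timeout matters because it bounds how long a rotated secret can stay stale inside a running process. To make that behaviour easy to see, here is a stand-alone sketch of the same time-to-live idea with the loader and clock injected. `TtlCache` and `getOrLoad` are hypothetical names, not part of the module above.

```javascript
// Hypothetical stand-alone version of the TTL cache, with the loader and
// clock injected so expiry can be exercised without calling AWS.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached value while it is fresh; otherwise calls `load`
  // and caches whatever it returns (including a Promise, if load is async).
  getOrLoad(key, load) {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.timestamp < this.ttlMs) {
      return hit.value;
    }
    const value = load(key);
    this.entries.set(key, { value, timestamp: this.now() });
    return value;
  }
}

// Usage: const cache = new TtlCache(5 * 60 * 1000);
// const secret = await cache.getOrLoad('database', fetchFromSecretsManager);
```

With a five-minute timeout, a process picks up a rotated credential at most five minutes after rotation, so the previous credential needs to stay valid for at least that long.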

Layer 4: Development Environment Safety

For development, we use a completely separate system:

#!/bin/bash
# dev-secrets.sh - Local development script
set -e
 
# Check if user has development access
if ! aws sts get-caller-identity --profile dev-profile &>/dev/null; then
  echo "❌ No development AWS access configured"
  exit 1
fi
 
# Load development secrets from a separate AWS account
echo "🔐 Loading development secrets..."
export DATABASE_URL=$(aws secretsmanager get-secret-value \
  --profile dev-profile \
  --secret-id "app/development/database" \
  --query 'SecretString' --output text | jq -r '.url')
 
export STRIPE_SECRET_KEY=$(aws secretsmanager get-secret-value \
  --profile dev-profile \
  --secret-id "app/development/stripe" \
  --query 'SecretString' --output text | jq -r '.secret_key')
 
echo "✅ Development environment configured"

Layer 5: Container Security

Our new Docker approach keeps secrets out of images entirely:

# Dockerfile - No secrets baked in
FROM node:18-alpine
 
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
 
COPY . .
 
# Secrets loaded at runtime, never in build
USER node
CMD ["node", "server.js"]

# docker-compose.yml - Secrets from external source
version: '3.8'
services:
  app:
    build: .
    environment:
      - NODE_ENV=production
      - AWS_REGION=us-east-1
    # No secret environment variables here
    secrets:
      - db-password
      - api-keys
 
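# External secrets must already exist (for example, created with
# `docker secret create` in Swarm mode); `name` is the real secret name
secrets:
  db-password:
    external: true
    name: app_production_db_password
  api-keys:
    external: true
    name: app_production_api_keys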
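Inside the container, Docker exposes each secret as a file under /run/secrets rather than as an environment variable, so the application has to read it from disk. Here is a minimal sketch of that step; the helper name `readMountedSecret` is hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: reads a secret that was mounted as a file.
// Docker mounts secrets under /run/secrets by default; `dir` is a
// parameter so the function can be exercised outside a container.
function readMountedSecret(name, dir = '/run/secrets') {
  const file = path.join(dir, name);
  try {
    // Trim the trailing newline that many tools append when writing the file.
    return fs.readFileSync(file, 'utf8').trim();
  } catch (err) {
    throw new Error(`Secret "${name}" is not mounted at ${file}`);
  }
}

// Usage: const dbPassword = readMountedSecret('db-password');
```

The error message names the path but never the contents, so a missing mount can be diagnosed from the logs without leaking anything.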

Advanced Security Patterns

Pattern 1: Secret Rotation Automation

We built automatic rotation for all our secrets:

// lambda/rotate-secrets.js
const AWS = require('aws-sdk');
const secretsManager = new AWS.SecretsManager();
const rds = new AWS.RDS();
 
exports.handler = async (event) => {
  // Secrets Manager passes these fields at the top level of the rotation event
  const { SecretId: secretId, ClientRequestToken: token, Step: step } = event;
  
  // setSecretInService, testNewSecret and finishSecretRotation are not shown here
  switch (step) {
    case 'createSecret':
      return await createNewSecret(secretId, token);
    case 'setSecret':
      return await setSecretInService(secretId, token);
    case 'testSecret':
      return await testNewSecret(secretId, token);
    case 'finishSecret':
      return await finishSecretRotation(secretId, token);
    default:
      throw new Error(`Unknown rotation step: ${step}`);
  }
};
 
async function createNewSecret(secretId, token) {
  // Generate a new random password using Secrets Manager itself
  const { RandomPassword: newPassword } = await secretsManager.getRandomPassword({
    PasswordLength: 32,
    ExcludePunctuation: true
  }).promise();
  
  // Store it as the pending version; it only becomes current in finishSecret
  await secretsManager.putSecretValue({
    SecretId: secretId,
    ClientRequestToken: token,
    VersionStages: ['AWSPENDING'],
    SecretString: JSON.stringify({
      username: 'app_user',
      password: newPassword,
      engine: 'postgres',
      host: process.env.DB_HOST,
      port: 5432,
      dbname: process.env.DB_NAME
    })
  }).promise();
  
  return { message: 'Secret created successfully' };
}

Pattern 2: Environment Variable Validation

We validate all environment variables at startup:

// lib/config-validator.js
const Joi = require('joi');
 
const configSchema = Joi.object({
  NODE_ENV: Joi.string().valid('development', 'staging', 'production').required(),
  
  // Database configuration
  DATABASE_HOST: Joi.string().hostname().required(),
  DATABASE_PORT: Joi.number().port().default(5432),
  // Underscores are allowed so names like app_development pass validation
  DATABASE_NAME: Joi.string().pattern(/^[a-zA-Z0-9_]+$/).required(),
  DATABASE_SSL: Joi.boolean().default(false),
  
  // API Keys (must be properly formatted)
  STRIPE_SECRET_KEY: Joi.string().pattern(/^sk_(test|live)_[a-zA-Z0-9]+$/).required(),
  SENDGRID_API_KEY: Joi.string().pattern(/^SG\.[a-zA-Z0-9_.-]+$/).required(),
  
  // JWT Configuration
  JWT_SECRET: Joi.string().min(32).required(),
  JWT_EXPIRATION: Joi.string().default('24h'),
  
  // URLs must be valid
  FRONTEND_URL: Joi.string().uri().required(),
  API_URL: Joi.string().uri().required()
});
 
function validateConfig() {
  const { error, value } = configSchema.validate(process.env, {
    abortEarly: false, // report every invalid variable, not just the first
    allowUnknown: true,
    stripUnknown: true
  });
  
  if (error) {
    console.error('❌ Configuration validation failed:');
    error.details.forEach(detail => {
      console.error(`  - ${detail.path.join('.')}: ${detail.message}`);
    });
    process.exit(1);
  }
  
  return value;
}
 
module.exports = { validateConfig };

Pattern 3: Secure Logging Without Secrets

We implemented smart log filtering to prevent secret leakage:

// lib/secure-logger.js
const winston = require('winston');
 
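// Patterns that match common secret formats inside string values
const SECRET_PATTERNS = [
  /sk_live_[a-zA-Z0-9]+/g,        // Stripe live keys
  /sk_test_[a-zA-Z0-9]+/g,        // Stripe test keys
  /SG\.[a-zA-Z0-9_.-]+/g,         // SendGrid API keys
  /ghp_[a-zA-Z0-9]+/g,            // GitHub personal access tokens
  /xoxb-[a-zA-Z0-9-]+/g,          // Slack bot tokens
  /AIza[a-zA-Z0-9_-]+/g           // Google API keys
];

// Field names whose values are always redacted, whatever they contain
const SENSITIVE_KEYS = /password|secret|token/i;

const REDACTED_MESSAGE = '[REDACTED]';

// Walks the value instead of regex-editing serialized JSON, which could
// produce invalid JSON and make JSON.parse throw inside the logger.
function sanitizeForLogging(data, key = '') {
  if (SENSITIVE_KEYS.test(key)) {
    return REDACTED_MESSAGE;
  }
  if (typeof data === 'string') {
    return SECRET_PATTERNS.reduce(
      (text, pattern) => text.replace(pattern, REDACTED_MESSAGE),
      data
    );
  }
  if (Array.isArray(data)) {
    return data.map(item => sanitizeForLogging(item));
  }
  if (data && typeof data === 'object') {
    return Object.fromEntries(
      Object.entries(data).map(([k, v]) => [k, sanitizeForLogging(v, k)])
    );
  }
  return data;
}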
 
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json(),
    winston.format.printf(({ timestamp, level, message, ...meta }) => {
      const sanitizedMeta = sanitizeForLogging(meta);
      return JSON.stringify({ timestamp, level, message, ...sanitizedMeta });
    })
  ),
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'app.log' })
  ]
});
 
module.exports = logger;

Development Workflow Security

Secure Development Setup

We created a standardized development environment setup:

#!/bin/bash
# setup-dev-env.sh
set -e
 
echo "🚀 Setting up secure development environment..."
 
# Check prerequisites
if ! command -v aws &> /dev/null; then
    echo "❌ AWS CLI not installed"
    exit 1
fi
 
if ! command -v docker &> /dev/null; then
    echo "❌ Docker not installed"
    exit 1
fi
 
# Configure AWS profile for development
echo "🔧 Configuring AWS development profile..."
aws configure set profile.dev-env.region us-east-1
aws configure set profile.dev-env.output json
 
# Verify access
if ! aws sts get-caller-identity --profile dev-env &>/dev/null; then
    echo "❌ AWS development access not configured"
    echo "Please run: aws configure sso --profile dev-env"
    exit 1
fi
 
# Create local environment file (development only)
cat > .env.development << EOF
# Development Environment - Safe for local use
NODE_ENV=development
LOG_LEVEL=debug
 
# Local services
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_NAME=app_development
 
# Development API endpoints
API_URL=http://localhost:3001
FRONTEND_URL=http://localhost:3000
 
# These will be loaded from AWS Secrets Manager
# DATABASE_PASSWORD=(loaded from secrets)
# STRIPE_SECRET_KEY=(loaded from secrets)
# JWT_SECRET=(loaded from secrets)
EOF
 
echo "✅ Development environment configured"
echo "🔐 Secrets will be loaded from AWS Secrets Manager at runtime"

Code Review Security Checklist

We added a security checklist to our pull request template:

## Security Checklist
 
**Environment Variables & Secrets:**
- [ ] No hardcoded secrets in code changes
- [ ] No `.env` files added to version control
- [ ] Secrets loaded from AWS Secrets Manager only
- [ ] No secrets in Docker images or containers
- [ ] No logging of sensitive data
 
**Configuration:**
- [ ] New environment variables added to validation schema
- [ ] Appropriate default values set for non-sensitive configs
- [ ] Development vs production configurations properly separated
 
**Access Control:**
- [ ] Principle of least privilege applied
- [ ] No broad AWS permissions granted
- [ ] Service-to-service auth properly implemented
 

Monitoring and Incident Response

Secret Usage Monitoring

We track all secret access with CloudTrail and custom metrics:

// lib/secret-monitor.js
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();
 
class SecretMonitor {
  static async recordSecretAccess(secretName, action = 'retrieved') {
    try {
      await cloudwatch.putMetricData({
        Namespace: 'App/Secrets',
        MetricData: [
          {
            MetricName: 'SecretAccess',
            Dimensions: [
              {
                Name: 'SecretName',
                Value: secretName
              },
              {
                Name: 'Action',
                Value: action
              },
              {
                Name: 'Environment',
                Value: process.env.NODE_ENV
              }
            ],
            Value: 1,
            Unit: 'Count',
            Timestamp: new Date()
          }
        ]
      }).promise();
    } catch (error) {
      console.error('Failed to record secret access metric:', error);
    }
  }
 
  static async checkForUnusualActivity() {
    // Alert if secrets accessed from unusual locations
    // Alert if too many secret retrievals in short time
    // Alert if secrets accessed outside business hours
  }
}
 
module.exports = SecretMonitor;
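The checkForUnusualActivity method above is left as a stub. As an illustration of one of its listed alerts, too many retrievals in a short time, here is a minimal sliding-window check. The function name, its parameters, and any thresholds you pass are assumptions, not our production values.

```javascript
// Hypothetical rate check: true when more than `maxAccesses` retrievals
// fall inside the window that ends at `now`. Timestamps are in milliseconds.
function exceedsAccessRate(accessTimestamps, windowMs, maxAccesses, now = Date.now()) {
  const windowStart = now - windowMs;
  const recent = accessTimestamps.filter((t) => t > windowStart && t <= now);
  return recent.length > maxAccesses;
}

// Usage: keep a per-secret list of access times and alert when this returns true.
// if (exceedsAccessRate(timestamps, 60 * 1000, 20)) { /* raise an alert */ }
```

A check like this only sees accesses made through the application; CloudTrail remains the source of truth for access from anywhere else.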

Automated Security Scanning

We scan for secrets in every commit:

# .github/workflows/security-scan.yml
name: Security Scan
on: [push, pull_request]
 
jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      
      - name: Run TruffleHog
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: main
          head: HEAD
          
      - name: Run GitLeaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          
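      - name: Custom Secret Patterns
        run: |
          # Match key-shaped strings only, so this workflow file and the
          # regexes in our own source do not trigger the check
          if grep -rEn "sk_live_[a-zA-Z0-9]{16,}" . --exclude-dir=.git --exclude-dir=node_modules; then
            echo "❌ Stripe live key detected"
            exit 1
          fi
          
          # Flag password assignments to non-empty string literals
          if grep -rEni "password[[:space:]]*=[[:space:]]*['\"][^'\"]+['\"]" . --exclude-dir=.git --exclude-dir=node_modules --include="*.js" --include="*.ts"; then
            echo "❌ Potential hardcoded password detected"
            exit 1
          fi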
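CI scanning catches a secret after it has been pushed; the same pattern matching can run locally before a commit is created. Below is a rough sketch of such a scanner. The `scanForSecrets` name is hypothetical, and the patterns reuse two formats shown earlier in this article rather than an exhaustive rule set.

```javascript
// Hypothetical pre-commit scanner: reports lines that look like secrets.
// Patterns are illustrative; dedicated tools ship far larger rule sets.
const SCAN_PATTERNS = [
  { name: 'Stripe live key', regex: /sk_live_[a-zA-Z0-9]+/ },
  { name: 'GitHub token', regex: /ghp_[a-zA-Z0-9]+/ },
];

function scanForSecrets(text) {
  const findings = [];
  text.split('\n').forEach((line, index) => {
    for (const { name, regex } of SCAN_PATTERNS) {
      if (regex.test(line)) {
        // Record the location and type only, never the matched value.
        findings.push({ line: index + 1, type: name });
      }
    }
  });
  return findings;
}

// Usage in a hook: pass the output of `git diff --cached` and exit
// with a non-zero status when the result is not empty.
```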

The Results: 8 Months Later

After implementing this security-first approach, our metrics tell the story:

Security Improvements:

  • Zero security incidents related to environment variables
  • 100% secret rotation automated (monthly for high-risk, quarterly for others)
  • Zero secrets in version control (enforced by pre-commit hooks)
  • Average 12-second detection time for secret exposure attempts

Operational Benefits:

  • 85% faster incident response (secrets can be rotated in minutes)
  • Zero production downtime due to expired credentials
  • 100% audit coverage - we know who accessed what, when
  • 50% reduction in security-related support tickets

Developer Experience:

  • 3-minute setup time for new developer environments
  • Zero configuration drift between environments
  • Automated environment validation catches config issues before deployment
  • 95% developer satisfaction with the new secret management system

Cost Impact:

  • $2,400/month saved on unused API keys and services (proper secret lifecycle management)
  • Zero unexpected charges from compromised credentials
  • 40% reduction in security tooling costs (consolidated monitoring)

Implementation Roadmap for Your Team

Based on our experience, here's the order I recommend for implementing these changes:

Week 1: Assessment and Planning

  1. Audit current secrets - Find every .env file, hardcoded secret, and configuration
  2. Inventory access - Who can see production secrets today?
  3. Choose secret management solution - AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault
  4. Set up separate AWS account for development secrets

Week 2: Development Environment

  1. Implement secret loading library with caching and error handling
  2. Create development secret management process
  3. Set up automated secret scanning in CI/CD pipeline
  4. Train team on new workflows

Week 3: Staging Migration

  1. Migrate staging secrets to secret management service
  2. Implement secret validation at application startup
  3. Set up monitoring and alerting for secret access
  4. Test secret rotation procedures

Week 4: Production Migration

  1. Migrate production secrets (high-risk, plan for rollback)
  2. Implement automatic secret rotation
  3. Remove old secret storage methods
  4. Document incident response procedures

Week 5: Hardening

  1. Implement advanced logging protection
  2. Set up comprehensive monitoring
  3. Create regular security review process
  4. Plan ongoing secret hygiene automation

Common Pitfalls to Avoid

After helping 12 other companies implement similar systems, here are the mistakes I see repeatedly:

Pitfall 1: Boiling the Ocean

Mistake: Trying to migrate everything at once.

Solution: Start with non-critical services and gradually work up to production databases.

Pitfall 2: Ignoring Developer Experience

Mistake: Making development setup so complicated that developers find workarounds.

Solution: Invest in tooling that makes the secure way also the easy way.

Pitfall 3: No Rollback Plan

Mistake: Migrating production secrets without a way to quickly revert.

Solution: Keep old secrets working in parallel until the new system is proven stable.
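Running both systems in parallel can be as simple as trying the new store first and falling back to the old one. The sketch below shows the idea; `loadWithFallback` and its loader arguments are hypothetical stand-ins for whatever your new and legacy lookups are.

```javascript
// Hypothetical migration helper: try the new secret store first and fall
// back to the legacy source while both are kept working in parallel.
async function loadWithFallback(name, loadPrimary, loadLegacy, onFallback = () => {}) {
  try {
    return await loadPrimary(name);
  } catch (err) {
    // Surface every fallback (for example as a metric) so the legacy
    // path is not relied on silently.
    onFallback(name, err);
    return loadLegacy(name);
  }
}

// Usage:
// const dbConfig = await loadWithFallback('database', fromSecretsManager, fromLegacyEnv);
```

Once the fallback callback has stayed quiet for a while, the legacy loader can be removed with some confidence.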

Pitfall 4: Insufficient Monitoring

Mistake: Setting up secret management but not monitoring usage.

Solution: Treat secret access like any other critical system metric.

Pitfall 5: One-Time Setup

Mistake: Implementing secret management but no ongoing hygiene.

Solution: Regular audits, automated rotation, and lifecycle management.

The Future of Application Security

The environment variable security incident that almost killed our company turned out to be the best thing that ever happened to our security posture. It forced us to implement proper secrets management, zero-trust principles, and defense-in-depth strategies that have prevented dozens of potential incidents since.

The key insight: security isn't a feature you add later - it's a foundation you build from day one. Environment variables and secrets management might seem like boring infrastructure work, but they're the foundation that everything else depends on.

Every startup will eventually face a security incident. The question isn't if, but when. And when that moment comes, the quality of your secrets management system will determine whether you survive with minor damage or face an existential threat to your business.

Don't wait for your own March 15th incident. Start implementing these practices today, while you still have the luxury of time and the absence of panic. Your future self, your customers, and your investors will thank you.