Environment Variables: Security Best Practices
Last March, a single mishandled environment variable nearly destroyed our startup. A developer accidentally committed our production database credentials to a public GitHub repository. Within 6 hours, attackers had gained unauthorized access to customer data, we were facing a $47,000 AWS bill from crypto mining, and some very angry enterprise clients were threatening to terminate their contracts.
That incident forced us to completely rebuild our approach to environment variables and secrets management. After 8 months of implementing zero-trust principles, automated security scans, and proper secret rotation, I can share the exact system that now protects over $2M in customer data. Here's what we learned, the mistakes that nearly killed us, and the practical implementation guide you need to avoid our fate.
The Incident That Changed Everything
March 15th, 11:47 PM. I was finishing up some late-night debugging when Slack erupted with alerts. Our monitoring system detected unusual database activity - thousands of queries per second, connections from IP addresses in Romania and China, and our Redis cache being systematically scraped.
The timeline of disaster was depressingly fast:
- 11:47 PM: Unusual database activity detected
- 11:52 PM: Crypto mining processes discovered on our compute instances
- 12:15 AM: Customer data export detected (thankfully stopped by our row-level security)
- 12:43 AM: GitHub security alert email about exposed credentials
- 1:20 AM: Full incident response team assembled
- 2:30 AM: Services taken offline for forensic analysis
The root cause? A .env file accidentally committed to our public documentation repository. The file contained production database URLs, API keys, and service tokens that should never have existed in plain text.
But that was just the beginning. As we investigated deeper, we discovered our environment variable practices were fundamentally broken across the entire organization.
The 7 Critical Mistakes We Were Making
Mistake #1: Storing Production Secrets in .env Files
Our original setup looked innocent enough:
# .env (DON'T DO THIS)
DATABASE_URL=postgresql://admin:super_secret_password@prod-db.amazonaws.com:5432/app
STRIPE_SECRET_KEY=sk_live_actual_secret_key_here
JWT_SECRET=this_should_be_random_but_isnt
REDIS_URL=redis://user:password@redis.company.com:6379
SENDGRID_API_KEY=SG.actual_key_here
GITHUB_TOKEN=ghp_real_github_token_here
The problems:
- Secrets stored in plain text files
- Files can be accidentally committed to version control
- No encryption at rest
- No access controls on who can read these files
- No audit trail of who accessed what secrets
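A cheap first line of defense against the accidental-commit problem is a pre-commit check that rejects staged .env files. Below is a minimal sketch. It assumes the staged paths come from git diff --cached --name-only. The function name findForbiddenEnvFiles and the allow-listed template names are hypothetical.

```javascript
const path = require('node:path');

// Template files that are meant to be committed (assumed naming convention)
const ALLOWED_TEMPLATES = new Set(['.env.example', '.env.template']);

// Returns the staged paths that look like real .env files (.env, .env.production, ...)
function findForbiddenEnvFiles(stagedPaths) {
  return stagedPaths.filter((p) => {
    const base = path.basename(p);
    const isEnvFile = base === '.env' || base.startsWith('.env.');
    return isEnvFile && !ALLOWED_TEMPLATES.has(base);
  });
}
```

In a Git pre-commit hook, you would pass it the lines printed by git diff --cached --name-only and exit non-zero if the returned list is not empty.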
Mistake #2: Using the Same Secrets Across Environments
We had a single .env.example file that developers copied and filled in with their own values:
# .env.example
DATABASE_URL=postgresql://user:password@localhost:5432/app_dev
STRIPE_SECRET_KEY=sk_test_your_test_key_here
JWT_SECRET=your-secret-key
The problems:
- Developers often used production values in development
- No distinction between environment-specific secrets
- Test keys mixed with production keys
- Easy to accidentally promote development configs to production
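A startup check can catch the most dangerous version of this mix-up: a live Stripe key loaded outside production. The sketch relies only on Stripe's sk_live_ and sk_test_ secret-key prefixes. The function name and the choice to throw instead of warning are assumptions.

```javascript
// Throws if the Stripe key's mode does not match the environment it is loaded into
function assertStripeKeyMatchesEnv(env) {
  const nodeEnv = env.NODE_ENV || 'development';
  const key = env.STRIPE_SECRET_KEY || '';
  const isLive = key.startsWith('sk_live_');
  const isTest = key.startsWith('sk_test_');

  if (nodeEnv === 'production' && !isLive) {
    throw new Error('Production must use a live Stripe key (sk_live_...)');
  }
  if (nodeEnv !== 'production' && isLive) {
    throw new Error(`Live Stripe key loaded in ${nodeEnv}; use a test key (sk_test_...)`);
  }
  if (!isLive && !isTest) {
    throw new Error('STRIPE_SECRET_KEY is missing or malformed');
  }
}

// Call once at startup, before any Stripe client is created:
// assertStripeKeyMatchesEnv(process.env);
```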
Mistake #3: Hardcoding Secrets in Docker Images
Our Dockerfile was a security nightmare:
# Dockerfile (TERRIBLE SECURITY)
FROM node:18-alpine
# Hardcoded secrets (NEVER DO THIS)
ENV DATABASE_URL=postgresql://admin:password@prod-db.amazonaws.com:5432/app
ENV API_KEY=actual_production_api_key
COPY . /app
WORKDIR /app
RUN npm install
CMD ["npm", "start"]
The problems:
- Secrets baked into Docker layers
- Visible in image history (docker history)
- Can't change secrets without rebuilding images
- Anyone with image access can extract secrets
Mistake #4: Logging Environment Variables
Our logging configuration was accidentally exposing secrets:
// app.js (DANGEROUS LOGGING)
console.log('Starting app with config:', process.env);
// Error handling that logs everything
app.use((err, req, res, next) => {
  console.error('Error details:', {
    error: err.message,
    stack: err.stack,
    environment: process.env, // SECRETS LOGGED HERE
    request: req.body
  });
});
The problems:
- Secrets ending up in application logs
- Log aggregation services storing secrets
- Support staff can see sensitive data
- Logs often have longer retention than necessary
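A startup log usually only needs to say which variables are present. Here is a minimal sketch. The describeEnv helper and its allow-list of printable names are hypothetical.

```javascript
// Names whose values are safe to print (assumed allow-list; keep it short)
const SAFE_TO_PRINT = new Set(['NODE_ENV', 'LOG_LEVEL', 'AWS_REGION', 'PORT']);

// Returns a loggable summary: real values only for allow-listed names,
// and just "<set>" / "<unset>" for everything else
function describeEnv(env, names) {
  const summary = {};
  for (const name of names) {
    if (!(name in env)) {
      summary[name] = '<unset>';
    } else if (SAFE_TO_PRINT.has(name)) {
      summary[name] = env[name];
    } else {
      summary[name] = '<set>';
    }
  }
  return summary;
}

// console.log('Starting app with config:',
//   describeEnv(process.env, ['NODE_ENV', 'DATABASE_URL', 'STRIPE_SECRET_KEY']));
```

An allow-list fails safe. A newly added secret stays hidden until someone deliberately marks it as printable.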
Mistake #5: Client-Side Environment Variables
We were exposing backend secrets to the frontend:
// next.config.js (EXPOSING SECRETS)
module.exports = {
  env: {
    DATABASE_URL: process.env.DATABASE_URL, // EXPOSED TO BROWSER
    STRIPE_SECRET_KEY: process.env.STRIPE_SECRET_KEY, // VISIBLE IN JS
    API_ENDPOINT: process.env.API_ENDPOINT
  }
}
// Frontend code that made things worse
const config = {
  databaseUrl: process.env.DATABASE_URL, // Available in browser dev tools
  stripeKey: process.env.STRIPE_SECRET_KEY
};
The problems:
- Backend secrets exposed to browser
- Visible in client-side bundle
- Can be extracted by anyone viewing the page source
- No way to rotate these secrets without redeploying frontend
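Next.js inlines variables listed under env in next.config.js, and any NEXT_PUBLIC_ variables, into the client bundle. So the safest pattern is to build that object from an explicit allow-list and refuse anything that looks secret. Here is a sketch. The PUBLIC_VARS list, the name and value heuristics, and buildPublicEnv are all assumptions, and the heuristics are not exhaustive.

```javascript
// Variables the browser is allowed to see (assumed list for this app)
const PUBLIC_VARS = ['API_ENDPOINT', 'STRIPE_PUBLISHABLE_KEY'];

// Heuristics for things that must never reach a client bundle
const SECRET_NAME = /SECRET|PASSWORD|PRIVATE|DATABASE_URL|TOKEN/i;
const SECRET_VALUE = /^(sk|rk)_(live|test)_/;

// Builds the object for next.config.js `env`, refusing anything that looks secret
function buildPublicEnv(env, allowed = PUBLIC_VARS) {
  const publicEnv = {};
  for (const name of allowed) {
    const value = env[name];
    if (SECRET_NAME.test(name) || (value && SECRET_VALUE.test(value))) {
      throw new Error(`${name} looks like a secret and must stay server-side`);
    }
    if (value !== undefined) publicEnv[name] = value;
  }
  return publicEnv;
}

// next.config.js (sketch): module.exports = { env: buildPublicEnv(process.env) };
```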
Mistake #6: No Secret Rotation Strategy
We set secrets once and forgot about them:
# Secrets that hadn't changed in 2+ years
JWT_SECRET=same_secret_since_2022
DATABASE_PASSWORD=admin123_never_changed
API_KEYS=set_once_forgotten_forever
The problems:
- Stale secrets with unknown exposure history
- No process for rotating compromised credentials
- Former employees still had access to long-lived tokens
- No automation for regular secret updates
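Even before rotation is automated, a scheduled job can at least flag stale credentials. Here is a sketch. It assumes a hypothetical inventory that records when each secret was last rotated, plus a per-secret maximum age in days.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// inventory: [{ name, lastRotated: Date, maxAgeDays }] (hypothetical shape)
// Returns secrets older than their allowed age, oldest first
function findStaleSecrets(inventory, now = new Date()) {
  return inventory
    .map(({ name, lastRotated, maxAgeDays }) => ({
      name,
      ageDays: Math.floor((now - lastRotated) / DAY_MS),
      maxAgeDays
    }))
    .filter(({ ageDays, maxAgeDays }) => ageDays > maxAgeDays)
    .sort((a, b) => b.ageDays - a.ageDays);
}
```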
Mistake #7: Insufficient Access Controls
Our secrets were accessible to everyone:
# Everyone on the team could see production secrets
$ heroku config -a production-app
$ aws ssm get-parameters-by-path --path "/" --recursive --with-decryption
$ kubectl get secrets -o yaml
The problems:
- No principle of least privilege
- Junior developers had production access
- No audit trail of who accessed what secrets
- Secrets shared through insecure channels (Slack, email)
The Security-First Rebuild
After our incident, we implemented a completely new secrets management system based on zero-trust principles. Here's the exact architecture we use in production:
Layer 1: Secrets Management Service
We moved all secrets to AWS Secrets Manager with strict access controls:
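# terraform/secrets.tf
# Generated JWT signing key (requires the hashicorp/random provider)
resource "random_password" "jwt_secret" {
  length  = 64
  special = false
}
resource "aws_secretsmanager_secret" "app_secrets" {
  name = "app/${var.environment}/secrets"
  tags = {
    Environment = var.environment
    Application = "main-app"
    ManagedBy   = "terraform"
  }
}
resource "aws_secretsmanager_secret_version" "app_secrets" {
  secret_id = aws_secretsmanager_secret.app_secrets.id
  secret_string = jsonencode({
    database_url  = var.database_url
    stripe_secret = var.stripe_secret
    jwt_secret    = random_password.jwt_secret.result
    sendgrid_key  = var.sendgrid_key
  })
}
# Database credentials in their own secret so they can rotate independently
# (name assumed to match the app's "database" lookup)
resource "aws_secretsmanager_secret" "db_password" {
  name = "app/${var.environment}/database"
}
# Automatic rotation for database passwords
# (the rotation Lambda, and a permission letting secretsmanager.amazonaws.com invoke it, are defined elsewhere)
resource "aws_secretsmanager_secret_rotation" "db_rotation" {
  secret_id           = aws_secretsmanager_secret.db_password.id
  rotation_lambda_arn = aws_lambda_function.rotate_db_password.arn
  rotation_rules {
    automatically_after_days = 30
  }
}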
Layer 2: Environment-Specific Access Policies
Each environment has its own IAM policies with minimal required permissions:
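{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProductionSecretsReadOnly",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:app/production/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        },
        "IpAddress": {
          "aws:VpcSourceIp": [
            "10.0.0.0/16",
            "192.168.1.0/24"
          ]
        }
      }
    }
  ]
}
Because these are private ranges, the condition uses aws:VpcSourceIp. That key only matches requests that reach Secrets Manager through a VPC interface endpoint. aws:SourceIp only ever sees public addresses (for example, your NAT gateway's IP), so a private CIDR under aws:SourceIp never matches.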
Layer 3: Application Secret Loading
Our applications now load secrets at runtime with proper error handling:
// lib/secrets.js
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const secretsManager = new AWS.SecretsManager({ region: process.env.AWS_REGION || 'us-east-1' });
class SecretManager {
  constructor() {
    this.cache = new Map();
    this.cacheTimeout = 5 * 60 * 1000; // 5 minutes
  }
  async getSecret(secretName) {
    const cached = this.cache.get(secretName);
    if (cached && (Date.now() - cached.timestamp) < this.cacheTimeout) {
      return cached.value;
    }
    const environment = process.env.NODE_ENV || 'development';
    try {
      const result = await secretsManager.getSecretValue({
        SecretId: `app/${environment}/${secretName}`
      }).promise();
      const secret = JSON.parse(result.SecretString);
      // Cache the parsed secret for cacheTimeout milliseconds
      this.cache.set(secretName, {
        value: secret,
        timestamp: Date.now()
      });
      return secret;
    } catch (error) {
      // Log the secret's name, never its value
      console.error(`Failed to load secret ${secretName}:`, error.message);
      // Fail fast in production. Note: this also exits if a refresh fails
      // after the cache expires, not only at startup.
      if (environment === 'production') {
        process.exit(1);
      }
      throw error;
    }
  }
  async getDatabaseConfig() {
    const secrets = await this.getSecret('database');
    return {
      host: secrets.host,
      port: secrets.port,
      database: secrets.dbname, // RDS-style secrets store the database name as "dbname"
      username: secrets.username,
      password: secrets.password,
      ssl: process.env.NODE_ENV === 'production'
    };
  }
}
module.exports = new SecretManager();
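The cache in getSecret has one gap: requests that arrive together during a cache miss each call Secrets Manager. The sketch below keeps the same TTL idea and also shares one in-flight request per key. createCachedLoader is a hypothetical helper. The fetcher is injected (in practice it would wrap getSecretValue) so the sketch runs without AWS.

```javascript
// Wraps an async fetcher with a TTL cache and in-flight request sharing
function createCachedLoader(fetcher, { ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
  const cache = new Map();    // key -> { value, expiresAt }
  const inFlight = new Map(); // key -> Promise of the value

  return async function load(key) {
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    // Reuse a request that is already running for this key
    if (inFlight.has(key)) return inFlight.get(key);

    const promise = Promise.resolve()
      .then(() => fetcher(key))
      .then((value) => {
        cache.set(key, { value, expiresAt: now() + ttlMs });
        return value;
      })
      .finally(() => inFlight.delete(key)); // failures are not cached
    inFlight.set(key, promise);
    return promise;
  };
}
```

Failed fetches are not cached, so the next call retries. Falling back to a stale value on failure would be a separate policy decision, and the sketch leaves it out.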
Layer 4: Development Environment Safety
For development, we use the same tooling against a completely separate AWS account:
#!/bin/bash
# dev-secrets.sh - Local development script
# Usage: ./dev-secrets.sh npm run dev
# (the exports only reach the command this script runs, not your interactive shell)
set -eo pipefail
# Check if user has development access
if ! aws sts get-caller-identity --profile dev-env &>/dev/null; then
  echo "❌ No development AWS access configured"
  exit 1
fi
# Load development secrets from a separate AWS account
echo "🔐 Loading development secrets..."
# Assign first, then export: `export VAR=$(cmd)` hides a failing cmd from set -e
DATABASE_URL=$(aws secretsmanager get-secret-value \
  --profile dev-env \
  --secret-id "app/development/database" \
  --query 'SecretString' --output text | jq -r '.url')
STRIPE_SECRET_KEY=$(aws secretsmanager get-secret-value \
  --profile dev-env \
  --secret-id "app/development/stripe" \
  --query 'SecretString' --output text | jq -r '.secret_key')
export DATABASE_URL STRIPE_SECRET_KEY
echo "✅ Development environment configured"
# Run the given command with the secrets in its environment
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
Layer 5: Container Security
Our new Docker approach keeps secrets out of images entirely:
# Dockerfile - No secrets baked in
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# .dockerignore should exclude .env* files so this COPY can't pull them into a layer
COPY . .
# Secrets loaded at runtime, never in build
USER node
CMD ["node", "server.js"]
# docker-compose.yml - Secrets from external source
# External secrets require Swarm mode (docker stack deploy); each secret is
# mounted as a file at /run/secrets/<name>, not as an environment variable
version: '3.8'
services:
  app:
    build: .
    environment:
      - NODE_ENV=production
      - AWS_REGION=us-east-1
      # No secret environment variables here
    secrets:
      - db-password
      - api-keys
secrets:
  db-password:
    external: true
    name: app_production_db_password
  api-keys:
    external: true
    name: app_production_api_keys
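Docker secrets reach the container as files under /run/secrets, so the application needs a small reader. The sketch below is built on two assumptions: readSecret is a hypothetical helper, and falling back to an environment variable (for example DB_PASSWORD for db-password) is only meant for local runs without Docker.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Reads a Docker secret file; falls back to an env var for local runs (assumed convention)
function readSecret(name, { dir = '/run/secrets', env = process.env, envName } = {}) {
  try {
    // Secret files often end with a newline, so trim it
    return fs.readFileSync(path.join(dir, name), 'utf8').trim();
  } catch (error) {
    if (error.code !== 'ENOENT') throw error; // e.g. permission problems should surface
  }
  const fallbackName = envName || name.toUpperCase().replace(/-/g, '_');
  const value = env[fallbackName];
  if (value === undefined) {
    throw new Error(`Secret ${name} not found in ${dir} or $${fallbackName}`);
  }
  return value;
}

// const dbPassword = readSecret('db-password');
```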
Advanced Security Patterns
Pattern 1: Secret Rotation Automation
We built automatic rotation for all our secrets:
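// lambda/rotate-secrets.js
// AWS SDK v2 is not bundled in Lambda's Node.js 18+ runtimes, so ship it with the
// function (or port these calls to @aws-sdk/client-secrets-manager)
const AWS = require('aws-sdk');
const secretsManager = new AWS.SecretsManager();
exports.handler = async (event) => {
  // Secrets Manager invokes the function once per step with these fields
  const { SecretId: secretId, ClientRequestToken: token, Step: step } = event;
  switch (step) {
    case 'createSecret':
      return await createNewSecret(secretId, token);
    case 'setSecret':
      return await setSecretInService(secretId, token); // not shown
    case 'testSecret':
      return await testNewSecret(secretId, token); // not shown
    case 'finishSecret':
      return await finishSecretRotation(secretId, token); // not shown
    default:
      throw new Error(`Unknown rotation step: ${step}`);
  }
};
async function createNewSecret(secretId, token) {
  // Retries reuse the same token: if the pending version already exists, do nothing
  try {
    await secretsManager.getSecretValue({
      SecretId: secretId,
      VersionId: token,
      VersionStage: 'AWSPENDING'
    }).promise();
    return { message: 'Pending secret already exists' };
  } catch (error) {
    if (error.code !== 'ResourceNotFoundException') throw error;
  }
  // Generate a new random password
  const { RandomPassword: newPassword } = await secretsManager.getRandomPassword({
    PasswordLength: 32,
    ExcludeCharacters: ' /@"\'\\' // avoid characters that break connection strings
  }).promise();
  // Store it as AWSPENDING. Without VersionStages, PutSecretValue would make the
  // new version AWSCURRENT before the database password has actually changed.
  await secretsManager.putSecretValue({
    SecretId: secretId,
    ClientRequestToken: token,
    SecretString: JSON.stringify({
      username: 'app_user',
      password: newPassword,
      engine: 'postgres',
      host: process.env.DB_HOST,
      port: 5432,
      dbname: process.env.DB_NAME
    }),
    VersionStages: ['AWSPENDING']
  }).promise();
  return { message: 'Secret created successfully' };
}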
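The four steps only make sense together with Secrets Manager's staging labels. The new version is written as AWSPENDING, pushed to the database, and tested. Only then does AWSCURRENT move to it, and the old version keeps AWSPREVIOUS. The sketch below models that label flow with an in-memory store so it runs without AWS. VersionStore is hypothetical and is not the Secrets Manager API.

```javascript
// In-memory model of a secret's versions and staging labels (not the AWS API)
class VersionStore {
  constructor(initialValue) {
    this.versions = new Map([['v0', { value: initialValue, stages: new Set(['AWSCURRENT']) }]]);
  }

  versionWith(stage) {
    for (const [id, version] of this.versions) {
      if (version.stages.has(stage)) return id;
    }
    return undefined;
  }

  get(stage) {
    const id = this.versionWith(stage);
    return id === undefined ? undefined : this.versions.get(id).value;
  }

  // createSecret: store the new value under AWSPENDING only (idempotent per token)
  createPending(token, value) {
    if (this.versions.has(token)) return;
    this.versions.set(token, { value, stages: new Set(['AWSPENDING']) });
  }

  // finishSecret: AWSCURRENT moves to the pending version, the old one becomes AWSPREVIOUS
  finish(token) {
    const prevId = this.versionWith('AWSPREVIOUS');
    if (prevId !== undefined) this.versions.get(prevId).stages.delete('AWSPREVIOUS');
    const old = this.versions.get(this.versionWith('AWSCURRENT'));
    old.stages.delete('AWSCURRENT');
    old.stages.add('AWSPREVIOUS');
    const next = this.versions.get(token);
    next.stages.delete('AWSPENDING');
    next.stages.add('AWSCURRENT');
  }
}

// setSecret and testSecret would use store.get('AWSPENDING') against the real
// database between createPending() and finish().
```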
Pattern 2: Environment Variable Validation
We validate all environment variables at startup:
// lib/config-validator.js
const Joi = require('joi');
const configSchema = Joi.object({
  NODE_ENV: Joi.string().valid('development', 'staging', 'production').required(),
  // Database configuration
  DATABASE_HOST: Joi.string().hostname().required(),
  DATABASE_PORT: Joi.number().port().default(5432),
  // Letters, digits, and underscores (alphanum() would reject "app_development")
  DATABASE_NAME: Joi.string().pattern(/^[a-zA-Z0-9_]+$/).required(),
  DATABASE_SSL: Joi.boolean().default(false),
  // API Keys (must be properly formatted)
  STRIPE_SECRET_KEY: Joi.string().pattern(/^sk_(test|live)_[a-zA-Z0-9]+$/).required(),
  // SendGrid keys look like SG.<id>.<secret>, so dots must be allowed
  SENDGRID_API_KEY: Joi.string().pattern(/^SG\.[a-zA-Z0-9_.-]+$/).required(),
  // JWT Configuration
  JWT_SECRET: Joi.string().min(32).required(),
  JWT_EXPIRATION: Joi.string().default('24h'),
  // URLs must be valid
  FRONTEND_URL: Joi.string().uri().required(),
  API_URL: Joi.string().uri().required()
});
function validateConfig() {
  // Joi converts strings such as "5432" and "true" to numbers and booleans by default
  const { error, value } = configSchema.validate(process.env, {
    allowUnknown: true,
    stripUnknown: true
  });
  if (error) {
    console.error('❌ Configuration validation failed:');
    error.details.forEach(detail => {
      console.error(`  - ${detail.path.join('.')}: ${detail.message}`);
    });
    process.exit(1);
  }
  return value;
}
module.exports = { validateConfig };
Pattern 3: Secure Logging Without Secrets
We implemented smart log filtering to prevent secret leakage:
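// lib/secure-logger.js
const winston = require('winston');
// Patterns that match common secret formats inside string values
const SECRET_PATTERNS = [
  /sk_live_[a-zA-Z0-9]+/g, // Stripe live keys
  /sk_test_[a-zA-Z0-9]+/g, // Stripe test keys
  /SG\.[a-zA-Z0-9_.-]+/g, // SendGrid API keys (SG.<id>.<secret>)
  /ghp_[a-zA-Z0-9]+/g, // GitHub personal access tokens
  /xoxb-[a-zA-Z0-9-]+/g, // Slack bot tokens
  /AIza[a-zA-Z0-9_-]+/g // Google API keys
];
// Object keys whose values are always redacted, whatever they look like
const SENSITIVE_KEYS = /pass(word)?|secret|token|api[_-]?key|authorization/i;
const REDACTED_MESSAGE = '[REDACTED]';
function redactString(text) {
  return SECRET_PATTERNS.reduce((out, pattern) => out.replace(pattern, REDACTED_MESSAGE), text);
}
// Walks the value instead of regex-editing its JSON text, so the output stays valid
function sanitizeForLogging(value, ancestors = new WeakSet()) {
  if (typeof value === 'string') return redactString(value);
  if (value === null || typeof value !== 'object') return value;
  if (ancestors.has(value)) return '[Circular]';
  ancestors.add(value);
  let result;
  if (Array.isArray(value)) {
    result = value.map((item) => sanitizeForLogging(item, ancestors));
  } else {
    result = {};
    for (const [key, item] of Object.entries(value)) {
      result[key] = SENSITIVE_KEYS.test(key) ? REDACTED_MESSAGE : sanitizeForLogging(item, ancestors);
    }
  }
  ancestors.delete(value); // only ancestors count as circular
  return result;
}
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    // Sanitize the message too: secrets often arrive via template strings
    winston.format.printf(({ timestamp, level, message, ...meta }) =>
      JSON.stringify({
        timestamp,
        level,
        message: sanitizeForLogging(message),
        ...sanitizeForLogging(meta)
      })
    )
  ),
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'app.log' })
  ]
});
module.exports = logger;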
Development Workflow Security
Secure Development Setup
We created a standardized development environment setup:
#!/bin/bash
# setup-dev-env.sh
set -e
echo "🚀 Setting up secure development environment..."
# Check prerequisites
if ! command -v aws &> /dev/null; then
  echo "❌ AWS CLI not installed"
  exit 1
fi
if ! command -v docker &> /dev/null; then
  echo "❌ Docker not installed"
  exit 1
fi
# Configure AWS profile for development
echo "🔧 Configuring AWS development profile..."
aws configure set profile.dev-env.region us-east-1
aws configure set profile.dev-env.output json
# Verify access
if ! aws sts get-caller-identity --profile dev-env &>/dev/null; then
  echo "❌ AWS development access not configured"
  echo "Please run: aws configure sso --profile dev-env"
  exit 1
fi
# Create local environment file (development only)
cat > .env.development << EOF
# Development Environment - Safe for local use
NODE_ENV=development
LOG_LEVEL=debug
# Local services
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_NAME=app_development
# Development API endpoints
API_URL=http://localhost:3001
FRONTEND_URL=http://localhost:3000
# These will be loaded from AWS Secrets Manager
# DATABASE_PASSWORD=(loaded from secrets)
# STRIPE_SECRET_KEY=(loaded from secrets)
# JWT_SECRET=(loaded from secrets)
EOF
echo "✅ Development environment configured"
echo "🔐 Secrets will be loaded from AWS Secrets Manager at runtime"
Code Review Security Checklist
We added security checks to our pull request template:
## Security Checklist
**Environment Variables & Secrets:**
- [ ] No hardcoded secrets in code changes
- [ ] No `.env` files added to version control
- [ ] Secrets loaded from AWS Secrets Manager only
- [ ] No secrets in Docker images or containers
- [ ] No logging of sensitive data
**Configuration:**
- [ ] New environment variables added to validation schema
- [ ] Appropriate default values set for non-sensitive configs
- [ ] Development vs production configurations properly separated
**Access Control:**
- [ ] Principle of least privilege applied
- [ ] No broad AWS permissions granted
- [ ] Service-to-service auth properly implemented
Monitoring and Incident Response
Secret Usage Monitoring
We track all secret access with CloudTrail and custom metrics:
// lib/secret-monitor.js
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();
class SecretMonitor {
  static async recordSecretAccess(secretName, action = 'retrieved') {
    try {
      await cloudwatch.putMetricData({
        Namespace: 'App/Secrets',
        MetricData: [
          {
            MetricName: 'SecretAccess',
            Dimensions: [
              { Name: 'SecretName', Value: secretName },
              { Name: 'Action', Value: action },
              // Dimension values must be non-empty strings
              { Name: 'Environment', Value: process.env.NODE_ENV || 'unknown' }
            ],
            Value: 1,
            Unit: 'Count',
            Timestamp: new Date()
          }
        ]
      }).promise();
    } catch (error) {
      // Metrics are best-effort: never let monitoring take down the request
      console.error('Failed to record secret access metric:', error.message);
    }
  }
  static async checkForUnusualActivity() {
    // Not implemented in this snippet; the checks it is meant to perform:
    // - Alert if secrets accessed from unusual locations
    // - Alert if too many secret retrievals in short time
    // - Alert if secrets accessed outside business hours
  }
}
module.exports = SecretMonitor;
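The three checks listed in checkForUnusualActivity can be written as a pure function over access events, which keeps them testable. This sketch makes several assumptions, and every threshold is a placeholder to tune:
- events are shaped like { secretName, sourceIp, time } (for example, parsed from CloudTrail GetSecretValue records);
- there is a set of known source IPs;
- business hours are given in UTC;
- bursts are measured against a sliding-window limit.

```javascript
// events: [{ secretName, sourceIp, time: Date }] (hypothetical shape)
// Returns alert objects for the three checks listed in checkForUnusualActivity
function findUnusualAccess(events, {
  knownIps = new Set(),
  maxPerWindow = 100,
  windowMs = 60 * 1000,
  businessHoursUtc = [8, 18] // [start, end) in UTC hours
} = {}) {
  const alerts = [];
  // Checks 1 and 3: unfamiliar source IP, access outside business hours
  for (const e of events) {
    if (!knownIps.has(e.sourceIp)) {
      alerts.push({ rule: 'unknown-source', secretName: e.secretName, sourceIp: e.sourceIp });
    }
    const hour = e.time.getUTCHours();
    if (hour < businessHoursUtc[0] || hour >= businessHoursUtc[1]) {
      alerts.push({ rule: 'outside-business-hours', secretName: e.secretName, time: e.time });
    }
  }
  // Check 2: more than maxPerWindow retrievals of one secret within windowMs
  const timesBySecret = new Map();
  for (const e of events) {
    if (!timesBySecret.has(e.secretName)) timesBySecret.set(e.secretName, []);
    timesBySecret.get(e.secretName).push(e.time.getTime());
  }
  for (const [secretName, times] of timesBySecret) {
    times.sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      // Shrink the window until it spans less than windowMs
      while (times[end] - times[start] >= windowMs) start++;
      if (end - start + 1 > maxPerWindow) {
        alerts.push({ rule: 'burst', secretName, count: end - start + 1 });
        break; // one burst alert per secret is enough
      }
    }
  }
  return alerts;
}
```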
Automated Security Scanning
We scan for secrets in every commit:
# .github/workflows/security-scan.yml
name: Security Scan
on: [push, pull_request]
jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run TruffleHog
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: main
          head: HEAD
      - name: Run GitLeaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Repositories owned by a GitHub organization also need GITLEAKS_LICENSE
      - name: Custom Secret Patterns
        run: |
          # Check for common secret patterns (this regex cannot match its own text)
          if grep -rEn "sk_live_[A-Za-z0-9]{8,}" . --exclude-dir=.git; then
            echo "❌ Stripe live key detected"
            exit 1
          fi
          # Only flag quoted literals, not every assignment mentioning "password"
          if grep -rEin "password[[:space:]]*[=:][[:space:]]*['\"][^'\"]+['\"]" . --exclude-dir=.git --include="*.js" --include="*.ts"; then
            echo "❌ Potential hardcoded password detected"
            exit 1
          fi
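The same two grep heuristics can also run locally before a push, reporting the file and line of each match. Below is a sketch built around a hypothetical scanText helper that takes one file's contents at a time. Like the grep step, it will miss some secrets and flag some false positives, so it complements TruffleHog and Gitleaks rather than replacing them.

```javascript
// Heuristics mirroring the CI step: Stripe live keys and quoted password literals
const RULES = [
  { name: 'stripe-live-key', pattern: /sk_live_[A-Za-z0-9]{8,}/ },
  { name: 'hardcoded-password', pattern: /password\s*[=:]\s*['"][^'"]+['"]/i }
];

// Returns one finding per matching rule and line, with 1-based line numbers
function scanText(file, text) {
  const findings = [];
  text.split('\n').forEach((line, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(line)) {
        findings.push({ file, line: index + 1, rule: rule.name });
      }
    }
  });
  return findings;
}
```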
The Results: 8 Months Later
After implementing this security-first approach, our metrics tell the story:
Security Improvements:
- Zero security incidents related to environment variables
- 100% secret rotation automated (monthly for high-risk, quarterly for others)
- Zero secrets in version control (enforced by pre-commit hooks)
- Average 12-second detection time for secret exposure attempts
Operational Benefits:
- 85% faster incident response (secrets can be rotated in minutes)
- Zero production downtime due to expired credentials
- 100% audit coverage - we know who accessed what, when
- 50% reduction in security-related support tickets
Developer Experience:
- 3-minute setup time for new developer environments
- Zero configuration drift between environments
- Automated environment validation catches config issues before deployment
- 95% developer satisfaction with the new secret management system
Cost Impact:
- $2,400/month saved on unused API keys and services (proper secret lifecycle management)
- Zero unexpected charges from compromised credentials
- 40% reduction in security tooling costs (consolidated monitoring)
Implementation Roadmap for Your Team
Based on our experience, here's the order I recommend for implementing these changes:
Week 1: Assessment and Planning
- Audit current secrets - Find every .env file, hardcoded secret, and configuration
- Inventory access - Who can see production secrets today?
- Choose secret management solution - AWS Secrets Manager, HashiCorp Vault, or Azure Key Vault
- Set up separate AWS account for development secrets
Week 2: Development Environment
- Implement secret loading library with caching and error handling
- Create development secret management process
- Set up automated secret scanning in CI/CD pipeline
- Train team on new workflows
Week 3: Staging Migration
- Migrate staging secrets to secret management service
- Implement secret validation at application startup
- Set up monitoring and alerting for secret access
- Test secret rotation procedures
Week 4: Production Migration
- Migrate production secrets (high-risk, plan for rollback)
- Implement automatic secret rotation
- Remove old secret storage methods
- Document incident response procedures
Week 5: Hardening
- Implement advanced logging protection
- Set up comprehensive monitoring
- Create regular security review process
- Plan ongoing secret hygiene automation
Common Pitfalls to Avoid
After helping 12 other companies implement similar systems, here are the mistakes I see repeatedly:
Pitfall 1: Boiling the Ocean
Mistake: Trying to migrate everything at once
Solution: Start with non-critical services and gradually work up to production databases
Pitfall 2: Ignoring Developer Experience
Mistake: Making development setup so complicated that developers find workarounds
Solution: Invest in tooling that makes the secure way also the easy way
Pitfall 3: No Rollback Plan
Mistake: Migrating production secrets without a way to quickly revert
Solution: Keep old secrets working in parallel until the new system is proven stable
Pitfall 4: Insufficient Monitoring
Mistake: Setting up secret management but not monitoring usage
Solution: Treat secret access like any other critical system metric
Pitfall 5: One-Time Setup
Mistake: Implementing secret management but no ongoing hygiene
Solution: Regular audits, automated rotation, and lifecycle management
The Future of Application Security
The environment variable security incident that almost killed our company turned out to be the best thing that ever happened to our security posture. It forced us to implement proper secrets management, zero-trust principles, and defense-in-depth strategies that have prevented dozens of potential incidents since.
The key insight: security isn't a feature you add later - it's a foundation you build from day one. Environment variables and secrets management might seem like boring infrastructure work, but they're the foundation that everything else depends on.
Every startup will eventually face a security incident. The question isn't if, but when. And when that moment comes, the quality of your secrets management system will determine whether you survive with minor damage or face an existential threat to your business.
Don't wait for your own March 15th incident. Start implementing these practices today, while you still have the luxury of time and the absence of panic. Your future self, your customers, and your investors will thank you.