Building Real-time Features with WebSockets and Next.js


After implementing real-time features in five different Next.js applications over the past year, I've learned that using WebSockets with Next.js requires a specific approach. The serverless nature of Next.js on platforms like Vercel presents unique challenges, but with the right architecture, you can build performant real-time features that scale.

The Architecture That Actually Works

The biggest misconception about WebSockets in Next.js is trying to implement them within API routes. This won't work because API routes are serverless functions that terminate after each request. Instead, you need a custom Node.js server or a separate WebSocket service.

Here's the production architecture I've settled on after multiple iterations:

// server.js - Custom Node.js server
const { createServer } = require('http');
const { parse } = require('url');
const next = require('next');
const { Server } = require('socket.io');
const { createAdapter } = require('@socket.io/redis-adapter');
const Redis = require('ioredis');
// verifyToken and updateUserStatus are app-specific helpers (auth + presence)
const { verifyToken, updateUserStatus } = require('./lib/auth');
 
const dev = process.env.NODE_ENV !== 'production';
const app = next({ dev });
const handle = app.getRequestHandler();
 
// Redis for scaling across multiple instances
const pubClient = new Redis(process.env.REDIS_URL);
const subClient = pubClient.duplicate();
 
app.prepare().then(() => {
  const server = createServer((req, res) => {
    const parsedUrl = parse(req.url, true);
    handle(req, res, parsedUrl);
  });
 
  const io = new Server(server, {
    cors: {
      origin: process.env.NEXT_PUBLIC_APP_URL,
      credentials: true
    },
    transports: ['websocket', 'polling']
  });
 
  // Enable Redis adapter for horizontal scaling
  io.adapter(createAdapter(pubClient, subClient));
 
  // Middleware for authentication
  io.use(async (socket, next) => {
    try {
      const token = socket.handshake.auth.token;
      const user = await verifyToken(token);
      socket.userId = user.id;
      socket.user = user;
      next();
    } catch (err) {
      next(new Error('Authentication failed'));
    }
  });
 
  // Connection handling
  io.on('connection', (socket) => {
    console.log(`User ${socket.userId} connected`);
    
    // Join user's personal room for notifications
    socket.join(`user:${socket.userId}`);
    
    // Track online status
    updateUserStatus(socket.userId, 'online');
    
    socket.on('disconnect', () => {
      updateUserStatus(socket.userId, 'offline');
    });
  });
 
  // With a custom server, start via `node server.js` instead of `next start`
  server.listen(3000, () => {
    console.log('> Server ready on http://localhost:3000');
  });
});

Client-Side Connection Management

The client implementation needs robust reconnection logic and state management:

// hooks/useSocket.ts
import { useEffect, useRef, useState } from 'react';
import { io, Socket } from 'socket.io-client';
import { useAuth } from '@/hooks/useAuth';
 
export function useSocket() {
  const [isConnected, setIsConnected] = useState(false);
  const [connectionError, setConnectionError] = useState<string | null>(null);
  const socketRef = useRef<Socket | null>(null);
  const { token } = useAuth();
  
  useEffect(() => {
    if (!token) return;
    
    const socket = io(process.env.NEXT_PUBLIC_SOCKET_URL || '', {
      auth: { token },
      reconnection: true,
      reconnectionDelay: 1000,
      reconnectionDelayMax: 5000,
      reconnectionAttempts: 5,
      transports: ['websocket', 'polling']
    });
    
    socketRef.current = socket;
    
    socket.on('connect', () => {
      setIsConnected(true);
      setConnectionError(null);
      console.log('Socket connected');
    });
    
    socket.on('disconnect', (reason) => {
      setIsConnected(false);
      if (reason === 'io server disconnect') {
        // Server disconnected us, manually reconnect
        socket.connect();
      }
    });
    
    socket.on('connect_error', (error) => {
      setConnectionError(error.message);
      console.error('Connection error:', error);
    });
    
    return () => {
      socket.disconnect();
    };
  }, [token]);
  
  return {
    socket: socketRef.current,
    isConnected,
    connectionError
  };
}
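One caveat with this hook as written: every component that calls it opens its own connection. A module-level singleton keeps the whole app on one socket; here's a sketch (the file name and API are hypothetical):

```javascript
// lib/socket-singleton.js (hypothetical): keep one shared connection app-wide.
// createFn is only invoked on the first call; later calls reuse the instance.
let sharedSocket = null;

function getSocket(createFn) {
  if (!sharedSocket) {
    sharedSocket = createFn();
  }
  return sharedSocket;
}

function resetSocket() {
  // Call on logout so the next getSocket() builds a fresh authenticated socket
  if (sharedSocket) {
    if (typeof sharedSocket.disconnect === 'function') sharedSocket.disconnect();
    sharedSocket = null;
  }
}

module.exports = { getSocket, resetSocket };
```

With this in place, the `io(...)` call in the hook moves into the `createFn` passed to `getSocket`, and the unmount cleanup removes listeners instead of disconnecting the shared socket.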

Implementing a Production Chat System

Here's a complete chat implementation that handles typing indicators, message history, and room management:

// components/Chat.tsx
import { useState, useEffect, useRef, useCallback } from 'react';
import { useSocket } from '@/hooks/useSocket';
import { useAuth } from '@/hooks/useAuth';
 
interface Message {
  id: string;
  userId: string;
  username: string;
  text: string;
  timestamp: Date;
  status: 'sending' | 'sent' | 'failed';
}
 
interface TypingUser {
  userId: string;
  username: string;
}
 
export function Chat({ roomId }: { roomId: string }) {
  const [messages, setMessages] = useState<Message[]>([]);
  const [inputValue, setInputValue] = useState('');
  const [typingUsers, setTypingUsers] = useState<TypingUser[]>([]);
  const { socket, isConnected } = useSocket();
  // Identity comes from the auth hook; the client-side Socket instance does
  // not carry userId/user (those are only attached on the server)
  const { user } = useAuth();
  const typingTimeoutRef = useRef<NodeJS.Timeout | null>(null);
  const messagesEndRef = useRef<HTMLDivElement>(null);
  
  useEffect(() => {
    if (!socket || !roomId) return;
    
    // Join room
    socket.emit('join-room', roomId);
    
    // Load message history
    socket.emit('load-history', roomId, (history: Message[]) => {
      setMessages(history);
      scrollToBottom();
    });
    
    // Listen for new messages; skip ones already added optimistically,
    // since the server broadcasts to the whole room including the sender
    socket.on('new-message', (message: Message) => {
      setMessages(prev =>
        prev.some(m => m.id === message.id) ? prev : [...prev, message]
      );
      scrollToBottom();
    });
    
    // Listen for typing indicators
    socket.on('user-typing', ({ userId, username }: TypingUser) => {
      setTypingUsers(prev => {
        if (prev.find(u => u.userId === userId)) return prev;
        return [...prev, { userId, username }];
      });
    });
    
    socket.on('user-stop-typing', ({ userId }: { userId: string }) => {
      setTypingUsers(prev => prev.filter(u => u.userId !== userId));
    });
    
    // Message delivery confirmation
    socket.on('message-delivered', (messageId: string) => {
      setMessages(prev => 
        prev.map(msg => 
          msg.id === messageId ? { ...msg, status: 'sent' } : msg
        )
      );
    });
    
    return () => {
      socket.emit('leave-room', roomId);
      socket.off('new-message');
      socket.off('user-typing');
      socket.off('user-stop-typing');
      socket.off('message-delivered');
    };
  }, [socket, roomId]);
  
  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };
  
  const handleTyping = useCallback(() => {
    if (!socket) return;
    
    socket.emit('typing', roomId);
    
    if (typingTimeoutRef.current) {
      clearTimeout(typingTimeoutRef.current);
    }
    
    typingTimeoutRef.current = setTimeout(() => {
      socket.emit('stop-typing', roomId);
    }, 1000);
  }, [socket, roomId]);
  
  const sendMessage = useCallback(() => {
    if (!socket || !user || !inputValue.trim()) return;
    
    const tempId = `temp-${Date.now()}`;
    const message: Message = {
      id: tempId,
      userId: user.id,
      username: user.name,
      text: inputValue,
      timestamp: new Date(),
      status: 'sending'
    };
    
    // Optimistic update
    setMessages(prev => [...prev, message]);
    setInputValue('');
    
    // Send to server
    socket.emit('send-message', {
      roomId,
      text: inputValue,
      tempId
    }, (success: boolean, messageId?: string) => {
      if (success && messageId) {
        setMessages(prev => 
          prev.map(msg => 
            msg.id === tempId 
              ? { ...msg, id: messageId, status: 'sent' } 
              : msg
          )
        );
      } else {
        setMessages(prev => 
          prev.map(msg => 
            msg.id === tempId 
              ? { ...msg, status: 'failed' } 
              : msg
          )
        );
      }
    });
    
    socket.emit('stop-typing', roomId);
  }, [socket, user, inputValue, roomId]);
  
  const retryMessage = useCallback((failed: Message) => {
    if (!socket) return;
    
    // Re-send a failed message, reusing its temp id so the ack callback
    // below updates the same entry in place
    setMessages(prev => 
      prev.map(msg => 
        msg.id === failed.id ? { ...msg, status: 'sending' } : msg
      )
    );
    
    socket.emit('send-message', {
      roomId,
      text: failed.text,
      tempId: failed.id
    }, (success: boolean, messageId?: string) => {
      if (success && messageId) {
        setMessages(prev => 
          prev.map(msg => 
            msg.id === failed.id 
              ? { ...msg, id: messageId, status: 'sent' } 
              : msg
          )
        );
      } else {
        setMessages(prev => 
          prev.map(msg => 
            msg.id === failed.id 
              ? { ...msg, status: 'failed' } 
              : msg
          )
        );
      }
    });
  }, [socket, roomId]);
  
  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map(msg => (
          <div key={msg.id} className={`message ${msg.status}`}>
            <strong>{msg.username}:</strong> {msg.text}
            {msg.status === 'failed' && (
              <span className="retry" onClick={() => retryMessage(msg)}>
                Retry
              </span>
            )}
          </div>
        ))}
        <div ref={messagesEndRef} />
      </div>
      
      {typingUsers.length > 0 && (
        <div className="typing-indicator">
          {typingUsers.map(u => u.username).join(', ')} 
          {typingUsers.length === 1 ? ' is' : ' are'} typing...
        </div>
      )}
      
      <div className="input-container">
        <input
          value={inputValue}
          onChange={(e) => {
            setInputValue(e.target.value);
            handleTyping();
          }}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          placeholder={isConnected ? "Type a message..." : "Connecting..."}
          disabled={!isConnected}
        />
        <button onClick={sendMessage} disabled={!isConnected}>
          Send
        </button>
      </div>
    </div>
  );
}

Server-Side Chat Implementation

The server needs to handle rooms, message persistence, and delivery confirmation:

// server/chat-handler.js
const sanitizeHtml = require('sanitize-html');
const Message = require('./models/Message');
const rateLimiter = require('./rate-limiter');
 
module.exports = (io) => {
  io.on('connection', (socket) => {
    
    socket.on('join-room', async (roomId) => {
      socket.join(roomId);
      socket.currentRoom = roomId;
      
      // Notify others in room
      socket.to(roomId).emit('user-joined', {
        userId: socket.userId,
        username: socket.user.name
      });
    });
    
    socket.on('leave-room', (roomId) => {
      // Mirror of join-room so clients can clean up on unmount
      socket.leave(roomId);
      socket.to(roomId).emit('user-left', {
        userId: socket.userId,
        username: socket.user.name
      });
    });
    
    socket.on('send-message', async (data, callback) => {
      const { roomId, text, tempId } = data;
      
      // Rate limiting
      try {
        await rateLimiter.consume(socket.userId);
      } catch (rejRes) {
        callback(false);
        return;
      }
      
      // Validate and sanitize
      if (!text || text.length > 1000) {
        callback(false);
        return;
      }
      
      try {
        // Save to database
        const message = await Message.create({
          roomId,
          userId: socket.userId,
          username: socket.user.name,
          text: sanitizeHtml(text),
          timestamp: new Date()
        });
        
        // Broadcast to room
        io.to(roomId).emit('new-message', {
          id: message.id,
          userId: message.userId,
          username: message.username,
          text: message.text,
          timestamp: message.timestamp,
          status: 'sent'
        });
        
        callback(true, message.id);
      } catch (error) {
        console.error('Message save error:', error);
        callback(false);
      }
    });
    
    socket.on('typing', (roomId) => {
      socket.to(roomId).emit('user-typing', {
        userId: socket.userId,
        username: socket.user.name
      });
    });
    
    socket.on('stop-typing', (roomId) => {
      socket.to(roomId).emit('user-stop-typing', {
        userId: socket.userId
      });
    });
    
    socket.on('load-history', async (roomId, callback) => {
      try {
        const messages = await Message.find({ roomId })
          .sort({ timestamp: -1 })
          .limit(50);
        callback(messages.reverse());
      } catch (error) {
        callback([]);
      }
    });
  });
};
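The `./rate-limiter` module is assumed above; in production I'd reach for something like `rate-limiter-flexible`, but a minimal in-memory stand-in with the same `consume(key)` promise shape looks like this (all names here are illustrative):

```javascript
// Fixed-window, in-memory limiter mirroring the consume() contract used in
// the handler: resolves while budget remains, rejects once it's exhausted.
class SimpleRateLimiter {
  constructor({ points = 10, durationMs = 60000 } = {}) {
    this.points = points;        // messages allowed per window
    this.durationMs = durationMs;
    this.buckets = new Map();    // key -> { remaining, resetAt }
  }

  async consume(key) {
    const now = Date.now();
    let bucket = this.buckets.get(key);
    if (!bucket || now > bucket.resetAt) {
      // Start a fresh window for this key
      bucket = { remaining: this.points, resetAt: now + this.durationMs };
      this.buckets.set(key, bucket);
    }
    if (bucket.remaining <= 0) {
      throw new Error('Rate limit exceeded');
    }
    bucket.remaining -= 1;
    return bucket.remaining;
  }
}

module.exports = new SimpleRateLimiter({ points: 10, durationMs: 60000 });
```

Being per-process, this undercounts once you run multiple instances behind the Redis adapter; at that point the counters need to live in a shared store.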

Live Notifications System

I've implemented a notification system that handles both in-app and queued notifications:

// hooks/useNotifications.ts
import { useEffect, useState } from 'react';
import { useSocket } from '@/hooks/useSocket';
 
// App-level notification shape; named AppNotification so it doesn't shadow
// the browser's built-in Notification API used below
interface AppNotification {
  id: string;
  type: 'info' | 'success' | 'warning' | 'error';
  title: string;
  message: string;
  timestamp: Date;
  read: boolean;
  action?: {
    label: string;
    url: string;
  };
}
 
export function useNotifications() {
  const [notifications, setNotifications] = useState<AppNotification[]>([]);
  const [unreadCount, setUnreadCount] = useState(0);
  const { socket } = useSocket();
  
  useEffect(() => {
    if (!socket) return;
    
    // Load existing notifications
    socket.emit('load-notifications', (data: AppNotification[]) => {
      setNotifications(data);
      setUnreadCount(data.filter(n => !n.read).length);
    });
    
    // Listen for new notifications
    socket.on('notification', (notification: AppNotification) => {
      setNotifications(prev => [notification, ...prev]);
      setUnreadCount(prev => prev + 1);
      
      // Show a browser notification if the API exists and permission is granted
      if ('Notification' in window && Notification.permission === 'granted') {
        new Notification(notification.title, {
          body: notification.message,
          icon: '/icon.png'
        });
      }
    });
    
    return () => {
      socket.off('notification');
    };
  }, [socket]);
  
  const markAsRead = (notificationId: string) => {
    socket?.emit('mark-notification-read', notificationId);
    setNotifications(prev => 
      prev.map(n => 
        n.id === notificationId ? { ...n, read: true } : n
      )
    );
    setUnreadCount(prev => Math.max(0, prev - 1));
  };
  
  const clearAll = () => {
    socket?.emit('clear-notifications');
    setNotifications([]);
    setUnreadCount(0);
  };
  
  return {
    notifications,
    unreadCount,
    markAsRead,
    clearAll
  };
}
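On the server side, delivering to a single user just targets the personal room joined in server.js. A sketch (the helper name is mine):

```javascript
// Emit an in-app notification to one user's personal room.
// `io` is the Socket.IO server; the room name matches the join in server.js.
function sendNotification(io, userId, { id, type, title, message, action }) {
  io.to(`user:${userId}`).emit('notification', {
    id,
    type,      // 'info' | 'success' | 'warning' | 'error'
    title,
    message,
    action,    // optional { label, url }
    timestamp: new Date(),
    read: false
  });
}

module.exports = { sendNotification };
```

With the Redis adapter in place, this reaches the user no matter which instance holds their connection.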

Scaling WebSockets in Production

When I scaled our chat application to handle 10,000+ concurrent connections, these strategies were crucial:

// config/socket-config.js
const os = require('os');
const sticky = require('sticky-session');
const server = require('./server'); // server.js must export the http.Server
 
// sticky-session forks the workers itself and pins each client IP to the
// same worker, so Socket.IO's polling handshake always lands on one process.
// (A manual cluster.fork() loop on top of it would double-fork.)
if (!sticky.listen(server, 3000, { workers: os.cpus().length })) {
  // Master process
  server.once('listening', () => {
    console.log(`Master ${process.pid} listening on port 3000`);
  });
} else {
  console.log(`Worker ${process.pid} started`);
}

Performance Optimizations

After profiling our WebSocket server under load, these optimizations made the biggest difference:

// optimizations/message-batching.js
class MessageBatcher {
  constructor(io, interval = 100) {
    this.io = io;
    this.interval = interval;
    this.batches = new Map();
    
    // Keep the handle so the batcher can be stopped cleanly
    this.timer = setInterval(() => this.flush(), interval);
  }
  
  add(room, event, data) {
    if (!this.batches.has(room)) {
      this.batches.set(room, []);
    }
    this.batches.get(room).push({ event, data });
  }
  
  flush() {
    for (const [room, messages] of this.batches) {
      if (messages.length > 0) {
        this.io.to(room).emit('batch', messages);
      }
    }
    // Drop emptied rooms so the map doesn't grow without bound
    this.batches.clear();
  }
  
  stop() {
    clearInterval(this.timer);
    this.flush();
  }
}
 
// Usage
const batcher = new MessageBatcher(io);
 
// Instead of immediate emit
// io.to(room).emit('update', data);
 
// Batch for efficiency
batcher.add(room, 'update', data);
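The client then listens for the single 'batch' event and fans it back out to the usual per-event handlers; a minimal unpacker:

```javascript
// Fan a batched payload back out to the normal per-event handlers.
// `handlers` maps event names to the same callbacks you'd pass to socket.on.
function handleBatch(messages, handlers) {
  for (const { event, data } of messages) {
    const handler = handlers[event];
    if (handler) handler(data); // silently skip events with no handler
  }
}

module.exports = { handleBatch };
```

Wired up, it looks like `socket.on('batch', msgs => handleBatch(msgs, { update: applyUpdate }))`.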

Security Best Practices

Security is critical for WebSocket applications. Here's my security middleware stack:

// middleware/security.js
const xss = require('xss');
 
// Simple in-memory rate limit state, keyed by client IP. Fine for a single
// instance; use a Redis-backed limiter once you scale horizontally.
const socketRateLimiter = new Map();
 
io.use((socket, next) => {
  const clientId = socket.handshake.address;
  
  if (!socketRateLimiter.has(clientId)) {
    socketRateLimiter.set(clientId, {
      points: 100,
      resetTime: Date.now() + 60000
    });
  }
  
  const limiter = socketRateLimiter.get(clientId);
  
  if (Date.now() > limiter.resetTime) {
    limiter.points = 100;
    limiter.resetTime = Date.now() + 60000;
  }
  
  if (limiter.points <= 0) {
    return next(new Error('Rate limit exceeded'));
  }
  
  limiter.points--;
  next();
});
 
// Input sanitization
const sanitizeInput = (text) => {
  return xss(text, {
    whiteList: {},
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  });
};
 
// Validate message size
const validateMessageSize = (message) => {
  const size = JSON.stringify(message).length;
  return size < 10000; // 10KB limit
};
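Chaining the two checks before any handler touches a payload might look like this (the guard name and shape are illustrative; `sanitize` stands in for the xss-based `sanitizeInput` above):

```javascript
// Reject oversized payloads, then sanitize the text field.
// Returns null when the message should be dropped.
function guardMessage(message, sanitize, maxBytes = 10000) {
  if (Buffer.byteLength(JSON.stringify(message)) >= maxBytes) {
    return null;
  }
  return { ...message, text: sanitize(message.text || '') };
}

module.exports = { guardMessage };
```

Handlers then bail out early on `null` instead of sprinkling both checks everywhere.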

Deployment Strategies

For production deployment, I use this architecture:

  1. Next.js App: Deploy to Vercel/Netlify
  2. WebSocket Server: Deploy to dedicated VPS or containerized service
  3. Redis: Managed Redis instance (Redis Cloud/AWS ElastiCache)
  4. Load Balancer: NGINX with sticky sessions

The corresponding NGINX configuration:

# nginx.conf
upstream websocket {
    ip_hash;  # Sticky sessions
    server ws1.example.com:3000;
    server ws2.example.com:3000;
}
 
server {
    listen 443 ssl http2;
    server_name ws.example.com;
    # ssl_certificate / ssl_certificate_key directives omitted for brevity
    
    location /socket.io/ {
        proxy_pass http://websocket;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 86400;
    }
}

Building real-time features in Next.js requires careful architecture decisions, but the result is worth it. Our chat system now handles thousands of concurrent users with sub-100ms message delivery times. The key is understanding the limitations of serverless architectures and designing around them rather than fighting against them.