How to Build a Real-Time Collaboration Tool with WebSockets

Allan Porras

17 Dec 2025 — 9 min read

The demand for real-time, Google Docs-like collaborative experiences is no longer a differentiating feature; it's an expectation. For CTOs and engineering leaders, architecting such a system presents challenges that differ sharply from standard request-response patterns. Naive HTTP polling is a non-starter: every client must repeatedly ask the server for changes, which adds latency of up to one polling interval and generates load even when nothing has changed.

The solution lies in a persistent, bidirectional communication channel. This is the domain of the WebSocket protocol.

This article provides a technical blueprint for building a robust, real-time collaboration tool. We will move beyond a simple chat demo and focus on the core architectural components required for a production-grade system, including connection management, state synchronization, and horizontal scaling. We will primarily focus on implementing a shared text editor, as its challenges are representative of most collaborative tasks.

Core Architecture: The WebSocket Hub and Client

At its heart, the system consists of two main parts: a central server (the "hub") that manages connections and broadcasts data, and multiple clients (browsers) that maintain a persistent WebSocket connection to that hub.

The Server-Side Hub: This is not a standard HTTP server. Its primary role is to:
- Accept and upgrade HTTP requests to WebSocket connections.
- Maintain a registry of all active connections, often mapping them to specific "documents" or "rooms."
- Receive messages (e.g., "user A typed 'hello'") from one client.
- Broadcast that message (or a derivative of it) to all other clients subscribed to the same document.
- Handle connection termination (disconnects, heartbeats).
The Client-Side Integration: The client-side application must:
- Initiate and establish a WebSocket connection (new WebSocket('wss://api.example.com')).
- Listen for local user actions (e.g., keyup events in a text editor).
- Serialize these actions into a defined message format (e.g., JSON) and send them to the server (ws.send(...)).
- Listen for messages from the server (ws.onmessage).
- Deserialize these messages and apply the received changes to the local document state, reflecting the actions of other users.
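
As a rough sketch of the serialize and deserialize steps in the list above, the helpers below define one possible JSON message format. The field names (type, index, text) and the function names are illustrative assumptions, not part of any library, and the Yjs-based design later in the article replaces this hand-rolled format with binary updates.

```javascript
// Hypothetical message format for a plain-text editor (assumed, not a standard).
// An "insert" action carries the index and the inserted text.
function serializeInsert(index, text) {
  return JSON.stringify({ type: 'insert', index, text });
}

// Parse and validate a message received from the server.
// Returns null for anything malformed instead of throwing.
function deserialize(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  const valid =
    msg !== null &&
    typeof msg === 'object' &&
    msg.type === 'insert' &&
    Number.isInteger(msg.index) &&
    msg.index >= 0 &&
    typeof msg.text === 'string';
  return valid ? msg : null;
}

// Apply a validated insert to the local document string.
function applyInsert(doc, msg) {
  const index = Math.min(msg.index, doc.length); // clamp out-of-range positions
  return doc.slice(0, index) + msg.text + doc.slice(index);
}

// In the browser this would be wired up roughly as:
//   ws.send(serializeInsert(2, '!'));
//   ws.onmessage = (e) => {
//     const m = deserialize(e.data);
//     if (m) localDoc = applyInsert(localDoc, m);
//   };
```

Validating on receipt matters because the relay server forwards whatever it is given; a client should never assume that an incoming frame is well-formed.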

Section 1: The Server-Side Implementation (Node.js)

Let's implement the server hub. We'll use Node.js and the popular ws library for its performance and simplicity. This server will manage connections and route messages to specific "document rooms."

Key Challenge: A single server must manage many distinct collaborative sessions. We cannot simply broadcast every message to every client. We must segment connections.

Implementation: We'll use a Map to store document "rooms," where each room holds a Set of connected clients (WebSocket objects).

// server.js
const WebSocket = require('ws');
const http = require('http');
const url = require('url');

// We use a Map to store "rooms." 
// Key: documentId (e.g., 'doc-123')
// Value: Set of connected WebSocket clients
const documentRooms = new Map();

// Create a standard HTTP server to handle the initial WebSocket upgrade
const server = http.createServer((req, res) => {
    // This is where you would serve your main application
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('WebSocket server is running.');
});

const wss = new WebSocket.Server({ noServer: true });

server.on('upgrade', (request, socket, head) => {
    // Parse the URL to get the document ID
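    // url.parse() is deprecated; use the WHATWG URL class instead.
    // request.url is only a path, so a placeholder base is supplied for parsing.
    const { pathname } = new url.URL(request.url, 'http://localhost');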
    // Example URL: wss://api.example.com/documents/doc-123
    const documentId = pathname.split('/')[2]; 

    if (!documentId) {
        socket.destroy();
        return;
    }

    // Here, you MUST perform authentication/authorization
    // e.g., check a JWT token from cookies or query params
    // if (!isValidUser(request)) {
    //     socket.destroy();
    //     return;
    // }

    wss.handleUpgrade(request, socket, head, (ws) => {
        // Add this client to the correct document room
        if (!documentRooms.has(documentId)) {
            documentRooms.set(documentId, new Set());
        }
        documentRooms.get(documentId).add(ws);

        console.log(`Client connected to document: ${documentId}`);

        // Handle incoming messages from this client
        ws.on('message', (data, isBinary) => {
            // We broadcast the raw message to all *other* clients in the same room
            const clients = documentRooms.get(documentId);
            if (clients) {
                clients.forEach(client => {
                    if (client !== ws && client.readyState === WebSocket.OPEN) {
                        // Forward the message, preserving its text/binary frame type
                        client.send(data, { binary: isBinary });
                    }
                });
            }
        });

        // Handle client disconnect
        ws.on('close', () => {
            console.log(`Client disconnected from document: ${documentId}`);
            const clients = documentRooms.get(documentId);
            if (clients) {
                clients.delete(ws);
                // Clean up the room if it's empty
                if (clients.size === 0) {
                    documentRooms.delete(documentId);
                }
            }
        });

        ws.on('error', (err) => {
            console.error('WebSocket error:', err);
        });
    });
});

server.listen(8080, () => {
    console.log('WebSocket server listening on port 8080');
});

This server is a broadcast relay. It's simple, fast, and dumb. It doesn't understand the content of the messages; it just forwards them to the correct room. This is a deliberate and crucial design choice, as it delegates the complex problem of state synchronization to the clients.
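
The hub's task list includes heartbeats, but the relay above never checks whether a connection is still alive. A minimal sketch of one heartbeat pass follows, modeled on the ping/pong pattern commonly used with ws. The function name sweepConnections and the isAlive flag are assumptions of this sketch; ping(), terminate(), and the 'pong' event belong to the ws WebSocket object.

```javascript
// One heartbeat pass over a collection of sockets.
// `isAlive` is an ad-hoc flag attached to each socket (an assumption of this sketch).
function sweepConnections(clients) {
  let terminated = 0;
  for (const ws of clients) {
    if (ws.isAlive === false) {
      // No pong arrived since the previous pass: treat the peer as gone.
      ws.terminate();
      terminated += 1;
      continue;
    }
    ws.isAlive = false; // set back to true by the 'pong' handler
    ws.ping();
  }
  return terminated;
}

// Wiring into the server from Section 1 might look like this:
//   inside handleUpgrade: ws.isAlive = true;
//                         ws.on('pong', () => { ws.isAlive = true; });
//   once at startup:      setInterval(() => sweepConnections(wss.clients), 30000);
```

Terminating a socket should trigger its 'close' handler, so the existing room cleanup runs without extra code. The 30-second interval is an arbitrary starting point.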

Section 2: The Critical Problem: State Synchronization

If two users type at the same time, we have a conflict.

User A (state: "Hi") types "!" at the end. (Op: insert(2, "!"))
User B (state: "Hi") types "?" at the end. (Op: insert(2, "?"))

Both apply their own edit locally and send the operation to the server. The server broadcasts them. User A receives B's operation and applies it; User B receives A's operation and applies it.

Result: User A sees "Hi?!" while User B sees "Hi!?". The two copies have diverged, and neither client knows it.
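
A few lines of JavaScript reproduce this kind of conflict. The sketch below assumes the two users insert different characters at the same index, so that the divergence is visible in the output; insert is a hypothetical helper, not a library function.

```javascript
// Naive position-based insert, with no transformation or CRDT metadata.
function insert(doc, index, text) {
  return doc.slice(0, index) + text + doc.slice(index);
}

const base = 'Hi';
const opA = { index: 2, text: '!' }; // User A's local edit
const opB = { index: 2, text: '?' }; // User B's local edit

// Each replica applies its own edit first, then the one relayed by the server.
const replicaA = insert(insert(base, opA.index, opA.text), opB.index, opB.text);
const replicaB = insert(insert(base, opB.index, opB.text), opA.index, opA.text);

console.log(replicaA); // "Hi?!"
console.log(replicaB); // "Hi!?"
console.log(replicaA === replicaB); // false
```

The same two operations produce different documents depending on arrival order, which is exactly what a synchronization layer has to prevent.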

This is the central challenge of collaborative systems. The traditional solution, Operational Transformation (OT), is notoriously complex to implement correctly. It involves creating a server-side transformation function that mathematically adjusts incoming operations based on previously applied ones.

A more modern and pragmatic solution is to use Conflict-free Replicated Data Types (CRDTs).

CRDTs are data structures designed to be concurrently modified by multiple clients and then merged, with a mathematically guaranteed convergence to the same state. They are designed for this exact problem.

Yjs is a widely used open-source CRDT library for building collaborative applications, with bindings for several popular editors. We will architect our system around it.

With Yjs, our server's role remains a simple broadcast hub. The real logic moves to the client.

1. Each client maintains a local Yjs document (Y.Doc).
2. When a user types, they modify their local Y.Doc.
3. The Y.Doc generates a tiny binary "update message" that describes the change.
4. We send this binary update message over the WebSocket.
5. The server broadcasts this binary update message (which it doesn't understand) to all other clients in the room.
6. Other clients receive the binary update and apply it to their local Y.Doc (Y.applyUpdate(...)).

Because Yjs is a CRDT, the order in which updates are received does not matter. The state will always converge.
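
To make the convergence claim concrete without pulling in a dependency, here is a toy replicated text type. It is deliberately not Yjs's algorithm (Yjs supports inserts anywhere, deletions, and a compact binary encoding); the ToyText class and its fields are invented for illustration and only support appending. What it does share with a real CRDT is that applying an update is commutative and idempotent.

```javascript
// Toy replicated text: a grow-only set of characters, each with a unique id.
// NOT how Yjs works internally; it only illustrates order-independent merging.
class ToyText {
  constructor(clientId) {
    this.clientId = clientId;
    this.items = new Map(); // id -> { id, pos, client, ch }
  }

  // Append a character locally and return the update to broadcast.
  append(ch) {
    let maxPos = 0;
    for (const item of this.items.values()) maxPos = Math.max(maxPos, item.pos);
    const pos = maxPos + 1;
    const update = { id: `${pos}:${this.clientId}`, pos, client: this.clientId, ch };
    this.apply(update);
    return update;
  }

  // Applying an update is a set insertion: duplicates and reordering are harmless.
  apply(update) {
    if (!this.items.has(update.id)) this.items.set(update.id, update);
  }

  // Deterministic order: by position, ties broken by client id.
  toString() {
    const byClient = (a, b) => (a.client < b.client ? -1 : a.client > b.client ? 1 : 0);
    return [...this.items.values()]
      .sort((a, b) => a.pos - b.pos || byClient(a, b))
      .map((item) => item.ch)
      .join('');
  }
}

// Both replicas start from "Hi".
const a = new ToyText('A');
const b = new ToyText('B');
for (const ch of 'Hi') b.apply(a.append(ch));

// Concurrent edits at the same position.
const fromA = a.append('!');
const fromB = b.append('?');

// Deliver in different orders, with one duplicate delivery.
a.apply(fromB);
a.apply(fromB);
b.apply(fromA);

console.log(a.toString()); // "Hi!?"
console.log(b.toString()); // "Hi!?"
```

Both replicas end with the same string because the tie between the two concurrent characters is broken by a rule every replica applies identically, not by arrival order.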

Section 3: Client-Side Implementation with Yjs

Here is how to wire up the client-side JavaScript, integrating a WebSocket with Yjs and a text editor (like the Quill editor).

// client.js
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';
import { QuillBinding } from 'y-quill';
import Quill from 'quill';
import 'quill/dist/quill.snow.css';

// 1. Get the document ID (e.g., from the URL)
const documentId = 'doc-123'; // Example

// 2. Create the Yjs document
const ydoc = new Y.Doc();

// 3. Connect to the WebSocket server using the Yjs provider
// This provider handles all the complex WebSocket logic for us.
// It connects, sends/receives updates, and handles reconnection.
const provider = new WebsocketProvider(
    'wss://api.example.com/documents/', // Base URL
    documentId,                         // Room/Document ID
    ydoc                                // The Yjs document
);

// 4. Get the shared data type for text
// 'quill' is just a name for this piece of shared data
const ytext = ydoc.getText('quill');

// 5. Initialize the Quill editor
const editorContainer = document.querySelector('#editor');
const quill = new Quill(editorContainer, {
    theme: 'snow',
    placeholder: 'Start collaborating...',
});

// 6. Bind the Yjs shared text type to the Quill editor
// This is the magic. The binding automatically syncs:
// - Local Quill changes -> to the Y.Doc
// - Remote Y.Doc changes -> to the Quill editor
const binding = new QuillBinding(ytext, quill);

// 7. Optional: Observe connection status
provider.on('status', event => {
    console.log(`WebSocket connection status: ${event.status}`);
    // You can update the UI (e.g., "Connecting...", "Connected")
});

By using the y-websocket provider, we don't even need to write the new WebSocket(...) or ws.onmessage logic ourselves. The provider handles packaging Yjs updates, sending them, and applying received updates.

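Note: The y-websocket protocol is more than a plain broadcast. When the provider connects, it starts a sync handshake (it announces what it already has and expects the missing updates in reply), and it also exchanges awareness messages for presence data such as cursors. A blind relay like the one in Section 1 can pass updates between clients that are connected at the same time, but it cannot answer the handshake itself, so a client that joins an empty room receives no document state. For anything beyond a demo, use the reference server from the y-websocket project (or an equivalent implementation), which keeps a Y.Doc per room on the server. Either way, conflict resolution stays inside Yjs rather than in hand-written server logic.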

Section 4: Production Architecture: Scaling and Persistence

Our single-node server from Section 1 will fail under load. It has two major limitations:

- Vertical Scaling Limit: A single Node.js process can only handle a finite number of concurrent WebSocket connections (typically tens of thousands), and a bigger machine only raises that ceiling so far.
- No Persistence: The room registry lives in process memory, so a restart drops every connection. More importantly, the document state is held only in the clients' memory. A client that joins after everyone else has left will find an empty document.

Scaling with a Pub/Sub Backplane

To scale horizontally, we must run multiple instances of our WebSocket server. However, if User A is connected to Server 1 and User B to Server 2, their edits never reach each other, because each instance only broadcasts to the connections held in its own memory.

The solution is a Pub/Sub backplane, typically using Redis.

1. Client A sends a message to Server 1.
2. Server 1 receives the message. Instead of broadcasting directly to its local clients, it publishes the message to a Redis channel named after the document (e.g., doc-123).
3. Server 1 and Server 2 (and every other instance with clients in that document) are subscribed to the doc-123 Redis channel.
4. Both servers receive the message from Redis.
5. Each server then broadcasts the message to its own set of connected WebSocket clients, skipping the original sender.
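
The flow above can be sketched without a running Redis by substituting an in-memory stand-in for the Pub/Sub layer. Everything named here (FakeBackplane, HubInstance, the id field on clients) is an assumption made for illustration; a real deployment would replace FakeBackplane with a Redis client's publish and subscribe calls and would need client ids that are unique across instances.

```javascript
// In-memory stand-in for Redis Pub/Sub (assumption: real code would use a Redis client).
class FakeBackplane {
  constructor() {
    this.channels = new Map(); // channel -> Set of handlers
  }
  subscribe(channel, handler) {
    if (!this.channels.has(channel)) this.channels.set(channel, new Set());
    this.channels.get(channel).add(handler);
  }
  publish(channel, message) {
    for (const handler of this.channels.get(channel) ?? []) handler(message);
  }
}

// One WebSocket server instance: it knows only its own local clients.
class HubInstance {
  constructor(name, backplane) {
    this.name = name;
    this.backplane = backplane;
    this.rooms = new Map(); // documentId -> Set of local clients
  }

  join(documentId, client) {
    if (!this.rooms.has(documentId)) {
      this.rooms.set(documentId, new Set());
      // Subscribe once per room on this instance.
      this.backplane.subscribe(documentId, (msg) => this.deliver(documentId, msg));
    }
    this.rooms.get(documentId).add(client);
  }

  // A local client sent something: publish it instead of broadcasting directly.
  onClientMessage(documentId, sender, payload) {
    this.backplane.publish(documentId, { senderId: sender.id, payload });
  }

  // Runs for every message on the channel, including ones this instance published.
  deliver(documentId, { senderId, payload }) {
    for (const client of this.rooms.get(documentId) ?? []) {
      if (client.id !== senderId) client.send(payload);
    }
  }
}

// Two instances, three clients in the same document.
const bus = new FakeBackplane();
const server1 = new HubInstance('server-1', bus);
const server2 = new HubInstance('server-2', bus);
const makeClient = (id) => ({ id, inbox: [], send(m) { this.inbox.push(m); } });
const clientA = makeClient('A');
const clientB = makeClient('B');
const clientC = makeClient('C');

server1.join('doc-123', clientA);
server2.join('doc-123', clientB);
server1.join('doc-123', clientC);

server1.onClientMessage('doc-123', clientA, 'update-1');
console.log(clientB.inbox); // [ 'update-1' ]  (delivered across instances)
console.log(clientA.inbox); // []              (sender is skipped)
```

Routing local delivery through the subscription, rather than broadcasting locally and publishing as well, keeps a single code path and avoids delivering the same message twice on the originating instance.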

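This decouples the server instances, so capacity grows by adding nodes, at least until the Redis channel itself becomes the bottleneck. Within the Yjs ecosystem, projects such as y-redis target this multi-instance setup; check their current documentation before relying on a specific configuration.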

Solving for Persistence

We still need to save the document. Yjs provides utilities for this.

Strategy: The server should be responsible for persistence.

1. On-Demand Loading: When the first client joins an empty room (e.g., documentRooms.get('doc-123') was just created), the server should:
   a. Load the latest Yjs document state from a database (e.g., PostgreSQL, S3, or a document DB).
   b. Instantiate a server-side Y.Doc.
   c. When new clients connect, the server sends them the current full document state.
2. Periodic/On-Change Saving: The server, now also a participant in the Yjs session (via y-websocket server), listens for document changes.
   a. It can save the full document state (a binary blob) to the database periodically (e.g., every 5 seconds).
   b. Alternatively, it can append "update messages" to a log, which is more complex but allows for point-in-time recovery.
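
The periodic-save option can be sketched as a small scheduler that writes a snapshot only when the document has changed and a minimum interval has passed. SnapshotScheduler and its save callback are hypothetical names; with Yjs, the state passed to save would typically come from Y.encodeStateAsUpdate(ydoc), and loading would apply the stored blob with Y.applyUpdate(ydoc, blob). The sketch takes plain functions so that it runs on its own.

```javascript
// Decides when a changed document should be written to storage.
// `save` is a hypothetical callback, e.g. one that writes a binary blob to a database.
class SnapshotScheduler {
  constructor(save, minIntervalMs = 5000) {
    this.save = save;
    this.minIntervalMs = minIntervalMs;
    this.dirty = false;
    this.lastSavedAt = -Infinity;
  }

  // Call whenever the document changes (with Yjs: from a ydoc 'update' listener).
  markDirty() {
    this.dirty = true;
  }

  // Call on a timer. `getState` returns the current serialized document.
  // Returns true if a snapshot was written.
  tick(now, getState) {
    if (!this.dirty || now - this.lastSavedAt < this.minIntervalMs) return false;
    this.save(getState());
    this.dirty = false;
    this.lastSavedAt = now;
    return true;
  }
}

// Example with an in-memory "database".
const savedBlobs = [];
const scheduler = new SnapshotScheduler((blob) => savedBlobs.push(blob), 5000);

const r1 = scheduler.tick(0, () => 'v0');    // false: nothing changed yet
scheduler.markDirty();
const r2 = scheduler.tick(1000, () => 'v1'); // true: first save
scheduler.markDirty();
const r3 = scheduler.tick(3000, () => 'v2'); // false: only 2s since last save
const r4 = scheduler.tick(6000, () => 'v3'); // true: 5s elapsed and still dirty
console.log(savedBlobs); // [ 'v1', 'v3' ]
```

Passing the clock in as an argument keeps the logic easy to test; in a server it would be driven by setInterval with Date.now(). A final save when the last client leaves a room is also worth adding, so that edits made inside the interval are not lost.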

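On the server, a library like y-leveldb can manage this persistence layer efficiently by storing Yjs updates in LevelDB. Note that y-indexeddb is a browser-side provider: it caches the document in the client's IndexedDB for offline use and does not replace server-side storage.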

Conclusion

Building real-time collaboration is a significant architectural undertaking. By leveraging WebSockets for the transport layer, we gain a persistent, low-latency communication channel. However, the true challenge lies in state management.

Attempting to build Operational Transformation (OT) from scratch is a high-risk, high-cost endeavor.

A modern, pragmatic, and robust approach is to:

1. Use WebSockets for the transport protocol.
2. Implement a server-side broadcast hub that segments connections by document/room.
3. Delegate all state synchronization and conflict resolution to a CRDT library like Yjs on the client.
4. Scale the server horizontally using a Redis Pub/Sub backplane to broadcast messages across all server instances.
5. Implement persistence on the server by loading/saving the Yjs document state from a database on-demand or periodically.

This architecture minimizes server-side complexity, pushes intelligence to the edge, and leverages battle-tested open-source libraries to solve the most difficult problem—conflict-free state convergence.

FAQs

What are the advantages of using WebSockets for real-time collaboration over traditional HTTP?

WebSockets provide a full-duplex, persistent connection between the client and the server, which is essential for real-time collaboration. Unlike traditional HTTP, which requires a new request-response cycle for every update (polling), WebSockets allow data to flow instantly in both directions. This significantly reduces latency and server overhead, ensuring that updates—such as a teammate typing in a document or sending a message—are reflected across all connected users immediately.

How does a WebSocket-based tool handle data synchronization between multiple users?

Data synchronization is achieved through an event-based broadcasting system. When a user performs an action (like editing a line of code or drawing on a canvas), the client sends a message to the server via the WebSocket connection. The server then broadcasts this update to all other active clients. For more complex tools, developers often implement conflict resolution strategies, such as Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs), to ensure that concurrent edits from different users don't overwrite each other.

Which technologies are best suited for building a real-time collaborative application?

A common stack uses Node.js on the backend for its non-blocking I/O, paired with a WebSocket library. This article uses ws together with Yjs; Socket.IO is a popular alternative that adds automatic reconnection and room-based communication on top of its own protocol, so it needs the matching Socket.IO client rather than a raw WebSocket. On the frontend, frameworks like React or Vue are often used to manage dynamic UI state. A secure connection (WSS) is critical for protecting sensitive data transmitted during collaborative sessions.
