How to Get Started with Quantum Machine Learning

Quantum Machine Learning (QML) exists at the bleeding edge of computer science, promising to leverage the bizarre laws of quantum mechanics to revolutionize artificial intelligence. For CTOs and engineering leaders, the narrative is often polarized: it's either a revolutionary force that will render classical ML obsolete or a distant, academic curiosity.

The reality, as is often the case, lies in the complex middle.

QML is not a drop-in replacement for TensorFlow or PyTorch. It is a fundamentally different computing paradigm. Understanding when to apply it, how it's implemented, and what its critical limitations are is essential for any technology leader planning a long-term R&D strategy.

This article provides a technical primer on the most practical and prevalent form of QML today: hybrid quantum-classical algorithms. We will bypass the pop-science analogies and focus on the architectural patterns, implementation challenges, and strategic considerations relevant to a senior engineering audience.

The Core Concept: Why Bother with Quantum?

A classical bit is a 0 or a 1. A qubit (quantum bit) can exist in a state of superposition, $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$: a weighted combination of both basis states, with complex probability amplitudes satisfying $|\alpha|^2 + |\beta|^2 = 1$. Furthermore, qubits can be entangled, meaning their states are inextricably linked, regardless of the distance separating them.

These two properties—superposition and entanglement—allow a quantum computer to operate in a vast computational space called a Hilbert space. The size of this space scales exponentially: the joint state of $n$ qubits is described by $2^n$ complex amplitudes, one for each classical basis state.
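
To make the scaling concrete, here is a tiny numpy sketch (illustrative only) showing that the joint state of $n$ qubits carries $2^n$ complex amplitudes:

import numpy as np

# |0> and the equal superposition (|0> + |1>)/sqrt(2) as 2-element state vectors
zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

# The joint state of n qubits is the tensor (Kronecker) product of the
# single-qubit states; its length, i.e. the number of amplitudes, is 2**n.
state = plus
for _ in range(9):       # extend to a 10-qubit product state
    state = np.kron(state, plus)

print(state.size)        # 2**10 = 1024 complex amplitudes

At 50 qubits, storing the full state classically would already require petabytes of memory.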

The core hypothesis of QML is that this exponential scaling can be harnessed for ML tasks. Specifically, QML algorithms aim to:

  1. Map Data to High-Dimensional Spaces: Use quantum feature maps to project classical data into an exponentially large Hilbert space, where complex patterns may become simpler to separate (e.g., non-linearly separable data becoming linearly separable; a classical toy version of this idea is sketched just after this list).
  2. Perform Linear Algebra Efficiently: Execute certain operations, like kernel calculations or solving linear systems, with a potential exponential speedup over classical methods.
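
As promised in item 1, here is a purely classical toy version of the feature-map idea (a sketch; quantum feature maps target spaces far too large to build explicitly like this):

import numpy as np

# Points inside vs. outside a circle are not linearly separable in 2D,
# but adding the feature x1^2 + x2^2 makes them separable by a plane in 3D.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)

# Explicit feature map: phi(x) = (x1, x2, x1^2 + x2^2)
X_mapped = np.column_stack([X, X[:, 0]**2 + X[:, 1]**2])
# In the mapped space, the plane z = 0.5 separates the two classes exactly.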

However, we are not in the era of large, fault-tolerant quantum computers. We are in the NISQ era: Noisy Intermediate-Scale Quantum. This reality dictates our entire approach.

The Dominant Architecture: Hybrid Quantum-Classical

NISQ-era devices suffer from:

  • Noise: Quantum gates and measurements are imperfect, with high error rates.
  • Decoherence: Qubits lose their quantum state after a very short time (microseconds), limiting the depth (number of sequential operations) of any quantum circuit.
  • Low Qubit Count: Even the most powerful devices offer tens to a few hundred physical qubits, not the millions of error-corrected qubits that many textbook quantum algorithms assume.

Because of these limitations, a purely quantum algorithm is infeasible for any practical ML problem. The solution is the hybrid quantum-classical loop.

This architecture uses each processor for what it does best:

  • Classical CPU/GPU: Handles all data pre-processing, post-processing, and—most importantly—the optimization loop.
  • Quantum Processing Unit (QPU): Executes a short-depth, "parameterized" quantum circuit that acts as the core of the ML model.

This pattern is the foundation for the two most common QML models today: Quantum Support Vector Machines (QSVMs) and Variational Quantum Classifiers (VQCs).

Algorithm Deep Dive 1: Quantum Support Vector Machine (QSVM)

The goal of a classical SVM is to find an optimal hyperplane that separates data points into different classes. When data isn't linearly separable, it uses the "kernel trick"—a function $K(\mathbf{x}_i, \mathbf{x}_j)$ that implicitly maps the data to a higher-dimensional space where it is separable, without ever computing the mapping itself.
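
Because scikit-learn's SVC accepts any callable that returns a kernel matrix, the kernel is a plug-in component. Here is a minimal classical sketch of that hook, with a hand-rolled RBF kernel (illustrative only):

import numpy as np
from sklearn.svm import SVC

# Classical RBF kernel: K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
def rbf_kernel_matrix(X1, X2, gamma=1.0):
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])

# SVC accepts a callable kernel: the same hook a quantum kernel plugs into
svm = SVC(kernel=rbf_kernel_matrix)
svm.fit(X, y)
print(svm.predict(X))

Note that SVC never needs to know how the kernel matrix is produced.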

A QSVM simply replaces the classical kernel calculation with a quantum one.

How it Works:

  1. Feature Map: A parameterized quantum circuit, $U_\Phi(\mathbf{x})$, is designed. This circuit's job is to encode a classical data vector $\mathbf{x}$ into a quantum state $|\psi(\mathbf{x})\rangle$. This is the "quantum feature map."
  2. Kernel Calculation (on QPU): The kernel $K(\mathbf{x}_i, \mathbf{x}_j)$ is defined as the similarity between two quantum states: $K(\mathbf{x}_i, \mathbf{x}_j) = |\langle\psi(\mathbf{x}_i)|\psi(\mathbf{x}_j)\rangle|^2$. This value, the "state fidelity," can be estimated efficiently on a QPU.
  3. Classical SVM Training (on CPU): The quantum computer is used only to build the kernel matrix. This matrix, filled with similarity scores, is then fed into a standard classical SVM solver (like scikit-learn's SVC) which runs on a CPU to find the support vectors and decision boundary.

The potential "quantum advantage" comes from the feature map $U_\Phi(\mathbf{x})$. The conjecture is that certain quantum feature maps induce a feature space, and hence a kernel, that no known classical algorithm can compute efficiently.
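
Before the full pipeline, here is a minimal sketch of how a single kernel entry from step 2 can be estimated with the "compute-uncompute" (inversion) test, using a statevector simulator for clarity:

from qiskit import QuantumCircuit
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

# Prepare U(x_i), then apply U(x_j)^dagger. The probability of finding the
# register back in |00...0> equals |<psi(x_j)|psi(x_i)>|^2: the kernel value.
feature_map = ZZFeatureMap(feature_dimension=2, reps=2)

def fidelity_kernel_entry(x_i, x_j):
    circuit = QuantumCircuit(feature_map.num_qubits)
    circuit.compose(feature_map.assign_parameters(x_i), inplace=True)
    circuit.compose(feature_map.assign_parameters(x_j).inverse(), inplace=True)
    # On a simulator we can read the |00> amplitude directly; on real
    # hardware this probability is estimated from repeated shots.
    return abs(Statevector(circuit).data[0]) ** 2

print(fidelity_kernel_entry([0.1, 0.4], [0.2, 0.3]))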

Conceptual Implementation (Python with Qiskit)

Here is a high-level example of setting up a QSVM using IBM's Qiskit.

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel

# 1. Load and preprocess classical data
iris = datasets.load_iris()
# We'll use only 2 features and 2 classes for simplicity
features = iris.data[iris.target < 2][:, :2]
labels = iris.target[iris.target < 2]

# Scale features to [0, 1] as quantum gates often use angles
features = MinMaxScaler().fit_transform(features)

train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

# 2. Define the Quantum Feature Map
# ZZFeatureMap is a standard choice, encoding data using 2-qubit (ZZ) interactions.
# feature_dimension = number of classical features = 2
# reps = circuit depth (how many times to repeat the pattern)
feature_map = ZZFeatureMap(feature_dimension=2, reps=2, entanglement='linear')

# 3. Instantiate the Quantum Kernel
# FidelityQuantumKernel (qiskit-machine-learning >= 0.5) estimates the state
# fidelity for each pair of samples. By default it runs on Qiskit's local
# Sampler primitive, so no explicit backend is needed for simulation.
# (Older releases used QuantumKernel with a BasicAer quantum_instance,
# which has since been removed from Qiskit.)
qkernel = FidelityQuantumKernel(feature_map=feature_map)

# 4. Train the Classical SVM with the Quantum Kernel
# The qkernel.evaluate() method will be called internally by SVC
# to compute the kernel matrix.
qsvm = SVC(kernel=qkernel.evaluate)
qsvm.fit(train_features, train_labels)

# 5. Evaluate the model
score = qsvm.score(test_features, test_labels)
print(f"QSVM Test Score: {score}")

# For comparison, train a classical SVM with a standard kernel
classical_svm = SVC(kernel='rbf')
classical_svm.fit(train_features, train_labels)
classical_score = classical_svm.score(test_features, test_labels)
print(f"Classical RBF SVM Test Score: {classical_score}")

Algorithm Deep Dive 2: Variational Quantum Classifier (VQC)

A VQC (also called a Quantum Neural Network or QNN) is a more direct analogue to a classical neural network. It is a prime example of the hybrid loop.

The model is a quantum circuit with tunable parameters. The training process "varies" these parameters to minimize a cost function.

How it Works:

  1. Data Encoding: A feature map circuit $U(\mathbf{x})$, typically with no trainable parameters of its own, encodes the input data $\mathbf{x}$ into a quantum state.
  2. Parameterized Circuit (Ansatz): This encoded state is then fed into a "variational" circuit $W(\boldsymbol{\theta})$. This circuit is the "model" itself, containing a set of gates whose parameters $\boldsymbol{\theta}$ are the trainable weights.
  3. Measurement (QPU): The circuit is run on the QPU. A measurement is performed on one or more qubits. This measurement yields a classical value (an "expectation value"), which serves as the model's output (e.g., a logit).
  4. Cost Calculation (CPU): The classical output is compared to the true label $\mathbf{y}$ using a classical cost function (e.g., Mean Squared Error or Cross-Entropy).
  5. Parameter Update (CPU): A classical optimizer (e.g., Adam, SPSA) estimates the gradient of the cost function with respect to the parameters $\boldsymbol{\theta}$; for quantum parameters this typically requires additional circuit evaluations, such as the parameter-shift rule sketched after this list. It then computes an update step for $\boldsymbol{\theta}$.
  6. Loop: The new parameters $\boldsymbol{\theta}'$ are sent back to the QPU for the next run (Step 2). This loop repeats until the cost converges.
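
Step 5 hides a subtlety worth knowing: the QPU does not expose analytic gradients, so gradients of quantum parameters are themselves estimated from extra circuit runs. For gates generated by Pauli operators, the parameter-shift rule gives the gradient exactly: $\partial_\theta f(\theta) = \frac{1}{2}\left[f(\theta + \pi/2) - f(\theta - \pi/2)\right]$. A minimal PennyLane sketch:

import numpy as np
import pennylane as qml

dev = qml.device('default.qubit', wires=1)

@qml.qnode(dev)
def f(theta):
    # <Z> after RX(theta) on |0> is cos(theta)
    qml.RX(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

def parameter_shift_grad(theta):
    # Two extra circuit evaluations yield the exact gradient
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

theta = 0.7
print(parameter_shift_grad(theta))   # equals d/dtheta cos(theta) = -sin(theta)
print(-np.sin(theta))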

Conceptual Implementation (Python with PennyLane & PyTorch)

PennyLane is a popular QML library that excels at integrating quantum circuits directly into classical ML workflows (like PyTorch or TensorFlow).

import pennylane as qml
from pennylane import numpy as np
import torch
from torch.optim import Adam

# 1. Define the quantum device (the "QPU")
# 'default.qubit' is a built-in simulator. We need 2 qubits.
n_qubits = 2
dev = qml.device('default.qubit', wires=n_qubits)

# 2. Define the Quantum Circuit as a "QNode"
# This function defines our hybrid model.
@qml.qnode(dev, interface='torch')
def qnn_circuit(inputs, weights):
    # Layer 1: Data Encoding (feature map)
    # We use angle encoding, rotating qubits by input feature values
    qml.AngleEmbedding(inputs, wires=range(n_qubits))

    # Layer 2: Parameterized Circuit (Ansatz/Weights)
    # This is our trainable "neural network" layer.
    # We use a simple structure of rotations and CNOTs.
    qml.Rot(weights[0, 0], weights[0, 1], weights[0, 2], wires=0)
    qml.Rot(weights[1, 0], weights[1, 1], weights[1, 2], wires=1)
    qml.CNOT(wires=[0, 1])

    # 3. Measurement (Output)
    # We measure the "expectation value" of the Pauli-Z operator on qubit 0.
    # This gives a classical value between -1 and +1, matching the -1/+1
    # labels we train against below.
    return qml.expval(qml.PauliZ(wires=0))

# 4. Define the full hybrid model
class HybridModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Two Rot gates with 3 Euler angles each
        num_params = 2
        params_per_gate = 3
        # Use torch.nn.Parameter so PyTorch's autograd tracks and trains
        # these weights; initialize uniformly in [-pi, pi)
        self.q_weights = torch.nn.Parameter(
            torch.rand(num_params, params_per_gate) * 2 * np.pi - np.pi
        )

    def forward(self, x):
        # Pass the input data and weights to the quantum circuit
        return qnn_circuit(x, self.q_weights)

# --- 5. Classical Training Loop ---

# Create some dummy data (e.g., 4 points in 2D)
X = torch.tensor(
    [[0.1, 0.2], [0.3, 0.4], [0.9, 0.8], [0.7, 0.6]],
    dtype=torch.float32
)
Y = torch.tensor([-1.0, -1.0, 1.0, 1.0], dtype=torch.float32) # Labels: -1 and 1

model = HybridModel()
opt = Adam(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for epoch in range(100):
    model.train()
    opt.zero_grad()

    # Forward pass: runs the QPU simulation
    predictions = model(X).squeeze()

    # Cost calculation: all classical
    loss = loss_fn(predictions, Y)

    # Backward pass & optimization: all classical
    # PennyLane auto-calculates the quantum gradients
    loss.backward()
    opt.step()

    if epoch % 20 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")

# Test with new data
X_test = torch.tensor([[0.2, 0.3], [0.8, 0.7]], dtype=torch.float32)
predictions = model(X_test).squeeze()
print(f"Predictions: {predictions.data.numpy()}")

Critical Challenges & CTO-Level Considerations

While the code examples work on simulators, moving to real hardware exposes massive strategic challenges.

1. The Data Loading Bottleneck

This is arguably the biggest unsolved problem in QML. Your ML model is useless if you can't load data into it.

  • The Problem: We have a 1-petabyte classical dataset. How do we load it into a quantum state?
  • Amplitude Encoding: This method is qubit-efficient (an $N$-dimensional data vector fits into the amplitudes of just $\log_2 N$ qubits), but the circuit required to prepare this state generally takes $\mathcal{O}(N)$ gates. If your data loading time scales linearly with your dataset size, you have lost any potential quantum speedup before the algorithm even begins.
  • QRAM: The theoretical solution is Quantum Random Access Memory (QRAM), a device that could load classical data into superposition in $\mathcal{O}(\log N)$ time. QRAM does not exist in any practical, scalable form.

Implication: Any QML application in the NISQ era must be for problems where the dataset is small, or where a quantum advantage can be demonstrated despite an expensive classical data-loading step.
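
To make the qubit arithmetic concrete, here is a small amplitude-encoding sketch (using PennyLane's AmplitudeEmbedding on a simulator; illustrative only):

import numpy as np
import pennylane as qml

# 8 classical values fit into the amplitudes of log2(8) = 3 qubits,
# but preparing such a state on hardware generally costs O(N) gates.
n_qubits = 3
dev = qml.device('default.qubit', wires=n_qubits)

@qml.qnode(dev)
def encode(x):
    # normalize=True rescales x to unit norm, as amplitudes require
    qml.AmplitudeEmbedding(x, wires=range(n_qubits), normalize=True)
    return qml.state()

x = np.arange(1.0, 9.0)      # an 8-dimensional data vector
state = encode(x)
print(state.shape)           # (8,): 2**3 complex amplitudes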

2. The Barren Plateau Problem

This is the QML equivalent of the "vanishing gradient" problem. As the number of qubits and circuit depth increase, the landscape of the cost function becomes exponentially flat.

This means the gradient is effectively zero almost everywhere. Your classical optimizer has no signal to follow, and training becomes impossible. This severely limits the complexity (depth and width) of VQC models that can be successfully trained.
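
You can observe the effect directly on a simulator. The following sketch (using StronglyEntanglingLayers as an arbitrary deep ansatz; exact numbers will vary from run to run) samples the gradient of one parameter at random points in the landscape and prints its shrinking variance:

import pennylane as qml
from pennylane import numpy as np

def gradient_variance(n_qubits, n_layers=5, n_samples=50):
    dev = qml.device('default.qubit', wires=n_qubits)

    @qml.qnode(dev)
    def circuit(weights):
        qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
        return qml.expval(qml.PauliZ(0))

    shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
    grad_fn = qml.grad(circuit)
    grads = [
        grad_fn(np.random.uniform(0, 2 * np.pi, size=shape, requires_grad=True))[0, 0, 0]
        for _ in range(n_samples)
    ]
    return float(np.var(grads))

for n in (2, 4, 6):
    print(f"{n} qubits: Var[dC/dtheta] = {gradient_variance(n):.5f}")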

3. Noise and Error Mitigation

Real hardware is noisy. A result from a real QPU is not the "true" answer; it's a "noisy" answer. A significant part of any QML workflow is Error Mitigation—running calibration circuits, post-processing results, and using techniques like Zero-Noise Extrapolation (ZNE) to estimate what the "ideal" result would have been. This adds significant computational overhead.
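
As a flavor of how ZNE works, here is a schematic sketch. The function run_circuit is a hypothetical stand-in for executing your circuit with all gate noise amplified by a given factor (e.g., via pulse stretching or gate folding):

import numpy as np

def zero_noise_extrapolate(run_circuit, scales=(1.0, 2.0, 3.0)):
    # Measure the observable at deliberately amplified noise levels...
    noisy_values = [run_circuit(s) for s in scales]
    # ...then fit a simple model and read it off at zero noise
    coeffs = np.polyfit(scales, noisy_values, deg=1)
    return np.polyval(coeffs, 0.0)

# Toy demo: a "true" value of 1.0 degraded linearly by noise
print(zero_noise_extrapolate(lambda s: 1.0 - 0.1 * s))   # ~1.0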

4. Benchmarking: Where is the Advantage?

To date, there is no robust, production-scale demonstration of "quantum advantage" for a real-world ML problem. In fact, many recent benchmarks show that well-tuned classical models (like LSTMs or Gradient-Boosted Trees) consistently outperform current QML models on standard classification and regression tasks.

Conclusion: A Strategic Outlook for CTOs

Quantum Machine Learning is one of the most exciting research fields in computing, but it must be approached with pragmatic engineering discipline.

  1. Do Not View QML as a Replacement. QML is not a better "neural network." It is a specialized tool. For 99% of your current ML problems (vision, NLP, forecasting), classical ML on GPUs will remain superior for the foreseeable future.
  2. Focus on NISQ-Native Problems. The best candidates for QML are problems that are "quantum-native." This includes simulating quantum systems for drug discovery (e.g., VQE for molecular ground states), new materials science, or solving specific optimization problems that map well to quantum circuits.
  3. Invest in Research, Not Production (Yet). The "buy-in" for QML is an R&D investment. Encourage your advanced research or data science teams to experiment with libraries like Qiskit and PennyLane. The goal is not to deploy a product today, but to build institutional knowledge for tomorrow.
  4. Track Key Metrics. The metrics to watch are not just qubit counts. Pay attention to:
    • Gate Fidelity / Error Rates: How "clean" are the operations? (e.g., 99.9% vs 99.99% is a massive difference).
    • Coherence Times: How long can a circuit "run" before decohering? This dictates maximum circuit depth.
    • Data Loading Breakthroughs (QRAM): Any progress on this front is a signal that QML is becoming more practical for classical data.

QML is a marathon, not a sprint. The leaders who succeed will be the ones who can separate the architectural reality from the hype and make patient, informed investments in building long-term capability.
