Commit 2ec468d (parent 2359c51): Update week16.do.txt
1 file changed (doc/src/week16/week16.do.txt): 71 additions, 143 deletions
DATE: May 14, 2025
===== Plan for the week of May 12-16 =====

!bblock
o Quantum Boltzmann Machines: Theory and Implementation
  * Quantum neural networks, wrapping up discussions from last week (see notes from last week at URL:"https://github.com/CompPhysics/QuantumComputingMachineLearning/blob/gh-pages/doc/pub/week15/ipynb/week15.ipynb")
  * Classical Boltzmann Machines (BMs)
  * Restricted Quantum Boltzmann Machines (RQBM)
  * Training Quantum Boltzmann Machines
  * Practical Implementation with PennyLane
o Summary of course and work on project 2
!eblock

!split
===== Introduction =====

!bblock
Quantum Boltzmann Machines (QBMs) extend the classical Boltzmann machine (a probabilistic neural network) into the quantum domain. QBMs promise richer representations by leveraging quantum superposition and entanglement, potentially capturing correlations that classical models cannot.
!eblock

!bblock
In these notes, we first review classical Boltzmann machines and restricted Boltzmann machines (RBMs). Thereafter we introduce QBMs and their restricted variant (RQBM), discuss training methods, and illustrate a practical implementation using PennyLane.
!eblock

!split
===== Classical Boltzmann machines =====


!split
===== Quantum Boltzmann Machines (QBMs) =====

A Quantum Boltzmann Machine (QBM) extends a classical BM by replacing each binary unit with a qubit and generalizing the energy to a quantum Hamiltonian.


Quantum Boltzmann machines (QBMs) generalize this framework by encoding the model distribution in a quantum Gibbs state. Instead of a classical energy, one defines a Hamiltonian $H(\boldsymbol{\Theta})$ whose parameters $\boldsymbol{\Theta}$ (biases and couplings) play the role of the RBM weights. The model's density operator is the state
!bt
\[
\rho(\boldsymbol{\Theta}) = \frac{\exp{(-\beta H(\boldsymbol{\Theta}))}}{Z(\boldsymbol{\Theta})},
\]
!et
with inverse temperature $\beta$ (set to 1, as we did for standard Boltzmann machines) and partition function $Z = \Tr(\exp{(-\beta H)})$. The probability of observing a visible configuration $v$ is obtained by measuring $\rho$ in the computational basis (and tracing out hidden qubits, if any). In effect, the quantum model can capture richer correlations via superposition and entanglement.
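To make the Gibbs state concrete, it can be constructed numerically for a toy two-qubit model. This is an illustrative sketch, not part of the original notes: the coefficients a1, a2, w12 are made up, and a Z-only Hamiltonian is chosen so that the diagonal of $\rho$ is an ordinary Boltzmann distribution.

```python
import numpy as np

# Pauli-Z and identity
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def gibbs_state(H, beta=1.0):
    """rho = exp(-beta H) / Z, computed by eigendecomposition (H is Hermitian)."""
    w, U = np.linalg.eigh(H)
    boltz = np.exp(-beta * w)
    rho = (U * boltz) @ U.conj().T   # U diag(boltz) U^dagger
    return rho / boltz.sum()         # divide by the partition function Z

# Toy Hamiltonian H = a1 Z_1 + a2 Z_2 + w12 Z_1 Z_2 (made-up coefficients)
a1, a2, w12 = 0.5, -0.3, 0.8
H = a1 * np.kron(Z, I2) + a2 * np.kron(I2, Z) + w12 * np.kron(Z, Z)

rho = gibbs_state(H)
probs = np.real(np.diag(rho))  # for this diagonal H these are Boltzmann probabilities
print(probs)
```

Because this $H$ is diagonal, `probs` is exactly $e^{-E(v)}/Z$; adding non-commuting terms makes $\rho$ genuinely quantum.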

!split
===== Quantum Boltzmann Machines =====


In a Quantum Boltzmann Machine (QBM), the classical energy is replaced by a Hamiltonian $H$ acting on qubits. The model distribution over classical bitstrings $v$ is given by the diagonal of the quantum Gibbs state $\rho = e^{-H}/Z$. A straightforward choice is a stoquastic Hamiltonian that is diagonal in the computational basis (e.g. involving only Pauli-$Z$ operators), which yields a probability distribution very similar to a classical BM. More generally one can [...] from the transverse-field Ising Hamiltonian. However, non-commutativity makes exact training harder, so many proposals use either special Hamiltonians or variational approximations.
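The role of non-commutativity can be seen in a small numerical experiment (an illustrative sketch with made-up coefficients, using SciPy's matrix exponential): a Z-only Hamiltonian yields a diagonal Gibbs state, i.e. a classical Boltzmann distribution, while adding transverse-field X terms produces off-diagonal elements in $\rho$.

```python
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# Stoquastic, Z-only Hamiltonian: its Gibbs state is diagonal
Hz = 0.7 * np.kron(Z, I2) + 0.4 * np.kron(Z, Z)
rho_z = expm(-Hz)
rho_z /= np.trace(rho_z)

# Add a transverse field: the X terms do not commute with Hz
Hx = Hz + 0.9 * (np.kron(X, I2) + np.kron(I2, X))
rho_x = expm(-Hx)
rho_x /= np.trace(rho_x)

off_z = np.abs(rho_z - np.diag(np.diag(rho_z))).max()
off_x = np.abs(rho_x - np.diag(np.diag(rho_x))).max()
print(off_z, off_x)  # off_z is (numerically) zero, off_x is not
```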

!split
===== Restricted QBM (RQBM) =====

A Restricted Quantum Boltzmann Machine (RQBM) (also called Quantum RBM or QRBM) enforces a bipartite structure analogous to the classical RBM: no hidden-hidden interactions, and possibly limited hidden-visible connectivity. The simplest RQBM Hamiltonian can be written (up to local Pauli bases) as
!bt
\[
H(\mathbf{a},\mathbf{b},W,V) \;=\; \sum_{i=1}^{n_v} a_i Z_i \;+\; \sum_{j=1}^{n_h} b_j Z_j \;+\; \sum_{i,j} W_{ij}\, Z_i Z_j \;+\; \sum_{i<i'} V_{ii'}\, Z_i Z_{i'} \,.
\]
!et
Here $Z_i$ and $Z_j$ are Pauli-$Z$ operators on the visible and hidden qubits respectively, $a_i,b_j$ are biases, $W_{ij}$ are visible-hidden couplings, and $V_{ii'}$ are possible visible-visible couplings. (Classically, $V=0$ in an RBM; allowing $V\neq 0$ gives a "2-local QRBM" as in Wu et al.) Importantly, there are no hidden-hidden $ZZ$ terms in [...] and proved that this 2-local QRBM is universal for quantum computation.
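The RQBM Hamiltonian above can be assembled as an explicit matrix with Kronecker products. The helper below is our own illustrative construction (the qubit ordering "visible qubits first" is an assumption, not taken from the notes):

```python
import numpy as np

I2, Z = np.eye(2), np.diag([1.0, -1.0])

def z_on(k, n):
    """Pauli-Z acting on qubit k out of n (identity elsewhere)."""
    out = np.array([[1.0]])
    for m in range(n):
        out = np.kron(out, Z if m == k else I2)
    return out

def rqbm_hamiltonian(a, b, W, V=None):
    """H = sum_i a_i Z_i + sum_j b_j Z_j + sum_ij W_ij Z_i Z_j (+ optional
    visible-visible V terms); visible qubits first, then hidden qubits.
    Note: no hidden-hidden couplings appear, per the RQBM restriction."""
    n_v, n_h = len(a), len(b)
    n = n_v + n_h
    H = np.zeros((2**n, 2**n))
    for i in range(n_v):
        H += a[i] * z_on(i, n)
    for j in range(n_h):
        H += b[j] * z_on(n_v + j, n)
    for i in range(n_v):
        for j in range(n_h):
            H += W[i, j] * z_on(i, n) @ z_on(n_v + j, n)
    if V is not None:
        for i in range(n_v):
            for ip in range(i + 1, n_v):
                H += V[i, ip] * z_on(i, n) @ z_on(ip, n)
    return H

# Two visible qubits, one hidden qubit (made-up parameters)
H = rqbm_hamiltonian(a=np.array([0.2, -0.1]), b=np.array([0.3]),
                     W=np.array([[0.5], [-0.4]]))
print(H.shape)  # (8, 8); diagonal, since only Z operators appear
```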


!split
===== Quantum Statistical Mechanics Background =====

[...] context, one is interested in the probability $p(v)$ of measuring the visible qubits in computational basis state $v$. If the full thermal state lives on both visible and hidden qubits, this probability is
!bt
\[
p_\theta(v) \;=\; \Tr\bigl[\Pi_v^{(\text{vis})}\,\rho(\theta)\bigr],
\]
!et
where $\Pi_v^{(\text{vis})}=|v\rangle\langle v|$ acts on the visible subspace. Equivalently, one may "trace out" the hidden qubits and work with the reduced density matrix on the visible subsystem. Computing these [...] annealers, or variational algorithms.
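Tracing out the hidden qubits can be written compactly with a reshape and `einsum`. The following sketch (our own construction, with a random toy Hamiltonian on two visible and one hidden qubit) checks that the visible marginal is a proper probability distribution:

```python
import numpy as np
from scipy.linalg import expm

n_v, n_h = 2, 1
d = 2 ** (n_v + n_h)
rng = np.random.default_rng(1)
A = rng.normal(size=(d, d))
H = (A + A.T) / 2                       # toy Hermitian Hamiltonian

rho = expm(-H)
rho /= np.trace(rho)                    # Gibbs state, beta = 1

def marginal_visible(rho, n_v, n_h):
    """p(v) = Tr[(|v><v| on visible) rho]: diagonal of the reduced visible state.
    Assumes the visible qubits are the leading tensor factors."""
    d_v, d_h = 2 ** n_v, 2 ** n_h
    r = rho.reshape(d_v, d_h, d_v, d_h)
    rho_vis = np.einsum('ahbh->ab', r)  # partial trace over the hidden qubit(s)
    return np.real(np.diag(rho_vis))

p_v = marginal_visible(rho, n_v, n_h)
print(p_v)  # four nonnegative numbers summing to 1
```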


!split
===== Energy-Based Training Objective and Gradients =====

RQBM training is analogous to the classical case: we have a dataset of [...] $p_{\rm data}(v)$. Equivalently, one can view the data distribution as a target density matrix $\eta$ (diagonal in the computational basis) and minimize the quantum relative entropy (quantum KL divergence)
!bt
\[
S(\eta\Vert \rho(\theta)) = \Tr\!\bigl[\eta\ln\eta\bigr] - \Tr\!\bigl[\eta\ln\rho(\theta)\bigr] \;.
\]
!et
This loss is non-negative and equals zero only when $\eta=\rho(\theta)$. Writing $\rho=e^{-\beta H}/Z$, one finds the gradient of the relative entropy (for a parameter $\theta$ in $H$) as
!bt
\[
\frac{\partial}{\partial\theta} S(\eta\Vert\rho)
= \Tr\!\Bigl[\eta\,\partial_\theta(\beta H + \ln Z)\Bigr]
= \beta\Bigl(\Tr[\eta\,\partial_\theta H] - \Tr[\rho\,\partial_\theta H]\Bigr).
\]
!et
In other words,
!bt
\[
\nabla_\theta S \;=\; \beta\Bigl(\langle \partial_\theta H\rangle_{\rm data} \;-\; \langle \partial_\theta H\rangle_{\rm model}\Bigr).
\]
!et

This is directly analogous to the classical RBM gradient: the update for each parameter is proportional to the difference between its [...]
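As a sanity check of the gradient formula, consider a one-parameter toy model of our own construction, $H = \theta\, Z_0 Z_1$ with $\beta=1$ and no hidden units. Then $\partial_\theta H = Z_0 Z_1$, and gradient descent on $S$ should drive $\langle Z_0 Z_1\rangle_{\rm model}$ toward $\langle Z_0 Z_1\rangle_{\rm data}$:

```python
import numpy as np

# Eigenvalues of Z0 Z1 on the basis states |00>, |01>, |10>, |11>
zz = np.array([1.0, -1.0, -1.0, 1.0])

def model_probs(theta):
    """Gibbs distribution of H = theta * Z0 Z1 (diagonal, so purely classical)."""
    w = np.exp(-theta * zz)
    return w / w.sum()

p_data = np.array([0.4, 0.1, 0.1, 0.4])   # assumed empirical distribution

def grad(theta):
    # dS/dtheta = <ZZ>_data - <ZZ>_model  (beta = 1)
    return p_data @ zz - model_probs(theta) @ zz

theta = 0.0
for _ in range(500):
    theta -= 0.1 * grad(theta)

print(model_probs(theta) @ zz, p_data @ zz)  # both close to 0.6
```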

Figure: Quantum vs. classical training loop for RBMs. In the classical loop (re[...]


!split
===== Parameter Optimization and Variational Techniques =====

Given the gradient above, one can optimize $\theta$ by standard gradient-based methods (SGD, Adam, etc.). In a gate-based setting, we [...] with provably polynomial complexity under realistic conditions.

!split
===== Implementation with PennyLane =====

[...] and entangling gates that respect the bipartite structure. Below is illustrative code (in Python) using PennyLane's default.qubit simulator.

!bc pycod
import pennylane as qml
import numpy as np

# Number of visible and hidden qubits
# ...
dev = qml.device("default.qubit", wires=n_v+n_h)
# ...
    # Return probability distribution on visible wires
    return qml.probs(wires=list(range(n_v)))
!ec

358297

359298

360299
This circuit takes a parameter vector params of length n_v+n_h and returns the probabilities $q_\theta(v)$ of measuring each visible bitstring $v$. Notice that we measure only the visible wires; the wires=list(range(n_v)) argument in qml.probs marginalizes out the hidden qubits.
Next, we train this model to match a target dataset distribution. Suppose our data has distribution target = [p(00), p(01), p(10), p(11)]. We can define the (classical) loss as the Kullback-Leibler divergence $D_{\rm KL}(p_{\rm data}\Vert q_\theta)$ or simply the negative log-likelihood. Then we update params by gradient descent. PennyLane's automatic differentiation can compute gradients via the parameter-shift rule, but we show an explicit parameter-shift computation for demonstration:

!bc pycod
# Example target distribution over 2 visible bits
target = np.array([0.3, 0.2, 0.1, 0.4])  # must sum to 1

def loss(params):
    probs = circuit(params)  # model probabilities for visible states
    # Add a small epsilon to both terms to avoid log(0) and division by zero
    return np.sum(target * np.log((target + 1e-9) / (probs + 1e-9)))
!ec
