From cb0e27431a94910b349309c06ea4b3dc52140607 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Sun, 15 Mar 2026 06:19:17 -0400 Subject: [PATCH 01/12] Tom's March 14 edits of new lecture --- lectures/_static/quant-econ.bib | 26 + lectures/_toc.yml | 1 + lectures/theil.md | 961 ++++++++++++++++++++++++++++++++ 3 files changed, 988 insertions(+) create mode 100644 lectures/theil.md diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 55eb1e7a4..bbee7e6be 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -3,6 +3,14 @@ Note: Extended Information (like abstracts, doi, url's etc.) can be found in quant-econ-extendedinfo.bib file in _static/ ### +@inproceedings{hansen2004certainty, + title={Certainty equivalence and model uncertainty}, + author={Hansen, Lars Peter and Sargent, Thomas J}, + booktitle={Conference on Models and Monetary Policy: Research in the Tradition of Dale Henderson, Richard Porter, and Peter Tinsley (http://www. federalreserve. gov/events/conferences/mmp2004/pdf/hansensargent. pdf)}, + year={2004} +} + + @article{evans2005interview, title={An interview with Thomas J. Sargent}, author={Evans, George W and Honkapohja, Seppo}, @@ -570,6 +578,24 @@ @article{HST_1999 } +@article{simon1956dynamic, + title={Dynamic programming under uncertainty with a quadratic criterion function}, + author={Simon, Herbert A}, + journal={Econometrica, Journal of the Econometric Society}, + pages={74--81}, + year={1956}, + publisher={JSTOR} +} + +@article{theil1957note, + title={A note on certainty equivalence in dynamic planning}, + author={Theil, Henri}, + journal={Econometrica: Journal of the Econometric Society}, + pages={346--349}, + year={1957}, + publisher={JSTOR} +} + @article{Jacobson_73, author = {D. H. 
Jacobson}, year = {1973}, diff --git a/lectures/_toc.yml b/lectures/_toc.yml index f43d7f3dc..0c102d1c9 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -104,6 +104,7 @@ parts: - file: cross_product_trick - file: perm_income - file: perm_income_cons + - file: theil - file: lq_inventories - caption: Optimal Growth numbered: true diff --git a/lectures/theil.md b/lectures/theil.md new file mode 100644 index 000000000..00347c7d4 --- /dev/null +++ b/lectures/theil.md @@ -0,0 +1,961 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(certainty_equiv_robustness)= +```{raw} jupyter +
+ + QuantEcon + +
+``` + +# Certainty Equivalence and Model Uncertainty + +```{index} single: Certainty Equivalence; Robustness +``` + +```{index} single: LQ Control; Permanent Income +``` + +```{contents} Contents +:depth: 2 +``` + +This lecture draws on {cite}`hansen2004certainty` and {cite}`HansenSargent2008`. + +In addition to what's in Anaconda, this lecture will need the following libraries: + +```{code-cell} ipython3 +--- +tags: [hide-output] +--- +!pip install quantecon +``` + +## Overview + + + +Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note` established a celebrated +*certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming +problems. Their result justifies a convenient two-step algorithm: + +1. **Optimize** under perfect foresight (treat future exogenous variables as known). +2. **Forecast** — substitute optimal forecasts for the unknown future values. + +The striking insight is that these two steps are completely separable. The decision +rule that emerges from step 1 is *identical* to the decision rule for the original +stochastic problem once optimal forecasts are substituted in step 2. In particular, +the decision rule does not depend on the variance of the shocks — only the *level* of +the optimal value function does. + +This lecture extends the classical result in two directions motivated by +{cite}`hansen2004certainty`: + +- **Model uncertainty and robustness.** What happens when the decision maker does not + trust his model? A remarkable version of CE survives, but now the "forecasting" step + uses a *distorted* probability distribution that the decision maker deliberately tilts + against himself in order to achieve robustness. + +- **Risk-sensitive preferences.** A mathematically equivalent reformulation interprets + the same decision rules through Epstein–Zin recursive preferences. The robustness + parameter $\theta$ and the risk-sensitivity parameter $\sigma$ are linked by + $\theta = -\sigma^{-1}$. 
We illustrate all three settings — ordinary CE, robust CE, and risk-sensitive
preferences — together with a permanent income application, using Python code
built on `quantecon`.
+ +**Step 1 — Perfect-foresight control.** Solve the *nonstochastic* problem of +maximising {eq}`t3` subject to {eq}`t2`, treating the future sequence +$\mathbf{z}_t = (z_t, z_{t+1}, \ldots)$ as known. The solution is the +*feedback-feedforward* rule + +```{math} +:label: t4 +u_t = h_1(x_t,\, \mathbf{z}_t). +``` + +The function $h_1$ depends only on $r$ and $g$ (i.e., only on $Q$, $R$, and the +matrices of the $x$-transition law). It does **not** require knowledge of the +noise process $f$ or $\Phi$. Under Assumption 1, $h_1$ is a linear function. + +**Step 2 — Optimal forecasting.** Using $f$ and $\Phi$ in {eq}`t1`, +iterate the linear law of motion forward: + +$$ +\mathbf{z}_t = h_2 \cdot z_t\; +\; h_3 \cdot \epsilon_{t+1}^{\infty}. +$$ + +Since the shocks are i.i.d. with mean zero, + +```{math} +:label: t5 +\mathbb{E}[\mathbf{z}_t \mid z^t] = h_2 \cdot z_t. +``` + +**The CE principle.** Substitute {eq}`t5` for $\mathbf{z}_t$ in {eq}`t4`: + +```{math} +:label: t6 +u_t = h_1(x_t,\; h_2 \cdot z_t) \;=\; h(x_t,\, z_t). +``` + +Each of $h_1$, $h_2$, and $h$ is a linear function. The original stochastic +problem thus *separates* into a nonstochastic control problem and a statistical +filtering problem. + +### Value Function and Volatility + +The optimal value function takes the quadratic form + +```{math} +:label: t9 +V(y_0) = -y_0' P\, y_0 - p. +``` + +Two key observations follow from the separation: + +- The matrix $P$ is the fixed point of an operator $T(P; r, g, f_1)$ that involves + only the *persistence* matrix $f_1$ (from $z_{t+1} = f_1 z_t + f_2 \epsilon_{t+1}$), + **not** the volatility matrix $f_2$. Therefore **$P$ does not depend on the noise + loadings**, and neither does the decision rule $h$. + +- The scalar constant $p$ equals $\beta/(1-\beta)\,\mathrm{tr}(f_2' P f_2)$, so + **$p$ grows with volatility**. 
+ +An equivalent statement: the same decision rule $h$ emerges from the *nonstochastic* +version of the problem obtained by setting all shocks to zero, +$z_{t+1} = f_1 z_t$. The presence of uncertainty *lowers the value* (larger $p$) +but does not alter *behaviour*. + +### Python: Demonstrating Certainty Equivalence + +The following code verifies the CE principle numerically. We consider a simple +scalar LQ problem and vary the noise standard deviation $\sigma$. + +```{code-cell} ipython3 +# ── Simple 1-D scalar LQ problem ─────────────────────────────────────────── +# y_{t+1} = a·y_t + b·u_t + σ·ε_{t+1}, r = −(q·y² + r·u²) + +a, b_coeff = 0.9, 1.0 +q_state, r_ctrl = 1.0, 1.0 +beta = 0.95 + +A = np.array([[a]]) +B = np.array([[b_coeff]]) +Q_mat = np.array([[q_state]]) +R_mat = np.array([[r_ctrl]]) + +sigma_vals = np.linspace(0.0, 3.0, 80) +F_vals, d_vals = [], [] + +for sigma in sigma_vals: + C = np.array([[sigma]]) + lq = LQ(Q_mat, R_mat, A, B, C=C, beta=beta) + P, F, d = lq.stationary_values() + F_vals.append(float(F[0, 0])) + d_vals.append(float(d)) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +axes[0].plot(sigma_vals, F_vals, lw=2) +axes[0].set_xlabel('Noise level $\\sigma$') +axes[0].set_ylabel('Policy gain $F$') +axes[0].set_title('CE: Policy does not depend on noise') +axes[0].set_ylim(0, 2 * max(F_vals) + 0.1) + +axes[1].plot(sigma_vals, d_vals, lw=2, color='darkorange') +axes[1].set_xlabel('Noise level $\\sigma$') +axes[1].set_ylabel('Value constant $d$') +axes[1].set_title('Noise lowers value but not the decision rule') + +plt.tight_layout() +plt.show() +``` + +As the plot confirms, $F$ (the policy gain) is **flat** across all noise levels, +while the value constant $d$ increases monotonically with $\sigma$. This is the +CE principle in action. 
+ +--- + +## Model Uncertainty and Robustness + +### Setup and the Multiplier Problem + +The decision maker in Simon and Theil's setting knows his model exactly — he has +no doubt about the transition law {eq}`t1`. Now suppose he suspects that the true +data-generating process is + +```{math} +:label: t30 +z_{t+1} = f(z_t,\; \epsilon_{t+1} + w_{t+1}) +``` + +where $w_{t+1} = \omega_t(x^t, z^t)$ is a misspecification term chosen by an +adversarial "nature." The decision maker believes his approximating model is a +good approximation in the sense that + +$$ +\hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t\, w_{t+1}' w_{t+1} + \,\Big|\, y_0\right] \leq \eta_0, +$$ + +where $\eta_0$ parametrises the tolerated misspecification budget and $\hat{\mathbb{E}}$ +is the expectation under the distorted law {eq}`t30`. + +To construct a *robust* decision rule the decision maker solves the +**multiplier problem** — a two-player zero-sum dynamic game: + +```{math} +:label: t32 +\min_{\{w_{t+1}\}}\, \max_{\{u_t\}}\; +\hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t + \Bigl\{r(y_t, u_t) + \theta\beta\, w_{t+1}' w_{t+1}\Bigr\}\, + \Big|\, y_0\right] +``` + +where $\theta > 0$ penalises large distortions. A larger $\theta$ shrinks the +feasible misspecification set; as $\theta \to \infty$ the problem reduces to +ordinary LQ. + +The Markov perfect equilibrium of {eq}`t32` delivers a *robust* rule +$u_t = h(x_t, z_t)$ together with a worst-case distortion process +$w_{t+1} = W(x_t, z_t)$. + +### Stackelberg Timing and the Modified CE + +The Markov perfect equilibrium *conceals* a form of CE. To reveal it, Hansen and +Sargent {cite}`HansenSargent2001` impose a **Stackelberg timing protocol**: at +time 0, the *minimising* player commits once and for all to a plan +$\{w_{t+1}\}$, after which the *maximising* player chooses $u_t$ sequentially. +This makes the minimiser the Stackelberg leader. 
+ +To describe the leader's committed plan, introduce "big-letter" state variables +$(X_t, Z_t)$ (same dimensions as $(x_t, z_t)$) that encode the leader's +pre-committed strategy: + +$$ +\begin{aligned} +w_{t+1} &= W(X_t, Z_t), \\ +X_{t+1} &= g(X_t, Z_t,\, h(X_t, Z_t)), \\ +Z_{t+1} &= f(Z_t,\, W(X_t, Z_t) + \epsilon_{t+1}). +\end{aligned} +$$ + +Summarised with $Y_t = \begin{bmatrix} X_t \\ Z_t \end{bmatrix}$: + +```{math} +:label: t34 +Y_{t+1} = M Y_t + N \epsilon_{t+1}, \qquad w_{t+1} = W(Y_t). +``` + +The maximising player then faces an *ordinary* dynamic programming problem subject +to his own dynamics {eq}`t2`, the distorted $z$-law {eq}`t30`, and the exogenous +process {eq}`t34`. His optimal rule takes the form + +$$ +u_t = \tilde{H}(x_t, z_t, Y_t). +$$ + +Başar and Bernhard (1995) and Hansen and Sargent (2004) establish that at +equilibrium (with "big $K$ = little $k$" imposed) this collapses to + +$$ +\tilde{H}(X_t, Z_t, Y_t) = h(Y_t), +$$ + +the *same* rule as the Markov perfect equilibrium of {eq}`t32`. + +### Modified Separation Principle + +The Stackelberg timing permits an Euler-equation approach. The two-step algorithm +becomes: + +**Step 1** (unchanged). Solve the same nonstochastic control problem as before: +$u_t = h_1(x_t, \mathbf{z}_t)$. + +**Step 2** (modified). Form forecasts using the *distorted* law of motion +{eq}`t34`. By the linearity and Gaussianity of the system, + +```{math} +:label: t37 +\hat{\mathbb{E}}[\mathbf{z}_t \mid z^t, Y^t] + = \tilde{h}_2 \begin{bmatrix} z_t \\ Y_t \end{bmatrix} +``` + +where $\hat{\mathbb{E}}$ uses the distorted model. + +Substituting {eq}`t37` into $h_1$ and imposing $Y_t = y_t$ gives the robust rule + +```{math} +:label: t38 +u_t = h_1\!\left(x_t,\; \hat{h}_2 \cdot y_t\right) = h(x_t, z_t). +``` + +This is the modified CE: **step 1 is identical to the non-robust case**; only +step 2 changes, using distorted rather than rational forecasts. 
+ +### Python: How Robustness Changes the Policy + +In contrast to ordinary CE, the robust policy **does** change as $\theta$ varies. +As $\theta \to \infty$ (no robustness) the robust policy converges to the standard LQ +policy. + +```{code-cell} ipython3 +# ── Robust LQ: same 1-D problem, varying θ ────────────────────────────────── +sigma_fixed = 1.0 +C_fixed = np.array([[sigma_fixed]]) + +# Standard (non-robust) benchmark +lq_std = LQ(Q_mat, R_mat, A, B, C=C_fixed, beta=beta) +P_std, F_std_arr, d_std = lq_std.stationary_values() +F_standard = float(F_std_arr[0, 0]) +P_standard = float(P_std[0, 0]) + +theta_vals = np.linspace(2.0, 30.0, 120) # theta must exceed 1/(2P) ≈ 0.4; use ≥ 2 +F_rob_vals, P_rob_vals = [], [] + +for theta in theta_vals: + rblq = RBLQ(Q_mat, R_mat, A, B, C_fixed, beta, theta) + F_rob, K_rob, P_rob = rblq.robust_rule() + F_rob_vals.append(float(F_rob[0, 0])) + P_rob_vals.append(float(P_rob[0, 0])) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +axes[0].plot(theta_vals, F_rob_vals, lw=2, label='Robust $F(\\theta)$') +axes[0].axhline(F_standard, color='r', linestyle='--', lw=1.5, + label=f'Standard LQ ($F = {F_standard:.3f}$)') +axes[0].set_xlabel('Robustness parameter $\\theta$') +axes[0].set_ylabel('Policy gain $F$') +axes[0].set_title('Robustness changes the policy') +axes[0].legend() + +axes[1].plot(theta_vals, P_rob_vals, lw=2, color='purple', + label='Robust $P(\\theta)$') +axes[1].axhline(P_standard, color='r', linestyle='--', lw=1.5, + label=f'Standard LQ ($P = {P_standard:.3f}$)') +axes[1].set_xlabel('Robustness parameter $\\theta$') +axes[1].set_ylabel('Value matrix $P$') +axes[1].set_title('Robustness also changes the value matrix') +axes[1].legend() + +plt.tight_layout() +plt.show() +``` + +Observe that for small $\theta$ (strong preference for robustness) both $F$ and +$P$ deviate substantially from their non-robust counterparts, converging to the +standard values as $\theta \to \infty$. 
+ +This contrasts sharply with ordinary CE: under robustness, **both the policy gain +and the value matrix depend on the noise loadings** (through $\theta$ and $C$). + +--- + +## Value Function Under Robustness + +Under a preference for robustness, the optimised value of {eq}`t32` is again +quadratic, + +```{math} +:label: t90 +V(y_0) = -y_0' P\, y_0 - p, +``` + +but now *both* $P$ **and** $p$ depend on the volatility parameter $f_2$. + +Specifically, $P$ is the fixed point of the composite operator $T \circ \mathcal{D}$ +where $T$ is the same Bellman operator as in the non-robust case and +$\mathcal{D}$ is the **distortion operator**: + +$$ +\mathcal{D}(P) = \mathcal{D}(P;\, f_2,\, \theta). +$$ + +Given the fixed point $P = T(\mathcal{D}(P))$, the constant is + +$$ +p = p(P;\, f_2,\, \beta,\, \theta). +$$ + +Despite $P$ now depending on $f_2$, a form of CE still prevails: the same +decision rule {eq}`t38` also emerges from the *nonstochastic* game that +maximises {eq}`t32` subject to {eq}`t2` and + +$$ +z_{t+1} = f(z_t,\, w_{t+1}), +$$ + +i.e., setting $\epsilon_{t+1} \equiv 0$. The presence of randomness lowers the +value (the constant $p$) but does not change the decision rule. + +--- + +## Risk-Sensitive Preferences + +Building on Jacobson (1973) and Whittle (1990), Hansen and Sargent (1995) showed that +the same decision rules can be reinterpreted through **risk-sensitive preferences**. +Suppose the decision maker *fully trusts* his model + +```{math} +:label: rs1 +y_{t+1} = A\, y_t + B\, u_t + C\, \epsilon_{t+1} +``` + +but evaluates stochastic processes according to the recursion + +```{math} +:label: rs3 +U_t = r(y_t, u_t) + \beta\, \mathcal{R}_t(U_{t+1}) +``` + +where the *risk-adjusted* continuation operator is + +```{math} +:label: rs4 +\mathcal{R}_t(U_{t+1}) = \frac{2}{\sigma} + \log \mathbb{E}\!\left[\exp\!\left(\frac{\sigma U_{t+1}}{2}\right) + \,\Big|\, y^t\right], \qquad \sigma \leq 0. 
+``` + +When $\sigma = 0$, L'Hôpital's rule recovers the standard expectation operator. +When $\sigma < 0$, $\mathcal{R}_t$ penalises right-tail risk in the continuation +utility $U_{t+1}$. + +For a candidate quadratic continuation value +$U_{t+1}^e = -y_{t+1}' \Omega\, y_{t+1} - \rho$, evaluating $\mathcal{R}_t$ +via the log-moment-generating function of the Gaussian distribution yields + +$$ +\mathcal{R}_t U_{t+1}^e + = -y_t' \hat{A}_t' \mathcal{D}(\Omega)\, \hat{A}_t\, y_t - \hat{\rho} +$$ + +where $\mathcal{D}$ is the **same** distortion operator as in the robust problem +with $\theta = -\sigma^{-1}$. Consequently, the risk-sensitive Bellman equation +has the *same* fixed point $P$ as the robust control problem, and therefore the +**same decision rule** $u_t = -F y_t$. + +> **Key equivalence:** robust control with parameter $\theta$ and risk-sensitive +> control with parameter $\sigma = -\theta^{-1}$ produce identical decision rules. + +--- + +## Application: Permanent Income Model + +We now illustrate all of the above in a concrete linear-quadratic permanent income +model. + +### Model Setup + +A consumer receives an exogenous endowment process $\{z_t\}$ and allocates it +between consumption $c_t$ and savings $x_t$ to maximise + +```{math} +:label: cshort1 +-\mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t (c_t - b)^2, \qquad \beta \in (0,1) +``` + +where $b$ is a bliss level of consumption. Defining the *marginal utility +of consumption* $\mu_{ct} \equiv b - c_t$ (the control), the budget constraint +and endowment process are + +```{math} +:label: cshort2a +x_{t+1} = R\, x_t + z_t - b + \mu_{ct} +``` + +```{math} +:label: cshort2b +z_{t+1} = \mu_d(1-\rho) + \rho\, z_t + c_d(\epsilon_{t+1} + w_{t+1}) +``` + +where $R > 1$ is the gross return on savings, $|\rho| < 1$, and $w_{t+1}$ +is an optional shock-mean distortion representing model misspecification. 
+ +Setting $w_{t+1} \equiv 0$ and taking $Q = 0$ (return depends only on the +control $\mu_{ct}$) and $R_{\text{ctrl}} = 1$ puts this in the standard LQ form + +$$ +y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix}, +\quad +A = \begin{bmatrix} R & 1 \\ 0 & \rho \end{bmatrix}, +\quad +B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, +\quad +C = \begin{bmatrix} 0 \\ c_d \end{bmatrix}. +$$ + +We calibrate to parameters estimated by Hansen, Sargent, and Tallarini (1999) (HST) +from post-WWII U.S. data: + +```{code-cell} ipython3 +# ── HST calibration ───────────────────────────────────────────────────────── +beta_hat = 0.9971 +R_rate = 1.0 / beta_hat # so that β·R = 1 (Hall's case) +rho = 0.9992 +c_d = 5.5819 +sigma_rs = -2e-7 # risk-sensitivity / robustness parameter σ̂ < 0 +theta_pi = -1.0 / sigma_rs # robustness parameter θ = −1/σ̂ = 5×10⁶ + +# LQ matrices (state = [x_t, z_t], control = μ_ct = b − c_t) +A_pi = np.array([[R_rate, 1.0], + [0.0, rho]]) +B_pi = np.array([[1.0], + [0.0]]) +C_pi = np.array([[0.0], + [c_d]]) +# Return = −μ_ct²: no state penalty, unit control penalty. +# A tiny regulariser is added to Q to make the Riccati numerically +# well-conditioned when β·R = 1 (Hall's unit-root case). +Q_pi = 1e-8 * np.eye(2) # economically negligible regularisation +R_pi = np.array([[1.0]]) + +print("A ="); print(A_pi) +print("B ="); print(B_pi) +print("C ="); print(C_pi) +``` + +### Without Robustness: Hall's Martingale + +Setting $\sigma = 0$ (no preference for robustness), the consumer's Euler +equation is + +```{math} +:label: cshort3 +\mathbb{E}_t[\mu_{c,t+1}] = (\beta R)^{-1} \mu_{ct}. +``` + +With $\beta R = 1$ (Hall's case), this is +$\mathbb{E}_t[\mu_{c,t+1}] = \mu_{ct}$, i.e., the **marginal utility of +consumption is a martingale** — equivalently, consumption follows a random walk. + +The optimal policy is $\mu_{ct} = -F y_t$ where, from the solved-forward +Euler equation, $F = [(R-1),\ (R-1)/(R - \rho)]$. 
The resulting closed-loop +projection onto the one-dimensional direction of $\mu_{ct}$ gives the scalar +AR(1) representation + +```{math} +:label: cshort6 +\mu_{c,t+1} = \varphi\, \mu_{ct} + \nu\, \epsilon_{t+1}. +``` + +```{code-cell} ipython3 +# ── Standard consumer: analytical Euler equation (Hall's βR = 1) ───────────── +# Optimal policy from permanent income theory (solved-forward Euler equation): +# μ_ct = −(R−1)·x_t − (R−1)/(R−ρ)·z_t +F_pi = np.array([[(R_rate - 1.0), (R_rate - 1.0) / (R_rate - rho)]]) +A_cl_std = A_pi - B_pi @ F_pi + +# AR(1) law of motion for μ_c = −F·y under the optimal policy: +# φ_std = 1/(βR) = 1 (Hall's martingale, βR = 1) +# ν_std = (R−1)·c_d / (R − ρ) (permanent income innovation formula) +phi_std = 1.0 / (beta_hat * R_rate) # = 1.0 exactly when βR = 1 +nu_std = (R_rate - 1.0) * c_d / (R_rate - rho) + +print(f"Standard consumer (Hall's βR = 1):") +print(f" Policy F = {F_pi}") +print(f" AR(1) coeff φ = {phi_std:.6f} (= 1, martingale)") +print(f" Innov. scale ν = {nu_std:.4f} (paper reports ≈ 4.3825)") +``` + +### With Robustness: Precautionary Savings + +Under a preference for robustness ($\sigma < 0$, $\theta < \infty$), the consumer +uses distorted forecasts $\hat{\mathbb{E}}_t[\cdot]$ evaluated under the +worst-case model. The consumption rule takes the certainty-equivalent form + +```{math} +:label: cshort5r +\mu_{ct} = -(1 - R^{-2}\beta^{-1}) + \!\left(R\, x_t + \hat{\mathbb{E}}_t\!\left[ + \sum_{j=0}^{\infty} R^{-j}(z_{t+j} - b)\right]\right) +``` + +where $h_1$ — the first step of the CE algorithm — is **identical** to the +non-robust case. Only the expectations operator changes. + +The resulting AR(1) dynamics for $\mu_{ct}$ become: + +```{math} +:label: cshort15 +\mu_{c,t+1} = \tilde{\varphi}\, \mu_{ct} + \tilde{\nu}\, \epsilon_{t+1} +``` + +with $\tilde{\varphi} < 1$, implying $\mathbb{E}_t[c_{t+1}] > c_t$ under the +approximating model — a form of **precautionary saving**. 
+ +The observational equivalence formula {eq}`cshort12` (derived below) immediately +gives the robust AR(1) coefficient: $\tilde{\varphi} = 1/(\tilde{\beta} R)$ +where $\tilde{\beta} = \tilde{\beta}(\sigma)$. The innovation scale $\tilde{\nu}$ +follows from the robust permanent income formula with the distorted persistence; +Hansen and Sargent (2001) report $\tilde{\nu} \approx 8.0473$ for the HST +calibration. + +```{code-cell} ipython3 +# ── Robust consumer: use observational equivalence to get φ̃ analytically ───── +def beta_tilde(sigma, beta_hat_val, alpha_sq_val): + """Observational-equivalence locus: β̃(σ) that matches robust (σ,β̂) consumption.""" + denom = 2.0 * (1.0 + sigma * alpha_sq_val) + numer = beta_hat_val * (1.0 + beta_hat_val) + disc = 1.0 - 4.0 * beta_hat_val * (1.0 + sigma * alpha_sq_val) / \ + (1.0 + beta_hat_val) ** 2 + return (numer / denom) * (1.0 + np.sqrt(np.maximum(disc, 0.0))) + +alpha_sq = nu_std ** 2 # α² = ν² (squared innovation loading) +bt = beta_tilde(sigma_rs, beta_hat, alpha_sq) +phi_rob = 1.0 / (bt * R_rate) # φ̃ = 1/(β̃R) < 1 (mean-reverting!) +nu_rob = 8.0473 # from HST (1999) via Hansen–Sargent (2001) + +print(f"Robust consumer (σ = {sigma_rs}):") +print(f" Equiv. discount factor β̃ = {bt:.5f} (paper: ≈ 0.9995)") +print(f" AR(1) coeff φ̃ = {phi_rob:.4f} (paper: ≈ 0.9976 → mean-reverting)") +print(f" Innov. 
scale ν̃ = {nu_rob:.4f} (paper: ≈ 8.0473)") +``` + +```{code-cell} ipython3 +# ── Simulate and compare: standard vs robust consumption paths ──────────────── +np.random.seed(42) +T_sim = 100 + +def simulate_ar1(phi, nu, T, mu0=0.0): + """Simulate μ_{c,t} from AR(1): μ_{t+1} = φ·μ_t + ν·ε_{t+1}.""" + path = np.empty(T) + path[0] = mu0 + for t in range(1, T): + path[t] = phi * path[t-1] + nu * np.random.randn() + return path + +# Initialise at a value away from zero to illustrate drift / mean-reversion +mu0_init = 10.0 +mu_std_path = simulate_ar1(phi_std, nu_std, T_sim, mu0=mu0_init) +mu_rob_path = simulate_ar1(phi_rob, nu_rob, T_sim, mu0=mu0_init) + +fig, axes = plt.subplots(2, 1, figsize=(11, 6), sharex=True) +t_grid = np.arange(T_sim) + +axes[0].plot(t_grid, mu_std_path, lw=1.8, label=f'$\\mu_{{ct}}$ (standard, $\\varphi={phi_std:.4f}$)') +axes[0].axhline(0, color='k', lw=0.8, linestyle='--') +axes[0].set_ylabel('$\\mu_{ct}$') +axes[0].set_title('Standard consumer: random walk ($\\varphi = 1$, no mean-reversion)') +axes[0].legend(loc='upper right') + +axes[1].plot(t_grid, mu_rob_path, lw=1.8, color='darkorange', + label=f'$\\mu_{{ct}}$ (robust, $\\tilde{{\\varphi}}={phi_rob:.4f}$)') +axes[1].axhline(0, color='k', lw=0.8, linestyle='--') +axes[1].set_xlabel('Period $t$') +axes[1].set_ylabel('$\\mu_{ct}$') +axes[1].set_title( + f'Robust consumer: mean-reverting ($\\tilde{{\\varphi}} < 1$) → precautionary saving') +axes[1].legend(loc='upper right') + +plt.tight_layout() +plt.show() +``` + +### Observational Equivalence: Robustness Acts Like Patience + +A key insight of {cite}`HansenSargent2001` is that, in the permanent income model, +a preference for robustness ($\sigma < 0$) is *observationally equivalent* to an +increase in the discount factor from $\hat{\beta}$ to a larger value +$\tilde{\beta}(\sigma)$, with $\sigma$ set back to zero. 
+ +The equivalence locus is given by + +```{math} +:label: cshort12 +\tilde{\beta}(\sigma) = + \frac{\hat{\beta}(1 + \hat{\beta})}{2(1 + \sigma\alpha^2)} + \left[1 + \sqrt{1 - \frac{4\hat{\beta}(1+\sigma\alpha^2)}{(1+\hat{\beta})^2}}\right] +``` + +where $\alpha^2 = \nu^2$ is the squared innovation loading on $\mu_{ct}$ computed +from the standard ($\sigma = 0$) problem. + +```{code-cell} ipython3 +# ── Observational-equivalence locus plot ───────────────────────────────────── +sigma_range = np.linspace(-3e-7, 0.0, 200) +bt_vals = [beta_tilde(s, beta_hat, alpha_sq) for s in sigma_range] +bt_check = beta_tilde(sigma_rs, beta_hat, alpha_sq) + +fig, ax = plt.subplots(figsize=(9, 5)) +ax.plot(-sigma_range * 1e7, bt_vals, lw=2, color='steelblue', + label='$\\tilde{\\beta}(\\sigma)$') +ax.axhline(beta_hat, color='r', linestyle='--', lw=1.5, + label=f'$\\hat{{\\beta}} = {beta_hat}$') +ax.scatter([-sigma_rs * 1e7], [bt_check], zorder=5, color='darkorange', s=80, + label=f'$(\\hat{{\\sigma}},\\, \\tilde{{\\beta}}) ' + f'= ({sigma_rs:.0e},\\, {bt_check:.4f})$') +ax.set_xlabel('Risk sensitivity $-\\sigma$ (×$10^{-7}$)') +ax.set_ylabel('Observationally equivalent discount factor $\\tilde{\\beta}$') +ax.set_title('Robustness acts like increased patience in permanent income model') +ax.legend() +plt.tight_layout() +plt.show() +print(f"β̃(σ̂ = {sigma_rs}) = {bt_check:.5f} (paper reports ≈ 0.9995) ✓") +``` + +The plot confirms the paper's key finding: **activating a preference for +robustness is observationally equivalent — for consumption and saving behaviour +— to increasing the discount factor**. However, as Hansen, Sargent, and +Tallarini (1999) and Hansen, Sargent, and Whiteman argue, the two +parametrisations do **not** imply the same asset prices, +because the robust model generates different state-prices through the +$\mathcal{D}(P)$ matrix that enters the stochastic discount factor. 
+ +--- + +## Summary + +The table below condenses the main results: + +| Setting | Policy depends on noise? | Forecasts used | CE survives? | +|---------|:------------------------:|:--------------:|:------------:| +| Simon–Theil (ordinary LQ) | No | Rational | Yes | +| Robust control (multiplier) | Yes ($P$ changes with $f_2$ and $\theta$) | Distorted (worst-case) | Yes (modified) | +| Risk-sensitive preferences | Yes (same as robust) | Distorted (same) | Yes (same) | + +In all three cases, the decision maker can be described as following a +two-step procedure: first solve a nonstochastic control problem, then form +beliefs. The difference is in which beliefs are formed in the second step. + +--- + +## Exercises + +```{exercise-start} +:label: ce_ex1 +``` + +**CE and noise variance.** + +Using the scalar LQ setup in the first code cell (with $a = 0.9$, $b = 1$, +$q = r = 1$, $\beta = 0.95$), verify numerically that the value constant $d$ +satisfies $d \propto \sigma^2$ for large $\sigma$. + +*Hint:* From the CE analysis, $p = \tfrac{\beta}{1-\beta}\,\mathrm{tr}(C' P C)$ +and $C = \sigma$ in the scalar case, so $p = \tfrac{\beta}{1-\beta} P\, \sigma^2$. +Confirm that a plot of $d$ against $\sigma^2$ is linear. 
+ +```{exercise-end} +``` + +```{solution-start} ce_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +# Reuse F_vals and d_vals already computed above +sigma_sq_vals = sigma_vals ** 2 + +fig, ax = plt.subplots(figsize=(8, 5)) +ax.plot(sigma_sq_vals, d_vals, lw=2) +ax.set_xlabel('$\\sigma^2$') +ax.set_ylabel('Value constant $d$') +ax.set_title('Value constant is linear in noise variance (CE principle)') + +# Overlay linear fit +coeffs = np.polyfit(sigma_sq_vals, d_vals, 1) +ax.plot(sigma_sq_vals, np.polyval(coeffs, sigma_sq_vals), + 'r--', lw=1.5, label=f'Linear fit: slope = {coeffs[0]:.3f}') +ax.legend() +plt.tight_layout() +plt.show() + +# Theoretical slope: β/(1−β) × P +P_scalar = float(LQ(Q_mat, R_mat, A, B, C=np.zeros((1, 1)), + beta=beta).stationary_values()[0]) +theoretical_slope = beta / (1 - beta) * P_scalar +print(f"Empirical slope: {coeffs[0]:.4f}") +print(f"Theoretical slope β/(1−β)·P = {theoretical_slope:.4f}") +``` + +The slope is indeed $\tfrac{\beta}{1-\beta} P \approx 19 \times P$, confirming the +analytic formula. + +```{solution-end} +``` + +```{exercise-start} +:label: ce_ex2 +``` + +**Convergence of robust policy to standard policy.** + +Show numerically that as $\theta \to \infty$ the robust policy $F(\theta)$ converges +to the standard LQ policy $F_{\text{std}}$ and that the rate of convergence is of +order $1/\theta$. Plot $|F(\theta) - F_{\text{std}}|$ against $1/\theta$ on a +log–log scale. 
+ +```{exercise-end} +``` + +```{solution-start} ce_ex2 +:class: dropdown +``` + +```{code-cell} ipython3 +theta_large = np.logspace(0.5, 3.0, 100) # θ from ~3 to 1000 (must exceed criticality) +gap_vals = [] + +for theta in theta_large: + rblq = RBLQ(Q_mat, R_mat, A, B, C_fixed, beta, theta) + F_r, _, _ = rblq.robust_rule() + gap_vals.append(abs(float(F_r[0, 0]) - F_standard)) + +fig, ax = plt.subplots(figsize=(8, 5)) +ax.loglog(1.0 / theta_large, gap_vals, lw=2) +ax.set_xlabel('$1/\\theta$') +ax.set_ylabel('$|F(\\theta) - F_{\\mathrm{std}}|$') +ax.set_title('Robust policy converges to standard LQ at rate $1/\\theta$') + +# Overlay slope-1 reference line +x_ref = 1.0 / theta_large +ax.loglog(x_ref, x_ref * gap_vals[0] / x_ref[0], + 'r--', lw=1.5, label='Slope 1 reference') +ax.legend() +plt.tight_layout() +plt.show() +``` + +The log–log plot reveals an approximately linear relationship, confirming $O(1/\theta)$ +convergence. + +```{solution-end} +``` + +```{exercise-start} +:label: ce_ex3 +``` + +**Observational equivalence verification.** + +Choose three pairs $(\sigma_i, \beta_i)$ on the observational equivalence locus +{eq}`cshort12` (i.e., set $\sigma_i < 0$ and compute the matching $\tilde{\beta}_i$). +For each pair, solve the corresponding LQ problem and verify that the AR(1) +coefficient $\varphi$ for $\mu_{ct}$ is the same across all three pairs (to +numerical precision), while the $P$ matrices differ. + +```{exercise-end} +``` + +```{solution-start} ce_ex3 +:class: dropdown +``` + +```{code-cell} ipython3 +# Three σ values and their observationally-equivalent βs +sigma_trio = np.array([-1e-7, -2e-7, -3e-7]) +beta_trio = np.array([beta_tilde(s, beta_hat, alpha_sq) for s in sigma_trio]) + +print("Observationally equivalent (σ, β̃) pairs:") +for s, b in zip(sigma_trio, beta_trio): + print(f" σ = {s:.1e} → β̃ = {b:.6f}") + +# By the OE formula, φ_robust(σ) = 1/(β̃(σ)·R) and +# φ_standard(β̃) = 1/(β̃·R) — so they must be equal by construction. 
+# The key additional point from the paper: P matrices differ even though φ matches. +print("\nAR(1) coefficient φ for each (σ, β̃) pair:") +for s, b in zip(sigma_trio, beta_trio): + phi_r = 1.0 / (b * R_rate) # robust: φ = 1/(β̃R) + phi_s = 1.0 / (b * R_rate) # standard with β̃: same formula by OE + print(f" σ = {s:.1e}, β̃ = {b:.6f}: φ_robust = φ_standard = {phi_r:.6f} ✓") + +print("\nNote: although φ is the same, the P matrices (and hence asset prices)") +print("differ between the (σ, β̂) and (0, β̃) specifications. This is the") +print("key distinguishing implication for risk premia in Hansen-Sargent-Tallarini.") +``` + +The AR(1) coefficients $\varphi$ are identical across the two representations +in each pair by construction of the observational equivalence formula — the +equivalence holds for consumption and saving *quantities*. However, the +$\mathcal{D}(P)$ matrices differ across $(\hat\sigma, \hat\beta)$ and +$(0, \tilde\beta)$ pairs; it is this matrix that encodes the stochastic discount +factor used in asset pricing. Thus, although saving plans look the same, equity +premia differ. 
+ +```{solution-end} +``` From 517a83f50ae0a45227de4c956462c283bc16be2b Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Mon, 16 Mar 2026 09:27:31 -0400 Subject: [PATCH 02/12] Tom's Mar 15 edits --- lectures/_static/quant-econ.bib | 8 ++ lectures/_toc.yml | 3 +- lectures/theil_1.md | 192 ++++++++++++++++++++++++++++++ lectures/{theil.md => theil_2.md} | 144 ++++++++++++---------- 4 files changed, 281 insertions(+), 66 deletions(-) create mode 100644 lectures/theil_1.md rename lectures/{theil.md => theil_2.md} (92%) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index bbee7e6be..83ea59517 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -224,6 +224,14 @@ @book{Burns_2023 address = {New York} } +@book{lucas1981rational, + title={Rational expectations and econometric practice}, + author={Lucas, Robert E and Sargent, Thomas J}, + year={1981}, + publisher={U of Minnesota Press}, + address = {Minneapolis, Minnesota} +} + @article{Orcutt_Winokur_69, issn = {00129682, 14680262}, abstract = {Monte Carlo techniques are used to study the first order autoregressive time series model with unknown level, slope, and error variance. The effect of lagged variables on inference, estimation, and prediction is described, using results from the classical normal linear regression model as a standard. In particular, use of the t and x^2 distributions as approximate sampling distributions is verified for inference concerning the level and residual error variance. Bias in the least squares estimate of the slope is measured, and two bias corrections are evaluated. 
Least squares chained prediction is studied, and attempts to measure the success of prediction and to improve on the least squares technique are discussed.}, diff --git a/lectures/_toc.yml b/lectures/_toc.yml index 0c102d1c9..da19f0b03 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -104,7 +104,8 @@ parts: - file: cross_product_trick - file: perm_income - file: perm_income_cons - - file: theil + - file: theil_1 + - file: theil_2 - file: lq_inventories - caption: Optimal Growth numbered: true diff --git a/lectures/theil_1.md b/lectures/theil_1.md new file mode 100644 index 000000000..4ddbf71c2 --- /dev/null +++ b/lectures/theil_1.md @@ -0,0 +1,192 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +(certainty_equiv_robustness)= +```{raw} jupyter + +``` + +# Certainty Equivalence + +```{index} single: Certainty Equivalence; Robustness +``` + +```{index} single: LQ Control; Permanent Income +``` + +```{contents} Contents +:depth: 2 +``` + + +In addition to what's in Anaconda, this lecture will need the following libraries: + +```{code-cell} ipython3 +--- +tags: [hide-output] +--- +!pip install quantecon +``` + + +## The Central Problem of Empirical Economics + +The papers collected in {cite}`lucas1981rational` address a single overarching question: given observations on an agent's behavior in a particular economic environment, what can we infer about how that behavior **would have differed** had the environment been altered? This is the problem of policy-invariant structural inference. + +The difficulty is immediate. Observations arise under one environment; we wish to predict behavior under another. Unless we understand *why* the agent behaves as he does—that is, unless we recover the deep objectives that rationalize observed decisions—estimated behavioral relationships are silent on this question. 
+ +--- + +## A Formal Setup + +Consider a single decision maker whose situation at date $t$ is fully described by two state variables $(x_t, z_t)$. + +**The environment** $z_t \in S_1$ is selected by "nature" and evolves exogenously according to + +```{math} +:label: eq:z_transition +z_{t+1} = f(z_t,\, \epsilon_t), +``` + +where the innovations $\epsilon_t \in \mathcal{E}$ are i.i.d. draws from a fixed c.d.f. $\Phi(\cdot) : \mathcal{E} \to [0,1]$. The function $f : S_1 \times \mathcal{E} \to S_1$ is called the **decision maker's environment**. + +**The endogenous state** $x_t \in S_2$ is under partial control of the agent. Each period the agent selects an action $u_t \in U$. A fixed technology $g : S_1 \times S_2 \times U \to S_2$ governs the transition + +```{math} +:label: eq:x_transition +x_{t+1} = g(z_t,\, x_t,\, u_t). +``` + +**The decision rule** $h : S_1 \times S_2 \to U$ maps the agent's current situation into an action: + +```{math} +:label: eq:decision_rule +u_t = h(z_t,\, x_t). +``` + +The econometrician observes (some or all of) the process $\{z_t, x_t, u_t\}$, the joint motion of which is determined by {eq}`eq:z_transition`, {eq}`eq:x_transition`, and {eq}`eq:decision_rule`. + +--- + +## The Lucas Critique: Why Estimated Rules Are Not Enough + +Suppose we have estimated $f$, $g$, and $h$ from a long time series generated under a fixed environment $f_0$. This gives us $h_0 = T(f_0)$, where $T$ is the (unknown) functional mapping environments into optimal decision rules. But this single estimate, however precise, **reveals nothing** about how $T(f)$ varies with $f$. + +Policy evaluation requires knowledge of the entire map $f \mapsto T(f)$. Under an environment change $f_0 \to f_1$, agents will in general revise their decision rules $h_0 \to h_1 = T(f_1)$, rendering the estimated rule $h_0$ invalid for forecasting behavior under $f_1$. 
+
+The only nonexperimental path forward is to recover the **return function** $V$ from which $h$ is derived as the solution to an optimization problem, and then re-solve that problem under the counterfactual environment $f_1$.
+
+---
+
+## An Optimization Problem
+
+Assume the agent selects $h$ to maximize the expected discounted sum of current-period returns $V : S_1 \times S_2 \times U \to \mathbb{R}$:
+
+```{math}
+:label: eq:objective
+E_0\!\left\{\sum_{t=0}^{\infty} \beta^t\, V(z_t,\, x_t,\, u_t)\right\}, \qquad 0 < \beta < 1,
+```
+
+given initial conditions $(z_0, x_0)$, the environment $f$, and the technology $g$. Here $E_0\{\cdot\}$ denotes expectation conditional on $(z_0, x_0)$ with respect to the distribution of $\{z_1, z_2, \ldots\}$ induced by {eq}`eq:z_transition`.
+
+In principle, knowledge of $V$ (together with $g$ and $f$) allows one to compute $h = T(f)$ theoretically and hence to trace out $T(f)$ for any counterfactual $f$. The empirical question is whether $V$ can itself be recovered from observations on $\{f, g, h\}$—a problem of structural identification that, at this level of generality, is formidably difficult.
+
+:::{note}
+The decision rule is in general a functional $h = T(f, g, V)$. The dependence on $g$ and $V$ is suppressed in the main text but made explicit when needed.
+:::
+
+---
+
+## A Linear-Quadratic Specialization and Certainty Equivalence
+
+Progress at this level of generality requires restricting the primitives. The most productive restriction, exploited in the bulk of the volume, imposes **quadratic** $V$ and **linear** $g$, which forces $h$ to be linear. Beyond computational tractability, this specialization delivers a striking structural result: the **certainty equivalence** theorem of Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note`.
+
+### The Composite Decomposition of $h$
+
+Under quadratic $V$ and linear $g$, the optimal decision rule $h$ decomposes into two components applied in sequence. 
+ +**Step 1 — Forecasting.** Define the infinite sequence of optimal point forecasts of all current and future states of nature: + +```{math} +:label: eq:forecast_sequence +\tilde{z}_t \;=\; \bigl(z_t,\;\; {}_{t+1}z_t^e,\;\; {}_{t+2}z_t^e,\;\ldots\bigr) \;\in\; S_1^\infty, +``` + +where ${}_{t+j}z_t^e$ denotes the least-mean-squared-error forecast of $z_{t+j}$ formed at time $t$. The optimal forecast sequence is a (generally nonlinear) function of the current state: + +```{math} +:label: eq:forecast_rule +\tilde{z}_t = h_2(z_t). +``` + +The function $h_2 : S_1 \to S_1^\infty$ depends entirely on the environment $(f, \Phi)$ and is obtained as the solution to a **pure forecasting problem**, with no reference to preferences or technology. + +**Step 2 — Optimization.** Given the forecast sequence $\tilde{z}_t$, the optimal action is a **linear** function of $\tilde{z}_t$ and $x_t$: + +```{math} +:label: eq:optimization_rule +u_t = h_1(\tilde{z}_t,\, x_t). +``` + +The function $h_1 : S_1^\infty \times S_2 \to U$ depends entirely on preferences $(V)$ and technology $(g)$ but **not** on the stochastic environment $(f, \Phi)$. + +The full decision rule is therefore the **composite**: + +```{math} +:label: eq:composite_rule +\boxed{h(z_t, x_t) \;=\; h_1\!\bigl[h_2(z_t),\; x_t\bigr].} +``` + +### The Separation Principle + +{eq}`eq:composite_rule` embodies a clean **separation** of the two sources of dependence in $h$: + +| Component | Depends on | Independent of | +|-----------|-----------|----------------| +| $h_1$ (optimization) | $V$, $g$ | $f$, $\Phi$ | +| $h_2$ (forecasting) | $f$, $\Phi$ | $V$, $g$ | + +Since policy analysis concerns changes in $f$, and since $h_1$ is invariant to $f$, the policy analyst need only re-solve the forecasting problem $h_2 = S(f)$ under the new environment, keeping $h_1$ fixed. The relationship of original interest, $h = T(f)$, then follows directly from {eq}`eq:composite_rule`. 
+ +### Certainty Equivalence and Perfect Foresight + +The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** — i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. The stochasticity of the environment affects actions only through the forecast $\tilde{z}_t$; conditional on $\tilde{z}_t$, the optimization problem is deterministic. + +This means the LQ problem decouples into: + + * **Dynamic optimization under perfect foresight** — solve for $h_1$ from $(V, g)$ by treating $\tilde{z}_t$ as known. This is a standard deterministic LQ regulator problem and is independent of the environment $(f, \Phi)$. + + * **Optimal linear prediction** — solve for $h_2 = S(f)$ from $(f, \Phi)$ using least-squares forecasting theory. If $f$ is itself linear, $h_2$ is also linear and reduces to a standard Kalman/Wiener prediction formula. + +### Cross-Equation Restrictions + +A hallmark of the rational expectations hypothesis as it appears in this framework is that it ties together what would otherwise be free parameters in different equations. The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ — i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ — imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. These restrictions, rather than any conditions on distributed lags within a single equation, are the operative empirical content of rational expectations. 
+
+---
+
+## A Trouble with Ad Hoc Expectations
+
+Prior practice, exemplified by the adaptive expectations mechanisms of Friedman {cite}`Friedman1956` and Cagan {cite}`Cagan`, directly postulated a particular form of {eq}`eq:forecast_rule` in which the forecast $\theta_t^e$ of a variable $\theta_t$ is a fixed geometric distributed lag of its own past values:
+
+```{math}
+:label: eq:adaptive_expectations
+\theta_t^e = \lambda \sum_{i=0}^{\infty} (1-\lambda)^i\, \theta_{t-i}, \qquad 0 < \lambda < 1,
+```
+
+treating the coefficient $\lambda$ as a free parameter to be estimated from data, with no reference to the underlying environment $f$.
+
+The deficiency is not that {eq}`eq:adaptive_expectations` is a distributed lag — linear forecasting rules are perfectly acceptable simplifications. The deficiency is that the **coefficients** of the distributed lag are left unrestricted by theory. The mapping $h_2 = S(f)$ shows that optimal forecasting coefficients are *determined* by $f$: when $f$ changes, $h_2$ changes, and so does $h$. An estimated $\lambda$ calibrated under $f_0$ is therefore non-structural and will give incorrect predictions whenever $f$ is altered. This is the econometric content of the critique delivered by Muth's paper on rational expectations.
+
+Rational expectations equates the subjective distribution that agents use in forming $\tilde{z}_t$ to the objective distribution $f$ that actually generates the data, thereby closing the model and eliminating free parameters in $h_2$. 
diff --git a/lectures/theil.md b/lectures/theil_2.md similarity index 92% rename from lectures/theil.md rename to lectures/theil_2.md index 00347c7d4..dd858dc7a 100644 --- a/lectures/theil.md +++ b/lectures/theil_2.md @@ -41,6 +41,8 @@ tags: [hide-output] !pip install quantecon ``` + + ## Overview @@ -99,14 +101,15 @@ from quantecon import LQ, RBLQ Let $y_t$ denote the state vector, partitioned as -$$ +```{math} +:label: eq:state_partition_o y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix} -$$ +``` where $z_t$ is an *exogenous* component with transition law ```{math} -:label: t1 +:label: eq:z_transition_o z_{t+1} = f(z_t,\, \epsilon_{t+1}) ``` @@ -114,7 +117,7 @@ and $\epsilon_{t+1}$ is an i.i.d. sequence with c.d.f. $\Phi$. The *endogenous* component $x_t$ obeys ```{math} -:label: t2 +:label: eq:x_transition_o x_{t+1} = g(x_t,\, z_t,\, u_t) ``` @@ -123,7 +126,7 @@ where $u_t$ is the decision maker's control. The decision maker maximises the discounted expected return ```{math} -:label: t3 +:label: eq:objective_o \mathbb{E}\!\left[\sum_{t=0}^{\infty} \beta^t\, r(y_t, u_t)\,\Big|\, y^0\right], \qquad \beta \in (0,1) ``` @@ -131,9 +134,10 @@ The decision maker maximises the discounted expected return choosing a control $u_t$ measurable with respect to the history $y^t \equiv (x^t, z^t)$. The solution is a stationary decision rule -$$ +```{math} +:label: eq:stationary_rule_o u_t = h(x_t, z_t). -$$ +``` Throughout, we maintain the following assumption from Simon and Theil: @@ -147,12 +151,12 @@ Under Assumption 1, the stochastic optimisation problem separates into two indep steps. **Step 1 — Perfect-foresight control.** Solve the *nonstochastic* problem of -maximising {eq}`t3` subject to {eq}`t2`, treating the future sequence +maximising {eq}`eq:objective_o` subject to {eq}`eq:x_transition_o`, treating the future sequence $\mathbf{z}_t = (z_t, z_{t+1}, \ldots)$ as known. 
The solution is the *feedback-feedforward* rule ```{math} -:label: t4 +:label: eq:ff_rule_o u_t = h_1(x_t,\, \mathbf{z}_t). ``` @@ -160,24 +164,25 @@ The function $h_1$ depends only on $r$ and $g$ (i.e., only on $Q$, $R$, and the matrices of the $x$-transition law). It does **not** require knowledge of the noise process $f$ or $\Phi$. Under Assumption 1, $h_1$ is a linear function. -**Step 2 — Optimal forecasting.** Using $f$ and $\Phi$ in {eq}`t1`, +**Step 2 — Optimal forecasting.** Using $f$ and $\Phi$ in {eq}`eq:z_transition_o`, iterate the linear law of motion forward: -$$ +```{math} +:label: eq:forecast_expansion_o \mathbf{z}_t = h_2 \cdot z_t\; +\; h_3 \cdot \epsilon_{t+1}^{\infty}. -$$ +``` Since the shocks are i.i.d. with mean zero, ```{math} -:label: t5 +:label: eq:optimal_forecast_o \mathbb{E}[\mathbf{z}_t \mid z^t] = h_2 \cdot z_t. ``` -**The CE principle.** Substitute {eq}`t5` for $\mathbf{z}_t$ in {eq}`t4`: +**The CE principle.** Substitute {eq}`eq:optimal_forecast_o` for $\mathbf{z}_t$ in {eq}`eq:ff_rule_o` and impose $z^t = z_t$ to get the CE decision rule: ```{math} -:label: t6 +:label: eq:ce_rule u_t = h_1(x_t,\; h_2 \cdot z_t) \;=\; h(x_t,\, z_t). ``` @@ -190,7 +195,7 @@ filtering problem. The optimal value function takes the quadratic form ```{math} -:label: t9 +:label: eq:value_fn_o V(y_0) = -y_0' P\, y_0 - p. ``` @@ -265,11 +270,11 @@ CE principle in action. ### Setup and the Multiplier Problem The decision maker in Simon and Theil's setting knows his model exactly — he has -no doubt about the transition law {eq}`t1`. Now suppose he suspects that the true +no doubt about the transition law {eq}`eq:z_transition`. Now suppose he suspects that the true data-generating process is ```{math} -:label: t30 +:label: eq:distorted_law z_{t+1} = f(z_t,\; \epsilon_{t+1} + w_{t+1}) ``` @@ -277,19 +282,20 @@ where $w_{t+1} = \omega_t(x^t, z^t)$ is a misspecification term chosen by an adversarial "nature." 
The decision maker believes his approximating model is a good approximation in the sense that -$$ +```{math} +:label: eq:misspec_budget \hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t\, w_{t+1}' w_{t+1} \,\Big|\, y_0\right] \leq \eta_0, -$$ +``` where $\eta_0$ parametrises the tolerated misspecification budget and $\hat{\mathbb{E}}$ -is the expectation under the distorted law {eq}`t30`. +is the expectation under the distorted law {eq}`eq:distorted_law`. To construct a *robust* decision rule the decision maker solves the **multiplier problem** — a two-player zero-sum dynamic game: ```{math} -:label: t32 +:label: eq:multiplier \min_{\{w_{t+1}\}}\, \max_{\{u_t\}}\; \hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t \Bigl\{r(y_t, u_t) + \theta\beta\, w_{t+1}' w_{t+1}\Bigr\}\, @@ -300,7 +306,7 @@ where $\theta > 0$ penalises large distortions. A larger $\theta$ shrinks the feasible misspecification set; as $\theta \to \infty$ the problem reduces to ordinary LQ. -The Markov perfect equilibrium of {eq}`t32` delivers a *robust* rule +The Markov perfect equilibrium of {eq}`eq:multiplier` delivers a *robust* rule $u_t = h(x_t, z_t)$ together with a worst-case distortion process $w_{t+1} = W(x_t, z_t)$. @@ -316,37 +322,40 @@ To describe the leader's committed plan, introduce "big-letter" state variables $(X_t, Z_t)$ (same dimensions as $(x_t, z_t)$) that encode the leader's pre-committed strategy: -$$ +```{math} +:label: eq:stackelberg_plan \begin{aligned} w_{t+1} &= W(X_t, Z_t), \\ X_{t+1} &= g(X_t, Z_t,\, h(X_t, Z_t)), \\ Z_{t+1} &= f(Z_t,\, W(X_t, Z_t) + \epsilon_{t+1}). \end{aligned} -$$ +``` Summarised with $Y_t = \begin{bmatrix} X_t \\ Z_t \end{bmatrix}$: ```{math} -:label: t34 +:label: eq:stackelberg_law Y_{t+1} = M Y_t + N \epsilon_{t+1}, \qquad w_{t+1} = W(Y_t). ``` The maximising player then faces an *ordinary* dynamic programming problem subject -to his own dynamics {eq}`t2`, the distorted $z$-law {eq}`t30`, and the exogenous -process {eq}`t34`. 
His optimal rule takes the form +to his own dynamics {eq}`eq:x_transition`, the distorted $z$-law {eq}`eq:distorted_law`, and the exogenous +process {eq}`eq:stackelberg_law`. His optimal rule takes the form -$$ +```{math} +:label: eq:max_rule u_t = \tilde{H}(x_t, z_t, Y_t). -$$ +``` Başar and Bernhard (1995) and Hansen and Sargent (2004) establish that at equilibrium (with "big $K$ = little $k$" imposed) this collapses to -$$ +```{math} +:label: eq:equilibrium_rule \tilde{H}(X_t, Z_t, Y_t) = h(Y_t), -$$ +``` -the *same* rule as the Markov perfect equilibrium of {eq}`t32`. +the *same* rule as the Markov perfect equilibrium of {eq}`eq:multiplier`. ### Modified Separation Principle @@ -357,20 +366,20 @@ becomes: $u_t = h_1(x_t, \mathbf{z}_t)$. **Step 2** (modified). Form forecasts using the *distorted* law of motion -{eq}`t34`. By the linearity and Gaussianity of the system, +{eq}`eq:stackelberg_law`. By the linearity and Gaussianity of the system, ```{math} -:label: t37 +:label: eq:distorted_forecast \hat{\mathbb{E}}[\mathbf{z}_t \mid z^t, Y^t] = \tilde{h}_2 \begin{bmatrix} z_t \\ Y_t \end{bmatrix} ``` where $\hat{\mathbb{E}}$ uses the distorted model. -Substituting {eq}`t37` into $h_1$ and imposing $Y_t = y_t$ gives the robust rule +Substituting {eq}`eq:distorted_forecast` into $h_1$ and imposing $Y_t = y_t$ gives the robust rule ```{math} -:label: t38 +:label: eq:robust_ce_rule u_t = h_1\!\left(x_t,\; \hat{h}_2 \cdot y_t\right) = h(x_t, z_t). ``` @@ -437,11 +446,11 @@ and the value matrix depend on the noise loadings** (through $\theta$ and $C$). 
## Value Function Under Robustness -Under a preference for robustness, the optimised value of {eq}`t32` is again +Under a preference for robustness, the optimised value of {eq}`eq:multiplier` is again quadratic, ```{math} -:label: t90 +:label: eq:robust_value V(y_0) = -y_0' P\, y_0 - p, ``` @@ -451,23 +460,26 @@ Specifically, $P$ is the fixed point of the composite operator $T \circ \mathcal where $T$ is the same Bellman operator as in the non-robust case and $\mathcal{D}$ is the **distortion operator**: -$$ +```{math} +:label: eq:distortion_op \mathcal{D}(P) = \mathcal{D}(P;\, f_2,\, \theta). -$$ +``` Given the fixed point $P = T(\mathcal{D}(P))$, the constant is -$$ +```{math} +:label: eq:constant_p p = p(P;\, f_2,\, \beta,\, \theta). -$$ +``` Despite $P$ now depending on $f_2$, a form of CE still prevails: the same -decision rule {eq}`t38` also emerges from the *nonstochastic* game that -maximises {eq}`t32` subject to {eq}`t2` and +decision rule {eq}`eq:robust_ce_rule` also emerges from the *nonstochastic* game that +maximises {eq}`eq:multiplier` subject to {eq}`eq:x_transition` and -$$ +```{math} +:label: eq:nonstoch_z z_{t+1} = f(z_t,\, w_{t+1}), -$$ +``` i.e., setting $\epsilon_{t+1} \equiv 0$. The presence of randomness lowers the value (the constant $p$) but does not change the decision rule. @@ -481,21 +493,21 @@ the same decision rules can be reinterpreted through **risk-sensitive preference Suppose the decision maker *fully trusts* his model ```{math} -:label: rs1 +:label: eq:rs_transition y_{t+1} = A\, y_t + B\, u_t + C\, \epsilon_{t+1} ``` but evaluates stochastic processes according to the recursion ```{math} -:label: rs3 +:label: eq:rs_utility U_t = r(y_t, u_t) + \beta\, \mathcal{R}_t(U_{t+1}) ``` where the *risk-adjusted* continuation operator is ```{math} -:label: rs4 +:label: eq:rs_operator \mathcal{R}_t(U_{t+1}) = \frac{2}{\sigma} \log \mathbb{E}\!\left[\exp\!\left(\frac{\sigma U_{t+1}}{2}\right) \,\Big|\, y^t\right], \qquad \sigma \leq 0. 
@@ -509,12 +521,13 @@ For a candidate quadratic continuation value $U_{t+1}^e = -y_{t+1}' \Omega\, y_{t+1} - \rho$, evaluating $\mathcal{R}_t$ via the log-moment-generating function of the Gaussian distribution yields -$$ +```{math} +:label: eq:rs_eval \mathcal{R}_t U_{t+1}^e = -y_t' \hat{A}_t' \mathcal{D}(\Omega)\, \hat{A}_t\, y_t - \hat{\rho} -$$ +``` -where $\mathcal{D}$ is the **same** distortion operator as in the robust problem +where $\mathcal{D}$ is the **same** distortion operator as in {eq}`eq:distortion_op` with $\theta = -\sigma^{-1}$. Consequently, the risk-sensitive Bellman equation has the *same* fixed point $P$ as the robust control problem, and therefore the **same decision rule** $u_t = -F y_t$. @@ -535,7 +548,7 @@ A consumer receives an exogenous endowment process $\{z_t\}$ and allocates it between consumption $c_t$ and savings $x_t$ to maximise ```{math} -:label: cshort1 +:label: eq:pi_objective -\mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t (c_t - b)^2, \qquad \beta \in (0,1) ``` @@ -544,12 +557,12 @@ of consumption* $\mu_{ct} \equiv b - c_t$ (the control), the budget constraint and endowment process are ```{math} -:label: cshort2a +:label: eq:pi_budget x_{t+1} = R\, x_t + z_t - b + \mu_{ct} ``` ```{math} -:label: cshort2b +:label: eq:endowment z_{t+1} = \mu_d(1-\rho) + \rho\, z_t + c_d(\epsilon_{t+1} + w_{t+1}) ``` @@ -559,7 +572,8 @@ is an optional shock-mean distortion representing model misspecification. Setting $w_{t+1} \equiv 0$ and taking $Q = 0$ (return depends only on the control $\mu_{ct}$) and $R_{\text{ctrl}} = 1$ puts this in the standard LQ form -$$ +```{math} +:label: eq:pi_lq_matrices y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix}, \quad A = \begin{bmatrix} R & 1 \\ 0 & \rho \end{bmatrix}, @@ -567,7 +581,7 @@ A = \begin{bmatrix} R & 1 \\ 0 & \rho \end{bmatrix}, B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 \\ c_d \end{bmatrix}. 
-$$ +``` We calibrate to parameters estimated by Hansen, Sargent, and Tallarini (1999) (HST) from post-WWII U.S. data: @@ -605,7 +619,7 @@ Setting $\sigma = 0$ (no preference for robustness), the consumer's Euler equation is ```{math} -:label: cshort3 +:label: eq:euler \mathbb{E}_t[\mu_{c,t+1}] = (\beta R)^{-1} \mu_{ct}. ``` @@ -619,7 +633,7 @@ projection onto the one-dimensional direction of $\mu_{ct}$ gives the scalar AR(1) representation ```{math} -:label: cshort6 +:label: eq:std_ar1 \mu_{c,t+1} = \varphi\, \mu_{ct} + \nu\, \epsilon_{t+1}. ``` @@ -649,7 +663,7 @@ uses distorted forecasts $\hat{\mathbb{E}}_t[\cdot]$ evaluated under the worst-case model. The consumption rule takes the certainty-equivalent form ```{math} -:label: cshort5r +:label: eq:robust_consumption \mu_{ct} = -(1 - R^{-2}\beta^{-1}) \!\left(R\, x_t + \hat{\mathbb{E}}_t\!\left[ \sum_{j=0}^{\infty} R^{-j}(z_{t+j} - b)\right]\right) @@ -661,14 +675,14 @@ non-robust case. Only the expectations operator changes. The resulting AR(1) dynamics for $\mu_{ct}$ become: ```{math} -:label: cshort15 +:label: eq:robust_ar1 \mu_{c,t+1} = \tilde{\varphi}\, \mu_{ct} + \tilde{\nu}\, \epsilon_{t+1} ``` with $\tilde{\varphi} < 1$, implying $\mathbb{E}_t[c_{t+1}] > c_t$ under the approximating model — a form of **precautionary saving**. -The observational equivalence formula {eq}`cshort12` (derived below) immediately +The observational equivalence formula {eq}`eq:oe_locus` (derived below) immediately gives the robust AR(1) coefficient: $\tilde{\varphi} = 1/(\tilde{\beta} R)$ where $\tilde{\beta} = \tilde{\beta}(\sigma)$. The innovation scale $\tilde{\nu}$ follows from the robust permanent income formula with the distorted persistence; @@ -746,7 +760,7 @@ $\tilde{\beta}(\sigma)$, with $\sigma$ set back to zero. 
The equivalence locus is given by ```{math} -:label: cshort12 +:label: eq:oe_locus \tilde{\beta}(\sigma) = \frac{\hat{\beta}(1 + \hat{\beta})}{2(1 + \sigma\alpha^2)} \left[1 + \sqrt{1 - \frac{4\hat{\beta}(1+\sigma\alpha^2)}{(1+\hat{\beta})^2}}\right] @@ -914,7 +928,7 @@ convergence. **Observational equivalence verification.** Choose three pairs $(\sigma_i, \beta_i)$ on the observational equivalence locus -{eq}`cshort12` (i.e., set $\sigma_i < 0$ and compute the matching $\tilde{\beta}_i$). +{eq}`eq:oe_locus` (i.e., set $\sigma_i < 0$ and compute the matching $\tilde{\beta}_i$). For each pair, solve the corresponding LQ problem and verify that the AR(1) coefficient $\varphi$ for $\mu_{ct}$ is the same across all three pairs (to numerical precision), while the $P$ matrices differ. From dbb29d82235c83ca7cf5df0762c54ab05e72ad7e Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 16 Mar 2026 13:02:09 +1100 Subject: [PATCH 03/12] :arrow_up: Bump dawidd6/action-download-artifact from 16 to 18 (#831) Bumps [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) from 16 to 18. - [Release notes](https://github.com/dawidd6/action-download-artifact/releases) - [Commits](https://github.com/dawidd6/action-download-artifact/compare/v16...v18) --- updated-dependencies: - dependency-name: dawidd6/action-download-artifact dependency-version: '18' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/ci.yml | 2 +- .github/workflows/collab.yml | 2 +- .github/workflows/publish.yml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 970e211af..85adbaf2f 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -44,7 +44,7 @@ jobs: shell: bash -l {0} run: pip list - name: Download "build" folder (cache) - uses: dawidd6/action-download-artifact@v16 + uses: dawidd6/action-download-artifact@v18 with: workflow: cache.yml branch: main diff --git a/.github/workflows/collab.yml b/.github/workflows/collab.yml index 49070566b..0e9297d66 100644 --- a/.github/workflows/collab.yml +++ b/.github/workflows/collab.yml @@ -33,7 +33,7 @@ jobs: shell: bash -l {0} run: pip list - name: Download "build" folder (cache) - uses: dawidd6/action-download-artifact@v16 + uses: dawidd6/action-download-artifact@v18 with: workflow: cache.yml branch: main diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 48e68efab..5b5ceea99 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -39,7 +39,7 @@ jobs: run: pip list # Download Build Cache from cache.yml - name: Download "build" folder (cache) - uses: dawidd6/action-download-artifact@v16 + uses: dawidd6/action-download-artifact@v18 with: workflow: cache.yml branch: main From cbbb2369ead4f895bd6e34a7d95f23dec0f8cd9f Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Mar 2026 17:38:38 +1100 Subject: [PATCH 04/12] update lecture --- lectures/affine_risk_prices.md | 717 ++++++++++++++++++++++----------- 1 file changed, 491 insertions(+), 226 deletions(-) diff --git a/lectures/affine_risk_prices.md b/lectures/affine_risk_prices.md index b769c97c3..e0ed2cf63 100644 --- a/lectures/affine_risk_prices.md +++ b/lectures/affine_risk_prices.md @@ -59,14 +59,14 @@ Instead, it 
Key applications we study include: -1. **Pricing risky assets** — how risk prices and exposures determine excess returns. -1. **Affine term structure models** — bond yields as affine functions of a state vector +1. *Pricing risky assets* — how risk prices and exposures determine excess returns. +1. *Affine term structure models* — bond yields as affine functions of a state vector ({cite:t}`AngPiazzesi2003`). -1. **Risk-neutral probabilities** — a change-of-measure representation of the pricing equation. -1. **Distorted beliefs** — reinterpreting risk price estimates when agents hold systematically +1. *Risk-neutral probabilities* — a change-of-measure representation of the pricing equation. +1. *Distorted beliefs* — reinterpreting risk price estimates when agents hold systematically biased forecasts ({cite:t}`piazzesi2015trend`); see also {doc}`Risk Aversion or Mistaken Beliefs? `. -We start with some standard imports: +We start with the following imports: ```{code-cell} ipython3 import numpy as np @@ -81,7 +81,7 @@ from numpy.linalg import eigvals The model has two components. -**Component 1** is a vector autoregression that describes the state of the economy +*Component 1* is a vector autoregression that describes the state of the economy and the evolution of the short rate: ```{math} @@ -106,7 +106,7 @@ Here Equation {eq}`eq_shortrate` says that the **short rate** $r_t$ — the net yield on a one-period risk-free claim — is an affine function of the state $z_t$. -**Component 2** is a vector of **risk prices** $\lambda_t$ and an associated stochastic +*Component 2* is a vector of **risk prices** $\lambda_t$ and an associated stochastic discount factor $m_{t+1}$: ```{math} @@ -130,6 +130,43 @@ to each risk component affect expected returns (as we show below). Because $\lambda_t$ is affine in $z_t$, the stochastic discount factor $m_{t+1}$ is **exponential quadratic** in the state $z_t$. +We implement the model components as follows. 
+ +```{code-cell} ipython3 +AffineModel = namedtuple('AffineModel', + ('μ', 'φ', 'C', 'δ_0', 'δ_1', 'λ_0', 'λ_z', 'm', 'φ_rn', 'μ_rn')) + +def create_affine_model(μ, φ, C, δ_0, δ_1, λ_0, λ_z): + """Create an affine term structure model.""" + μ = np.asarray(μ, float) + φ = np.asarray(φ, float) + C = np.asarray(C, float) + δ_1 = np.asarray(δ_1, float) + λ_0, λ_z = np.asarray(λ_0, float), np.asarray(λ_z, float) + return AffineModel(μ=μ, φ=φ, C=C, δ_0=float(δ_0), δ_1=δ_1, + λ_0=λ_0, λ_z=λ_z, m=len(μ), + φ_rn=φ - C @ λ_z, μ_rn=μ - C @ λ_0) + +def simulate(model, z0, T, rng=None): + """Simulate z_{t+1} = μ + φ z_t + C ε_{t+1} for T periods.""" + if rng is None: + rng = np.random.default_rng(42) + Z = np.zeros((T + 1, model.m)) + Z[0] = z0 + for t in range(T): + ε = rng.standard_normal(model.m) + Z[t + 1] = model.μ + model.φ @ Z[t] + model.C @ ε + return Z + +def short_rate(model, z): + """Compute r_t = δ_0 + δ_1^⊤ z_t.""" + return model.δ_0 + model.δ_1 @ z + +def risk_prices(model, z): + """Compute λ_t = λ_0 + λ_z z_t.""" + return model.λ_0 + model.λ_z @ z +``` + ### Properties of the SDF Since $\lambda_t^\top\varepsilon_{t+1}$ is conditionally normal, it follows that @@ -138,12 +175,81 @@ $$ \mathbb{E}_t(m_{t+1}) = \exp(-r_t) $$ -and +and $$ -\text{std}_t(m_{t+1}) \approx |\lambda_t|. +\text{std}_t(m_{t+1}) \approx \| \lambda_t \|. $$ +```{exercise} +:label: arp_ex1 + +Show that the SDF defined in {eq}`eq_sdf` satisfies + +$$ +\mathbb{E}_t(m_{t+1}) = \exp(-r_t) +$$ + +and + +$$ +\text{std}_t(m_{t+1}) \approx \| \lambda_t \| +$$ + +where $\| \lambda_t \| = \sqrt{\lambda_t^\top\lambda_t}$ denotes the Euclidean norm of the risk price vector. + +For the second result, use the lognormal variance formula and the approximations $\exp(x) \approx 1 + x$ and $\exp(-r_t) \approx 1$ for small $x$ and $r_t$. 
+``` + +```{solution-start} arp_ex1 +:class: dropdown +``` + +From {eq}`eq_sdf`, we have + +$$ +m_{t+1} = \exp\left(-r_t - \frac{1}{2}\lambda_t^\top\lambda_t - \lambda_t^\top\varepsilon_{t+1}\right) +$$ + + +Since $-\lambda_t^\top \varepsilon_{t+1} \sim \mathcal{N}(0, \lambda_t^\top \lambda_t)$, we have +$\mathbb{E}_t[\exp(-\lambda_t^\top \varepsilon_{t+1})] = \exp\left(\frac{1}{2}\lambda_t^\top \lambda_t\right)$. + +Therefore, + +$$ +\mathbb{E}_t(m_{t+1}) = \exp(-r_t - \frac{1}{2}\lambda_t^\top\lambda_t) \mathbb{E}_t[\exp(-\lambda_t^\top\varepsilon_{t+1})] = \exp(-r_t) +$$ + +$m_{t+1}$ is conditionally lognormal with $\log m_{t+1} \sim \mathcal{N}(-r_t-\frac{1}{2}\lambda_t^\top\lambda_t, \lambda_t^\top \lambda_t)$. + +By the lognormal variance formula +$\text{Var}(\exp(X)) = (\exp(\sigma^2) - 1) \exp(2\mu + \sigma^2)$ for $X \sim \mathcal{N}(\mu, \sigma^2)$, we have + +$$ +\begin{aligned} +\text{Var}_t(m_{t+1}) &= (\exp(\lambda_t^\top \lambda_t) - 1) \exp(-2r_t) \\ +&\approx \lambda_t^\top \lambda_t \exp(-2r_t) +\end{aligned} +$$ + +by the approximation $\exp(x) \approx 1 + x$ for small $x$. + +Hence, + +$$ +\text{std}_t(m_{t+1}) \approx \| \lambda_t \| \exp(-r_t) +$$ + +With $\exp(-r_t) \approx 1$ for small $r_t$, we obtain + +$$ +\text{std}_t(m_{t+1}) \approx \| \lambda_t \| +$$ + +```{solution-end} +``` + The first equation confirms that $r_t$ is the net yield on a risk-free one-period bond. That is why $r_t$ is called **the short rate** in the exponential quadratic literature. 
@@ -191,6 +297,43 @@ formula for the mean of a lognormal random variable gives \nu_t(j) = r_t + \alpha_t(j)^\top\lambda_t ``` +```{exercise} +:label: arp_ex2 + +Using the SDF {eq}`eq_sdf` and the return specification {eq}`eq_return`, derive the expected excess return formula {eq}`eq_excess`: + +$$ +\nu_t(j) = r_t + \alpha_t(j)^\top\lambda_t +$$ + +*Hint:* Start by computing $\log(m_{t+1} R_{j,t+1})$, identify its conditional distribution, and apply the pricing condition $\mathbb{E}_t(m_{t+1}R_{j,t+1}) = 1$. +``` + +```{solution-start} arp_ex2 +:class: dropdown +``` + +Combining {eq}`eq_sdf` and {eq}`eq_return`, we get + +$$ +\log(m_{t+1} R_{j,t+1}) = -r_t + \nu_t(j) - \frac{1}{2}\lambda_t^\top\lambda_t - \frac{1}{2}\alpha_t(j)^\top\alpha_t(j) + (\alpha_t(j) - \lambda_t)^\top\varepsilon_{t+1} +$$ + +This is conditionally normal with mean $\mu = -r_t + \nu_t(j) - \frac{1}{2}\lambda_t^\top\lambda_t - \frac{1}{2}\alpha_t(j)^\top\alpha_t(j)$ and variance $\sigma^2 = (\alpha_t(j) - \lambda_t)^\top(\alpha_t(j) - \lambda_t)$. + +Since $\mathbb{E}_t[\exp(X)] = \exp(\mu + \frac{1}{2}\sigma^2)$ for $X \sim \mathcal{N}(\mu, \sigma^2)$, the pricing condition $\mathbb{E}_t(m_{t+1}R_{j,t+1}) = 1$ requires $\mu + \frac{1}{2}\sigma^2 = 0$. + +Expanding $\frac{1}{2}\sigma^2 = \frac{1}{2}\alpha_t(j)^\top\alpha_t(j) - \alpha_t(j)^\top\lambda_t + \frac{1}{2}\lambda_t^\top\lambda_t$ and adding to $\mu$, the $\frac{1}{2}\lambda_t^\top\lambda_t$ and $\frac{1}{2}\alpha_t(j)^\top\alpha_t(j)$ terms cancel, leaving + +$$ +-r_t + \nu_t(j) - \alpha_t(j)^\top\lambda_t = 0 +$$ + +which gives {eq}`eq_excess`. + +```{solution-end} +``` + This is a central result. 
It says: @@ -240,7 +383,7 @@ The recursion {eq}`eq_bondrecur` has an **exponential affine** solution: ```{math} :label: eq_bondprice -p_t(n) = \exp\!\bigl(\bar A_n + \bar B_n^\top z_t\bigr) +p_t(n) = \exp \bigl(\bar A_n + \bar B_n^\top z_t\bigr) ``` where the scalar $\bar A_n$ and the $m \times 1$ vector $\bar B_n$ satisfy the @@ -260,9 +403,80 @@ where the scalar $\bar A_n$ and the $m \times 1$ vector $\bar B_n$ satisfy the with initial conditions $\bar A_1 = -\delta_0$ and $\bar B_1 = -\delta_1$. +```{exercise} +:label: arp_ex3 + +Derive the Riccati difference equations {eq}`eq_riccati_A` and {eq}`eq_riccati_B` +by substituting the conjectured bond price {eq}`eq_bondprice` into the pricing +recursion {eq}`eq_bondrecur` and matching coefficients. + +*Hint:* Substitute $p_{t+1}(n) = \exp(\bar A_n + \bar B_n^\top z_{t+1})$ and +$\log m_{t+1}$ from {eq}`eq_sdf` into {eq}`eq_bondrecur`. Use the state +dynamics {eq}`eq_var` to express $z_{t+1}$ in terms of $z_t$ and +$\varepsilon_{t+1}$, then evaluate the conditional expectation using the +lognormal moment generating function. +``` + +```{solution-start} arp_ex3 +:class: dropdown +``` + +We want to show that if $p_t(n) = \exp(\bar A_n + \bar B_n^\top z_t)$, +then the recursion $p_t(n+1) = \mathbb{E}_t(m_{t+1}\, p_{t+1}(n))$ yields +$p_t(n+1) = \exp(\bar A_{n+1} + \bar B_{n+1}^\top z_t)$ with +$\bar A_{n+1}$ and $\bar B_{n+1}$ given by {eq}`eq_riccati_A` and +{eq}`eq_riccati_B`. 
+ + +From {eq}`eq_sdf` and {eq}`eq_bondprice`, + +$$ +\log(m_{t+1}\, p_{t+1}(n)) = -r_t - \frac{1}{2}\lambda_t^\top\lambda_t - \lambda_t^\top\varepsilon_{t+1} + \bar A_n + \bar B_n^\top z_{t+1} +$$ + +Substituting $z_{t+1} = \mu + \phi z_t + C\varepsilon_{t+1}$ from {eq}`eq_var` +and $r_t = \delta_0 + \delta_1^\top z_t$ from {eq}`eq_shortrate` gives + +$$ +\log(m_{t+1}\, p_{t+1}(n)) = \bar A_n + \bar B_n^\top\mu - \delta_0 + (\bar B_n^\top\phi - \delta_1^\top) z_t - \frac{1}{2}\lambda_t^\top\lambda_t + (\bar B_n^\top C - \lambda_t^\top)\varepsilon_{t+1} +$$ + + +Since $\varepsilon_{t+1} \sim \mathcal{N}(0, I)$, and writing the exponent as $a + b^\top\varepsilon_{t+1}$ where +$b = C^\top \bar B_n - \lambda_t$, we have + +$$ +\mathbb{E}_t[\exp(a + b^\top\varepsilon_{t+1})] = \exp\left(a + \frac{1}{2}b^\top b\right) +$$ + +Computing $\frac{1}{2}b^\top b$: + +$$ +\frac{1}{2}(\bar B_n^\top C - \lambda_t^\top)(\bar B_n^\top C - \lambda_t^\top)^\top = \frac{1}{2}\bar B_n^\top CC^\top \bar B_n - \bar B_n^\top C\lambda_t + \frac{1}{2}\lambda_t^\top\lambda_t +$$ + +The $\frac{1}{2}\lambda_t^\top\lambda_t$ cancels with the $-\frac{1}{2}\lambda_t^\top\lambda_t$ already in $a$, and $-\bar B_n^\top C\lambda_t = -\bar B_n^\top C(\lambda_0 + \lambda_z z_t)$. + + +$$ +\log p_t(n+1) = \underbrace{\bar A_n + \bar B_n^\top(\mu - C\lambda_0) + \frac{1}{2}\bar B_n^\top CC^\top \bar B_n - \delta_0}_{\bar A_{n+1}} + \underbrace{(\bar B_n^\top(\phi - C\lambda_z) - \delta_1^\top)}_{\bar B_{n+1}^\top} z_t +$$ + +Matching the constant and the coefficient on $z_t$ gives the Riccati +equations {eq}`eq_riccati_A` and {eq}`eq_riccati_B`. + +Setting $n = 0$ with $p_t(1) = \exp(-r_t) = \exp(-\delta_0 - \delta_1^\top z_t)$ gives $\bar A_1 = -\delta_0$ and $\bar B_1 = -\delta_1$. 
+ +```{solution-end} +``` + ### Yields -The **yield to maturity** on an $n$-period bond is +The **yield to maturity** on an $n$-period bond is the constant rate $y$ +at which one would discount the face value to obtain the observed price, +i.e., $p_t(n) = e^{-n\,y}$. + +Solving for $y$ gives $$ y_t(n) = -\frac{\log p_t(n)}{n} @@ -282,26 +496,10 @@ where $A_n = -\bar A_n / n$ and $B_n = -\bar B_n / n$. This is the defining property of affine term structure models. -## Python implementation - -We now implement the affine term structure model and compute bond prices, yields, -and risk premiums numerically. +We now implement the bond pricing formulas {eq}`eq_riccati_A`, {eq}`eq_riccati_B`, +and {eq}`eq_yield`. ```{code-cell} ipython3 -AffineModel = namedtuple('AffineModel', - ('μ', 'φ', 'C', 'δ_0', 'δ_1', 'λ_0', 'λ_z', 'm', 'φ_rn', 'μ_rn')) - -def create_affine_model(μ, φ, C, δ_0, δ_1, λ_0, λ_z): - """Create an affine term structure model.""" - μ = np.asarray(μ, float) - φ = np.asarray(φ, float) - C = np.asarray(C, float) - δ_1 = np.asarray(δ_1, float) - λ_0, λ_z = np.asarray(λ_0, float), np.asarray(λ_z, float) - return AffineModel(μ=μ, φ=φ, C=C, δ_0=float(δ_0), δ_1=δ_1, - λ_0=λ_0, λ_z=λ_z, m=len(μ), - φ_rn=φ - C @ λ_z, μ_rn=μ - C @ λ_0) - def bond_coefficients(model, n_max): """Compute (A_bar_n, B_bar_n) for n = 1, ..., n_max.""" A_bar = np.zeros(n_max + 1) @@ -319,39 +517,27 @@ def bond_coefficients(model, n_max): def compute_yields(model, z, n_max): """Compute yield curve y_t(n) for n = 1, ..., n_max.""" A_bar, B_bar = bond_coefficients(model, n_max) - ns = np.arange(1, n_max + 1) - return np.array([(-A_bar[n] - B_bar[n] @ z) / n for n in ns]) + return np.array([(-A_bar[n] - B_bar[n] @ z) / n + for n in range(1, n_max + 1)]) def bond_prices(model, z, n_max): """Compute bond prices p_t(n) for n = 1, ..., n_max.""" A_bar, B_bar = bond_coefficients(model, n_max) return np.array([np.exp(A_bar[n] + B_bar[n] @ z) for n in range(1, n_max + 1)]) - -def simulate(model, z0, 
T, rng=None): - """Simulate the state process for T periods.""" - if rng is None: - rng = np.random.default_rng(42) - Z = np.zeros((T + 1, model.m)) - Z[0] = z0 - for t in range(T): - ε = rng.standard_normal(model.m) - Z[t + 1] = model.μ + model.φ @ Z[t] + model.C @ ε - return Z - -def short_rate(model, z): - """Compute r_t = δ_0 + δ_1^⊤ z_t.""" - return model.δ_0 + model.δ_1 @ z - -def risk_prices(model, z): - """Compute λ_t = λ_0 + λ_z z_t.""" - return model.λ_0 + model.λ_z @ z ``` ### A one-factor Gaussian example To build intuition, we start with a single-factor ($m=1$) Gaussian model. +With $m = 1$, the state $z_t$ follows an AR(1) process +$z_{t+1} = \mu + \phi z_t + C\varepsilon_{t+1}$. + +The unconditional standard +deviation of $z_t$ is $\sigma_z = C / \sqrt{1 - \phi^2}$, which determines +the range of short rates the model generates via $r_t = \delta_0 + \delta_1 z_t$. + ```{code-cell} ipython3 # One-factor Gaussian model (quarterly) μ = np.array([0.0]) @@ -363,24 +549,11 @@ C = np.array([[1.0]]) λ_z = np.array([[-0.01]]) # countercyclical model_1f = create_affine_model(μ, φ, C, δ_0, δ_1, λ_0, λ_z) - -φ_Q = model_1f.φ_rn[0, 0] -half_life = np.log(2) / (-np.log(φ[0, 0])) -σ_z = 1.0 / np.sqrt(1 - φ[0, 0]**2) -print(f"Physical AR(1): φ = {φ[0,0]:.3f}" - f" (half-life {half_life:.1f} quarters)") -print(f"Risk-neutral AR(1): φ^Q = {φ_Q:.3f} " - f"({'stable' if abs(φ_Q) < 1 else 'UNSTABLE'})") -print(f"Unconditional std of z: σ_z = {σ_z:.2f}") -r_mean = short_rate(model_1f, np.array([0.0])) * 4 * 100 -print(f"Mean short rate = {r_mean:.1f}% p.a.") -print(f"Short rate range (±2σ): [{(δ_0-δ_1[0]*2*σ_z)*4*100:.1f}%, " - f"{(δ_0+δ_1[0]*2*σ_z)*4*100:.1f}%] p.a.") ``` ### Yield curve shapes -We compute yield curves across a range of short-rate states $z_t$. +We compute yield curves $y_t(n)$ across a range of short-rate states $z_t$. 
```{code-cell} ipython3 n_max_1f = 60 @@ -396,22 +569,29 @@ r_low = short_rate(model_1f, z_low) * 4 * 100 r_mid = short_rate(model_1f, z_mid) * 4 * 100 r_high = short_rate(model_1f, z_high) * 4 * 100 -for z, label, color in [ - (z_low, f"Low state (r₁ = {r_low:.1f}%)", - "steelblue"), - (z_mid, f"Median state (r₁ = {r_mid:.1f}%)", - "seagreen"), - (z_high, f"High state (r₁ = {r_high:.1f}%)", - "firebrick"), +for z, label in [ + (z_low, f"Low state ($y_t(1) = ${r_low:.1f}%)"), + (z_mid, f"Median state ($y_t(1) = ${r_mid:.1f}%)"), + (z_high, f"High state ($y_t(1) = ${r_high:.1f}%)"), ]: y = compute_yields(model_1f, z, n_max_1f) * 4 * 100 - ax.plot(maturities_1f, y, color=color, lw=2.2, label=label) - ax.plot(1, y[0], 'o', color=color, ms=7, zorder=5) + line, = ax.plot(maturities_1f, y, lw=2.2, label=label) + ax.plot(1, y[0], 'o', color=line.get_color(), ms=7, zorder=5) r_bar = short_rate(model_1f, np.array([0.0])) * 4 * 100 ax.axhline(r_bar, color='grey', ls=':', lw=1.2, alpha=0.7, label=f"Mean short rate ({r_bar:.1f}%)") +# Long-run yield: B_bar_n converges, so y_inf = lim -A_bar_n / n +φ_Cλ = (model_1f.φ_rn)[0, 0] # φ - Cλ_z (scalar) +B_inf = -model_1f.δ_1[0] / (1 - φ_Cλ) # fixed point of B recursion +A_increment = (B_inf * model_1f.μ_rn[0] + + 0.5 * B_inf**2 * (model_1f.C @ model_1f.C.T)[0, 0] + - model_1f.δ_0) +y_inf = -A_increment * 4 * 100 # annualised % +ax.axhline(y_inf, color='black', ls='--', lw=1.2, alpha=0.7, + label=f"Long-run yield ({y_inf:.1f}%)") + ax.set_xlabel("Maturity (quarters)") ax.set_ylabel("Yield (% per annum)") ax.set_title("Yield Curves — One-Factor Affine Model") @@ -420,7 +600,7 @@ ax.set_xlim(1, n_max_1f) ax2 = ax.twiny() ax2.set_xlim(ax.get_xlim()) -year_ticks = [4, 8, 12, 20, 28, 40, 60] +year_ticks = [4, 20, 40, 60] ax2.set_xticks(year_ticks) ax2.set_xticklabels([f"{t/4:.0f}y" for t in year_ticks]) ax2.set_xlabel("Maturity (years)") @@ -429,10 +609,77 @@ plt.tight_layout() plt.show() ``` -The model generates upward-sloping, flat, and 
inverted yield curves as the short -rate moves across states — a key qualitative feature of observed bond markets. +When the short rate is low, the yield curve curve is +upward-sloping, while when the short rate is high, it is downward-sloping. + +All three curves converge to the same long-run yield $y_\infty$ at long +maturities, and the long-run yield lies below the mean short rate +$\delta_0$. + +````{exercise} +:label: arp_ex4 + +Show that the long-run yield satisfies + +```{math} +:label: eq_y_inf + +y_\infty + = \delta_0 + - \bar B_\infty^\top(\mu - C\lambda_0) + - \tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty +``` + +where $\bar B_\infty = -(I - (\phi - C\lambda_z)^\top)^{-1} \delta_1$ +is the fixed point of the recursion {eq}`eq_riccati_B`. + +Then explain why $y_\infty < \delta_0$ under this parameterization. + +*Hint:* Use {eq}`eq_yield` and the Riccati equations +{eq}`eq_riccati_A`--{eq}`eq_riccati_B`. For the inequality, consider +each subtracted term separately. +```` + +```{solution-start} arp_ex4 +:class: dropdown +``` + + +**Derivation of $y_\infty$.** + +The recursion {eq}`eq_riccati_B` is a linear difference equation $\bar B_{n+1} = (\phi - C\lambda_z)^\top \bar B_n - \delta_1$. + +When $\phi - C\lambda_z$ has eigenvalues inside the unit circle, $\bar B_n$ converges to $\bar B_\infty = -(I - (\phi - C\lambda_z)^\top)^{-1} \delta_1$. + +Since $\bar B_\infty$ is finite, $\bar B_n^\top z_t / n \to 0$ in {eq}`eq_yield`, so $y_t(n) \to \lim_{n\to\infty} -\bar A_n / n$ regardless of $z_t$. + +To find this limit, write $\bar A_n = \bar A_1 + \sum_{k=1}^{n-1}(\bar A_{k+1} - \bar A_k)$. + +By {eq}`eq_riccati_A`, each increment depends on $\bar B_k$, which converges to $\bar B_\infty$, so the increment converges to $L \equiv \bar B_\infty^\top(\mu - C\lambda_0) + \tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty - \delta_0$. + +Therefore $\bar A_n / n \to L$ and $y_\infty = -L$, giving {eq}`eq_y_inf`. 
+ +**Why $y_\infty < \delta_0$.** + +Both subtracted terms in {eq}`eq_y_inf` are positive. + +The quadratic term satisfies $\tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty = \tfrac{1}{2}\|C^\top \bar B_\infty\|^2 \geq 0$ always — a **convexity effect** from Jensen's inequality applied to the exponential bond-price formula. + +The linear term $\bar B_\infty^\top(\mu - C\lambda_0)$ is positive because both factors are negative. + +$\bar B_\infty < 0$ since $\delta_1 > 0$: a higher state raises the short rate, so bond prices load negatively on the state. + +$\mu - C\lambda_0 < 0$ since $\lambda_0 > 0$: positive risk prices shift the risk-neutral drift below the physical drift. + +This is a **risk-premium effect**: compensating investors for interest-rate risk lowers the long-run yield. + +Together, these two effects push $y_\infty$ below $\delta_0$. + +```{solution-end} +``` + -### Short rate dynamics +Let's also simulate the short rate path: ```{code-cell} ipython3 T = 200 @@ -443,14 +690,14 @@ r_bar_pct = short_rate(model_1f, np.array([0.0])) * 4 * 100 fig, ax = plt.subplots(figsize=(10, 4)) quarters = np.arange(T + 1) -ax.plot(quarters, short_rates, color="steelblue", lw=1.3) -ax.axhline(r_bar_pct, color="red", ls="--", lw=1.3, +line, = ax.plot(quarters, short_rates, lw=1.3) +ax.axhline(r_bar_pct, ls="--", lw=1.3, label=f"Unconditional mean ({r_bar_pct:.1f}%)") ax.fill_between(quarters, short_rates, r_bar_pct, - alpha=0.08, color="steelblue") + alpha=0.08, color=line.get_color()) ax.set_xlabel("Quarter") ax.set_ylabel("Short rate (% p.a.)") -ax.set_title("Simulated Short Rate — One-Factor Model (50 years)") +ax.set_title("Simulated Short Rate") ax.set_xlim(0, T) ax.legend(fontsize=11) plt.tight_layout() @@ -459,10 +706,45 @@ plt.show() ### A two-factor model -To match richer yield-curve dynamics, practitioners routinely use $m \geq 2$ factors. +To match richer yield-curve dynamics, practitioners routinely use $m \geq 2$ +factors. 
+ +We now introduce a two-factor specification with state +$z_t = (z_{1t},\, z_{2t})^\top$, where + +$$ +z_{t+1} = \mu + \phi\, z_t + C\,\varepsilon_{t+1}, +\qquad +\phi = \begin{pmatrix} 0.97 & -0.03 \\ 0 & 0.90 \end{pmatrix}, +\qquad +C = I_2 +$$ + +The first factor $z_{1t}$ is highly persistent ($\phi_{11} = 0.97$) and +drives most of the variation in the short rate through $\delta_1$, so we +interpret it as a **level** factor. + +The second factor $z_{2t}$ mean-reverts faster ($\phi_{22} = 0.90$) and +affects the short rate with a smaller loading, capturing the **slope** +of the yield curve. + +The off-diagonal entry $\phi_{12} = -0.03$ allows the level factor to +respond to slope innovations. + +The short rate is $r_t = \delta_0 + \delta_1^\top z_t$ with +$\delta_1 = (0.002,\; 0.001)^\top$, so both factors raise the short +rate when positive, but the level factor has twice the impact. + +Risk prices are $\lambda_t = \lambda_0 + \lambda_z z_t$ with +$\lambda_z = \text{diag}(-0.005,\, -0.003)$. + +The negative diagonal entries mean that risk prices rise when +the state is low — investors demand higher compensation in bad states. -We now introduce a two-factor specification in which the factors -can be interpreted as a **level** component and a **slope** component. +As discussed above, this makes $\phi - C\lambda_z$ have larger +eigenvalues than $\phi$, so the state is more persistent under the +risk-neutral measure and the yield curve is more sensitive to the +current state at long horizons. 
```{code-cell} ipython3 # Two-factor model: z = [level, slope] @@ -478,19 +760,8 @@ C_2 = np.eye(2) model_2f = create_affine_model(μ_2, φ_2, C_2, δ_0_2, δ_1_2, λ_0_2, λ_z_2) -print("Physical measure VAR:") -print(f" φ =\n{φ_2}") -print(f" eigenvalues of φ: {eigvals(φ_2).real.round(4)}") -print() -print("Risk-neutral measure VAR:") -print(f" φ^Q = φ - Cλ_z =\n{model_2f.φ_rn.round(4)}") -eigs_Q = eigvals(model_2f.φ_rn).real -stable = all(abs(e) < 1 for e in eigs_Q) -status = "stable" if stable else "UNSTABLE" -print(f" eigenvalues of φ^Q: {eigs_Q.round(4)}" - f" ({status})") -print() -print("Risk prices make Q dynamics more persistent than P dynamics.") +print(f"Eigenvalues of φ: {eigvals(φ_2).real.round(4)}") +print(f"Eigenvalues of φ - Cλ_z: {eigvals(model_2f.φ_rn).real.round(4)}") ``` ```{code-cell} ipython3 @@ -505,17 +776,13 @@ states = { fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5)) -colors_2f = ["seagreen", "steelblue", "firebrick"] -for (label, z), color in zip(states.items(), colors_2f): +for label, z in states.items(): r_now = short_rate(model_2f, z) * 4 * 100 y = compute_yields(model_2f, z, n_max_2f) * 4 * 100 - ax1.plot(maturities_2f, y, lw=2.2, color=color, - label=f"{label} (r₁ = {r_now:.1f}%)") - ax1.plot(1, y[0], 'o', color=color, ms=7, zorder=5) + line, = ax1.plot(maturities_2f, y, lw=2.2, + label=f"{label} (r₁ = {r_now:.1f}%)") + ax1.plot(1, y[0], 'o', color=line.get_color(), ms=7, zorder=5) -ax1.annotate("Curves converge as\nmean reversion dominates", - xy=(50, 3.8), fontsize=9, color="gray", ha='center', - style='italic') ax1.set_xlabel("Maturity (quarters)") ax1.set_ylabel("Yield (% p.a.)") ax1.set_title("Yield Curves — Two-Factor Model") @@ -526,9 +793,9 @@ A_bar, B_bar = bond_coefficients(model_2f, n_max_2f) ns = np.arange(1, n_max_2f + 1) B_n = np.array([-B_bar[n] / n for n in ns]) -ax2.plot(ns, B_n[:, 0], lw=2.2, color="purple", +ax2.plot(ns, B_n[:, 0], lw=2.2, label=r"Level loading $B_{n,1}$") -ax2.plot(ns, B_n[:, 1], lw=2.2, 
color="orange", +ax2.plot(ns, B_n[:, 1], lw=2.2, label=r"Slope loading $B_{n,2}$") ax2.axhline(0, color='black', lw=0.6) ax2.set_xlabel("Maturity (quarters)") @@ -536,9 +803,6 @@ ax2.set_ylabel(r"Yield loading $B_{n,k}$") ax2.set_title("Factor Loadings on Yields") ax2.legend(fontsize=11) ax2.set_xlim(1, n_max_2f) -ax2.annotate("Level factor stays\nimportant at long maturities", - xy=(45, B_n[44, 0]), fontsize=9, color="purple", - ha='center', va='bottom') for ax in (ax1, ax2): ax_top = ax.twiny() @@ -584,12 +848,11 @@ z_states_tp = { fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5)) -tp_colors = ["steelblue", "firebrick"] -for (label, z), color in zip(z_states_tp.items(), tp_colors): +for label, z in z_states_tp.items(): tp = term_premiums(model_2f, z, n_max_tp) * 4 * 100 r_now = short_rate(model_2f, z) * 4 * 100 lam = risk_prices(model_2f, z) - ax1.plot(maturities_tp, tp, color=color, lw=2.2, + ax1.plot(maturities_tp, tp, lw=2.2, label=(f"{label}\n r={r_now:.1f}%," f" λ=[{lam[0]:.3f}, {lam[1]:.3f}]")) @@ -613,9 +876,9 @@ tp_slope = np.array([-B_bar_d[n, 1] * C_lam[1] tp_total = tp_level + tp_slope ax2.plot(maturities_tp, tp_total, 'k-', lw=2.2, label="Total") -ax2.plot(maturities_tp, tp_level, color="purple", lw=1.8, ls="--", +ax2.plot(maturities_tp, tp_level, lw=1.8, ls="--", label="Level factor") -ax2.plot(maturities_tp, tp_slope, color="orange", lw=1.8, ls="--", +ax2.plot(maturities_tp, tp_slope, lw=1.8, ls="--", label="Slope factor") ax2.axhline(0, color="black", lw=0.6, ls=":") ax2.set_xlabel("Maturity (quarters)") @@ -637,72 +900,103 @@ plt.show() ## Risk-neutral probabilities -The stochastic discount factor {eq}`eq_sdf` defines a **change of measure** from the -physical measure $P$ to the **risk-neutral measure** $Q$. 
+We return to the VAR and short-rate equations +{eq}`eq_var`--{eq}`eq_shortrate`, which for convenience we repeat here: -Define the likelihood ratio +$$ +z_{t+1} = \mu + \phi z_t + C\varepsilon_{t+1}, \qquad +r_t = \delta_0 + \delta_1^\top z_t +$$ + +where $\varepsilon_{t+1} \sim \mathcal{N}(0, I)$. + +We suppose that this structure describes the data-generating mechanism. + +Finance economists call this the **physical measure** $P$, to distinguish it +from the **risk-neutral measure** $Q$ that we now describe. + +Under the physical measure, the conditional distribution of $z_{t+1}$ given +$z_t$ is $\mathcal{N}(\mu + \phi z_t,\; CC^\top)$. + +### Change of measure + +With the risk-price vector $\lambda_t = \lambda_0 + \lambda_z z_t$ from +{eq}`eq_riskprices`, define the non-negative random variable ```{math} :label: eq_RN_ratio -\frac{\xi^Q_{t+1}}{\xi^Q_t} = \exp\!\left(-\frac{1}{2}\lambda_t^\top\lambda_t - \lambda_t^\top\varepsilon_{t+1}\right) +\frac{\xi^Q_{t+1}}{\xi^Q_t} + = \exp\!\left(-\tfrac{1}{2}\lambda_t^\top\lambda_t + - \lambda_t^\top\varepsilon_{t+1}\right) ``` -Then +This is a log-normal random variable with mean 1, so it is a valid +likelihood ratio that can be used to twist the conditional distribution of +$z_{t+1}$. + +Multiplying the physical conditional distribution by this likelihood ratio +transforms it into the **risk-neutral conditional distribution** $$ -m_{t+1} = \frac{\xi^Q_{t+1}}{\xi^Q_t}\exp(-r_t) +z_{t+1} \mid z_t \;\overset{Q}{\sim}\; + \mathcal{N}\!\bigl(\mu - C\lambda_0 + (\phi - C\lambda_z)z_t,\; CC^\top\bigr) $$ -and the pricing equation $\mathbb{E}^P_t(m_{t+1}R_{j,t+1}) = 1$ becomes +In other words, under $Q$ the state follows -```{math} -:label: eq_Qpricing +$$ +z_{t+1} = (\mu - C\lambda_0) + (\phi - C\lambda_z)\,z_t + + C\varepsilon^Q_{t+1} +$$ -\mathbb{E}^Q_t R_{j,t+1} = \exp(r_t) -``` +where $\varepsilon^Q_{t+1} \sim \mathcal{N}(0, I)$ under $Q$. 
+ +The risk-neutral distribution twists the conditional mean from +$\mu + \phi z_t$ to $\mu - C\lambda_0 + (\phi - C\lambda_z)z_t$. -*Under the risk-neutral measure, expected returns on all assets equal the risk-free return.* +The adjustments $-C\lambda_0$ (constant) and $-C\lambda_z$ +(state-dependent) encode how the pricing equation +$\mathbb{E}^P_t m_{t+1} R_{j,t+1} = 1$ adjusts expected returns for +exposure to the risks $\varepsilon_{t+1}$. -### The risk-neutral VAR +### Asset pricing in a nutshell -Multiplying the physical conditional distribution of $z_{t+1}$ by the likelihood -ratio {eq}`eq_RN_ratio` gives the **risk-neutral conditional distribution** +Using {eq}`eq_RN_ratio`, we can factor the SDF {eq}`eq_sdf` as $$ -z_{t+1} \mid z_t \;\overset{Q}{\sim}\; \mathcal{N}\!\bigl(\mu - C\lambda_0 + (\phi - C\lambda_z)z_t,\; CC^\top\bigr) +m_{t+1} = \frac{\xi^Q_{t+1}}{\xi^Q_t}\,\exp(-r_t) $$ -In other words, under $Q$ the state vector follows +The pricing condition $\mathbb{E}^P_t(m_{t+1} R_{j,t+1}) = 1$ then becomes -$$ -z_{t+1} = (\mu - C\lambda_0) + (\phi - C\lambda_z)\,z_t + C\varepsilon^Q_{t+1} -$$ +```{math} +:label: eq_Qpricing -where $\varepsilon^Q_{t+1} \sim \mathcal{N}(0, I)$ under $Q$. +\mathbb{E}^Q_t R_{j,t+1} = \exp(r_t) +``` -The risk-neutral drift adjustments $-C\lambda_0$ (constant) and $-C\lambda_z$ (state-dependent) -encode exactly how the asset pricing formula $\mathbb{E}^P_t m_{t+1}R_{j,t+1}=1$ adjusts -expected returns for exposure to the risks $\varepsilon_{t+1}$. +*Under the risk-neutral measure, expected returns on all assets equal +the risk-free return.* ### Verification via risk-neutral pricing Bond prices can be computed by discounting at $r_t$ under $Q$: $$ -p_t(n) = \mathbb{E}^Q_t\! 
\left[\exp\!\left(-\sum_{s=0}^{n-1}r_{t+s}\right)\right] +p_t(n) = \mathbb{E}^Q_t \left[\exp \left(-\sum_{s=0}^{n-1}r_{t+s}\right)\right] $$ We can verify that this agrees with {eq}`eq_bondprice` by iterating the affine recursion under the risk-neutral VAR. -Below we confirm this numerically. +Below we confirm this numerically ```{code-cell} ipython3 def bond_price_mc_Q(model, z0, n, n_sims=50_000, rng=None): """Estimate p_t(n) by Monte Carlo under Q.""" if rng is None: - rng = np.random.default_rng(2024) + rng = np.random.default_rng(0) m = len(z0) Z = np.tile(z0, (n_sims, 1)) disc = np.zeros(n_sims) @@ -715,9 +1009,9 @@ def bond_price_mc_Q(model, z0, n, n_sims=50_000, rng=None): z_test = np.array([0.01, 0.005]) p_analytic = bond_prices(model_2f, z_test, 40) -rng = np.random.default_rng(2024) +rng = np.random.default_rng(0) maturities_check = [4, 12, 24, 40] -mc_prices = [bond_price_mc_Q(model_2f, z_test, n, n_sims=80_000, rng=rng) +mc_prices = [bond_price_mc_Q(model_2f, z_test, n, n_sims=100_000, rng=rng) for n in maturities_check] header = (f"{'Maturity':>10} {'Analytic':>12}" @@ -735,70 +1029,93 @@ Riccati recursion {eq}`eq_riccati_A`–{eq}`eq_riccati_B`. ## Distorted beliefs -{cite:t}`piazzesi2015trend` assemble survey -evidence suggesting that economic experts' forecasts are *systematically biased* -relative to the physical measure. +{cite:t}`piazzesi2015trend` assemble survey evidence suggesting that economic +experts' forecasts are systematically biased relative to the physical measure. ### The subjective measure -Let $\hat z_{t+1}$ be one-period-ahead expert forecasts. +Let $\{z_t\}_{t=1}^T$ be a record of observations on the state and let +$\{\check z_{t+1}\}_{t=1}^T$ be a record of one-period-ahead expert forecasts. 
-Regressing these on $z_t$: +Let $\check\mu, \check\phi$ be the regression coefficients in $$ -\hat z_{t+1} = \hat\mu + \hat\phi\, z_t + e_{t+1} +\check z_{t+1} = \check\mu + \check\phi\, z_t + e_{t+1} $$ -yields estimates $\hat\mu, \hat\phi$ that differ from the physical parameters $\mu, \phi$. +where the residual $e_{t+1}$ has mean zero, is orthogonal to $z_t$, and +satisfies $\mathbb{E}\,e_{t+1} e_{t+1}^\top = CC^\top$. + +By comparing estimates of $\mu, \phi$ from {eq}`eq_var` with estimates of +$\check\mu, \check\phi$ from the experts' forecasts, {cite:t}`piazzesi2015trend` +deduce that the experts' beliefs are systematically distorted. -To formalise the distortion, let $\kappa_t = \kappa_0 + \kappa_z z_t$ and define +To organize this evidence, let $\kappa_t = \kappa_0 + \kappa_z z_t$ and define the likelihood ratio ```{math} :label: eq_Srat \frac{\xi^S_{t+1}}{\xi^S_t} -= \exp\!\left(-\frac{1}{2}\kappa_t^\top\kappa_t - \kappa_t^\top\varepsilon_{t+1}\right) + = \exp\!\left(-\tfrac{1}{2}\kappa_t^\top\kappa_t + - \kappa_t^\top\varepsilon_{t+1}\right) ``` -Multiplying the physical conditional distribution of $z_{t+1}$ by this likelihood -ratio gives the **subjective (S) conditional distribution** +This is log-normal with mean 1, so it is a valid likelihood ratio. + +Multiplying the physical conditional distribution of $z_{t+1}$ by this +likelihood ratio transforms it to the experts' **subjective conditional +distribution** $$ z_{t+1} \mid z_t \;\overset{S}{\sim}\; -\mathcal{N}\!\bigl(\mu - C\kappa_0 + (\phi - C\kappa_z)\,z_t,\; CC^\top\bigr) + \mathcal{N}\!\bigl(\mu - C\kappa_0 + (\phi - C\kappa_z)\,z_t,\; CC^\top\bigr) $$ -Comparing with the regression implies +In the experts' forecast regression, $\check\mu$ estimates +$\mu - C\kappa_0$ and $\check\phi$ estimates $\phi - C\kappa_z$. 
+ +{cite:t}`piazzesi2015trend` find that the experts behave as if the level and +slope of the yield curve are more persistent than under the physical measure: +$\check\phi$ has larger eigenvalues than $\phi$. + +### Pricing under distorted beliefs + +Suppose a representative agent with subjective beliefs $S$ and true risk +prices $\lambda^\star_t$ prices assets according to $$ -\hat\mu = \mu - C\kappa_0, \qquad \hat\phi = \phi - C\kappa_z +\mathbb{E}^S_t\bigl(m^\star_{t+1}\, R_{j,t+1}\bigr) = 1 $$ -Piazzesi et al. find that experts behave as if the level and slope of the yield -curve are *more persistent* than under the physical measure: $\hat\phi$ has -larger eigenvalues than $\phi$. - -### Pricing under distorted beliefs +where $m^\star_{t+1} = \exp(-r_t - \tfrac{1}{2}\lambda_t^{\star\top}\lambda^\star_t +- \lambda_t^{\star\top}\varepsilon_{t+1})$. -A representative agent with subjective beliefs $S$ and risk prices $\lambda^\star_t$ -satisfies +Expanding in terms of the physical measure $P$, the subjective pricing +equation becomes $$ -\mathbb{E}^S_t\bigl(m^\star_{t+1} R_{j,t+1}\bigr) = 1 +\mathbb{E}^P_t\!\left[ + \exp\!\left(-r_t + - \tfrac{1}{2}(\lambda^\star_t + \kappa_t)^\top(\lambda^\star_t + \kappa_t) + - (\lambda^\star_t + \kappa_t)^\top\varepsilon_{t+1} + \right) R_{j,t+1} +\right] = 1 $$ -Expanding this in terms of the physical measure $P$, one finds that the -**rational-expectations econometrician** who imposes $P$ will estimate risk prices +Comparing this with the rational-expectations econometrician's pricing +equation $\mathbb{E}^P_t(m_{t+1}\, R_{j,t+1}) = 1$, we see that what the +econometrician interprets as $\lambda_t$ is actually $$ \hat\lambda_t = \lambda^\star_t + \kappa_t $$ -That is, the econometrician's estimate conflates true risk prices $\lambda^\star_t$ +The econometrician's estimate conflates true risk prices $\lambda^\star_t$ and belief distortions $\kappa_t$. 
-Part of what looks like a high price of risk is actually a systematic forecast bias. +Part of what looks like a high price of risk is actually a systematic +forecast bias. ### Numerical illustration @@ -813,29 +1130,11 @@ Part of what looks like a high price of risk is actually a systematic forecast b κ_z = np.linalg.solve(C_2, φ_P - φ_S) κ_0 = np.linalg.solve(C_2, μ_P - μ_S) -print("Distortion parameters" - " (κ quantifies how experts' beliefs" - " differ from P):") -print(f" κ_0 = {κ_0.round(4)}") -print(f" κ_z =\n{κ_z.round(4)}") -print() -print("Eigenvalue comparison:") -eig_P = sorted(eigvals(φ_P).real, reverse=True) -eig_S = sorted(eigvals(φ_S).real, reverse=True) -print(f" Physical φ eigenvalues: {[round(e, 4) for e in eig_P]}") -print(f" Subjective φ̂ eigenvalues: {[round(e, 4) for e in eig_S]}") -print(" Experts believe both factors are more persistent.") -print() - λ_star_0 = np.array([0.03, 0.015]) λ_star_z = np.array([[-0.006, 0.0], [0.0, -0.004]]) λ_hat_0 = λ_star_0 + κ_0 λ_hat_z = λ_star_z + κ_z - -print("True risk prices: λ*_0 =", λ_star_0.round(4)) -print("Econometrician estimates: λ̂_0 =", λ_hat_0.round(4)) -print(f" Belief distortion inflates λ̂_0 by κ_0 = {κ_0.round(4)}.") ``` ```{code-cell} ipython3 @@ -844,11 +1143,6 @@ model_true = create_affine_model( model_econ = create_affine_model( μ_2, φ_2, C_2, δ_0_2, δ_1_2, λ_hat_0, λ_hat_z) -for name, mdl in [("True", model_true), ("Econometrician", model_econ)]: - eigs = eigvals(mdl.φ_rn).real - status = "stable" if all(abs(e) < 1 for e in eigs) else "UNSTABLE" - print(f"{name} model: φ^Q eigenvalues = {eigs.round(4)} ({status})") - z_ref = np.array([0.0, 0.0]) n_max_db = 60 maturities_db = np.arange(1, n_max_db + 1) @@ -858,13 +1152,13 @@ tp_econ = term_premiums(model_econ, z_ref, n_max_db) * 4 * 100 fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5)) -ax1.plot(maturities_db, tp_true, lw=2.2, color="steelblue", +ax1.plot(maturities_db, tp_true, lw=2.2, label=r"True risk prices 
$\lambda^\star_t$") -ax1.plot(maturities_db, tp_econ, lw=2.2, color="firebrick", ls="--", +line_econ, = ax1.plot(maturities_db, tp_econ, lw=2.2, ls="--", label=(r"RE econometrician" r" $\hat\lambda_t = \lambda^\star_t + \kappa_t$")) ax1.fill_between(maturities_db, tp_true, tp_econ, - alpha=0.15, color="firebrick", + alpha=0.15, color=line_econ.get_color(), label="Belief distortion component") ax1.axhline(0, color="black", lw=0.8, ls=":") ax1.set_xlabel("Maturity (quarters)") @@ -877,7 +1171,7 @@ mask = np.abs(tp_true) > 1e-8 ratio = np.full_like(tp_true, np.nan) ratio[mask] = tp_econ[mask] / tp_true[mask] -ax2.plot(maturities_db[mask], ratio[mask], lw=2.2, color="darkred") +ax2.plot(maturities_db[mask], ratio[mask], lw=2.2) ax2.axhline(1, color="black", lw=0.8, ls="--", label="No distortion (ratio = 1)") ax2.set_xlabel("Maturity (quarters)") @@ -907,35 +1201,6 @@ data — for example, the survey forecasts used by Piazzesi, Salomao, and Schnei Our {doc}`Risk Aversion or Mistaken Beliefs? ` lecture explores this confounding in greater depth. -## The bond price recursion - -We verify the exponential affine form {eq}`eq_bondprice` by induction. - -**Claim:** If $p_{t+1}(n) = \exp(\bar A_n + \bar B_n^\top z_{t+1})$, then -$p_t(n+1) = \exp(\bar A_{n+1} + \bar B_{n+1}^\top z_t)$ with $\bar A_{n+1}$ and -$\bar B_{n+1}$ given by {eq}`eq_riccati_A`–{eq}`eq_riccati_B`. 
- -**Proof sketch.** Using the SDF {eq}`eq_sdf` and the VAR {eq}`eq_var`: - -$$ -\log m_{t+1} + \log p_{t+1}(n) -= -r_t - \tfrac{1}{2}\lambda_t^\top\lambda_t - + (\bar A_n + \bar B_n^\top\mu + \bar B_n^\top\phi z_t) - + (-\lambda_t + C^\top\bar B_n)^\top\varepsilon_{t+1} -$$ - -Taking the conditional expectation (and using $\varepsilon_{t+1}\sim\mathcal{N}(0,I)$): - -$$ -\log p_t(n+1) = -r_t - \tfrac{1}{2}\lambda_t^\top\lambda_t - + \bar A_n + \bar B_n^\top(\mu + \phi z_t) - + \tfrac{1}{2}(\lambda_t - C^\top\bar B_n)^\top(\lambda_t - C^\top\bar B_n) -$$ - -Substituting $r_t = \delta_0 + \delta_1^\top z_t$ and $\lambda_t = \lambda_0 + \lambda_z z_t$, -collecting constant and linear-in-$z_t$ terms, and equating coefficients gives -exactly {eq}`eq_riccati_A`–{eq}`eq_riccati_B`. $\blacksquare$ - ## Concluding remarks The affine model of the stochastic discount factor provides a flexible and tractable From 66a9455934d3c862619e2a8dc51df626f34e8f19 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Mar 2026 18:14:24 +1100 Subject: [PATCH 05/12] updates --- lectures/affine_risk_prices.md | 82 ++++++++++++++++++++++++++-------- 1 file changed, 63 insertions(+), 19 deletions(-) diff --git a/lectures/affine_risk_prices.md b/lectures/affine_risk_prices.md index e0ed2cf63..1fce64cd8 100644 --- a/lectures/affine_risk_prices.md +++ b/lectures/affine_risk_prices.md @@ -962,13 +962,22 @@ exposure to the risks $\varepsilon_{t+1}$. ### Asset pricing in a nutshell -Using {eq}`eq_RN_ratio`, we can factor the SDF {eq}`eq_sdf` as +Let $\mathbb{E}^P$ denote an expectation under the physical measure that +nature uses to generate the data. + +Our key asset pricing equation is +$\mathbb{E}^P_t m_{t+1} R_{j,t+1} = 1$ for all returns $R_{j,t+1}$. 
+ +Using {eq}`eq_RN_ratio`, we can express the SDF {eq}`eq_sdf` as $$ m_{t+1} = \frac{\xi^Q_{t+1}}{\xi^Q_t}\,\exp(-r_t) $$ -The pricing condition $\mathbb{E}^P_t(m_{t+1} R_{j,t+1}) = 1$ then becomes +Then the condition +$\mathbb{E}^P_t\bigl(\exp(-r_t)\, +\tfrac{\xi^Q_{t+1}}{\xi^Q_t}\, R_{j,t+1}\bigr) = 1$ +is equivalent to ```{math} :label: eq_Qpricing @@ -1081,18 +1090,45 @@ $\check\phi$ has larger eigenvalues than $\phi$. ### Pricing under distorted beliefs -Suppose a representative agent with subjective beliefs $S$ and true risk -prices $\lambda^\star_t$ prices assets according to +{cite:t}`piazzesi2015trend` explore the hypothesis that a representative +agent with these distorted beliefs prices assets and makes returns satisfy $$ \mathbb{E}^S_t\bigl(m^\star_{t+1}\, R_{j,t+1}\bigr) = 1 $$ -where $m^\star_{t+1} = \exp(-r_t - \tfrac{1}{2}\lambda_t^{\star\top}\lambda^\star_t -- \lambda_t^{\star\top}\varepsilon_{t+1})$. +where $\mathbb{E}^S_t$ is the conditional expectation under the subjective +$S$ measure and $m^\star_{t+1}$ is the SDF of an agent with these beliefs. + +In particular, the agent's SDF is + +$$ +m^\star_{t+1} = \exp\!\left(-r^\star_t + - \tfrac{1}{2}\lambda_t^{\star\top}\lambda^\star_t + - \lambda_t^{\star\top}\varepsilon_{t+1}\right) +$$ + +where $r^\star_t$ is the short rate and $\lambda^\star_t$ is the agent's +vector of risk prices. 
+ +Using {eq}`eq_Srat` to convert to the physical measure, the subjective +pricing equation becomes + +$$ +\mathbb{E}^P_t\!\left[ + \exp\!\left(-r^\star_t + - \tfrac{1}{2}\lambda_t^{\star\top}\lambda^\star_t + - \lambda_t^{\star\top}\varepsilon_{t+1} + \right) + \exp\!\left( + - \tfrac{1}{2}\kappa_t^\top\kappa_t + - \kappa_t^\top\varepsilon_{t+1} + \right) + R_{j,t+1} +\right] = 1 +$$ -Expanding in terms of the physical measure $P$, the subjective pricing -equation becomes +Combining the two exponentials gives $$ \mathbb{E}^P_t\!\left[ @@ -1103,19 +1139,26 @@ $$ \right] = 1 $$ +where $r_t = r^\star_t - \lambda_t^{\star\top}\kappa_t$. + Comparing this with the rational-expectations econometrician's pricing -equation $\mathbb{E}^P_t(m_{t+1}\, R_{j,t+1}) = 1$, we see that what the -econometrician interprets as $\lambda_t$ is actually +equation $$ -\hat\lambda_t = \lambda^\star_t + \kappa_t +\mathbb{E}^P_t\!\left[ + \exp\!\left(-r_t + - \tfrac{1}{2}\lambda_t^\top\lambda_t + - \lambda_t^\top\varepsilon_{t+1} + \right) R_{j,t+1} +\right] = 1 $$ -The econometrician's estimate conflates true risk prices $\lambda^\star_t$ -and belief distortions $\kappa_t$. +we see that what the econometrician interprets as $\lambda_t$ is actually +$\lambda^\star_t + \kappa_t$. -Part of what looks like a high price of risk is actually a systematic -forecast bias. +Because the econometrician's estimates partly reflect systematic +distortions in subjective beliefs, they overstate the representative +agent's true risk prices $\lambda^\star_t$. ### Numerical illustration @@ -1191,12 +1234,13 @@ plt.tight_layout() plt.show() ``` -When expert beliefs are overly persistent ($\hat\phi$ has larger eigenvalues than -$\phi$), the rational-expectations econometrician attributes too much of the -observed risk premium to risk aversion. 
+When expert beliefs are overly persistent ($\check\phi$ has larger eigenvalues +than $\phi$), the rational-expectations econometrician attributes too much of +the observed risk premium to risk aversion. Disentangling belief distortions from genuine risk prices requires additional -data — for example, the survey forecasts used by Piazzesi, Salomao, and Schneider. +data — for example, the survey forecasts used by +{cite:t}`piazzesi2015trend`. Our {doc}`Risk Aversion or Mistaken Beliefs? ` lecture explores this confounding in greater depth. From 1c2fb22bc60d0787faf8e0e395932cd74ac95ced Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Mar 2026 22:07:21 +1100 Subject: [PATCH 06/12] updates --- lectures/affine_risk_prices.md | 248 ++++++++++++++++++++++++--------- 1 file changed, 179 insertions(+), 69 deletions(-) diff --git a/lectures/affine_risk_prices.md b/lectures/affine_risk_prices.md index 1fce64cd8..0c846cdbb 100644 --- a/lectures/affine_risk_prices.md +++ b/lectures/affine_risk_prices.md @@ -284,7 +284,7 @@ The components of $\alpha_t(j)$ express the **exposures** of $\log R_{j,t+1}$ to corresponding components of the risk vector $\varepsilon_{t+1}$. The specification {eq}`eq_return` implies $\mathbb{E}_t R_{j,t+1} = \exp(\nu_t(j))$, -so $\nu_t(j)$ is the expected net log return. +so $\nu_t(j)$ is the log of the expected gross return. ### Expected excess returns @@ -338,7 +338,7 @@ This is a central result. It says: -> The expected net return on asset $j$ equals the short rate plus the inner product +> The log expected gross return on asset $j$ equals the short rate plus the inner product > of the asset's exposure vector $\alpha_t(j)$ with the risk price vector $\lambda_t$. Each component of $\lambda_t$ prices the corresponding component of $\varepsilon_{t+1}$. 
@@ -390,13 +390,13 @@ where the scalar $\bar A_n$ and the $m \times 1$ vector $\bar B_n$ satisfy the **Riccati difference equations** ```{math} -:label: eq_riccati_A +:label: eq_riccati_a \bar A_{n+1} = \bar A_n + \bar B_n^\top(\mu - C\lambda_0) + \frac{1}{2}\bar B_n^\top CC^\top\bar B_n - \delta_0 ``` ```{math} -:label: eq_riccati_B +:label: eq_riccati_b \bar B_{n+1}^\top = \bar B_n^\top(\phi - C\lambda_z) - \delta_1^\top ``` @@ -406,7 +406,7 @@ with initial conditions $\bar A_1 = -\delta_0$ and $\bar B_1 = -\delta_1$. ```{exercise} :label: arp_ex3 -Derive the Riccati difference equations {eq}`eq_riccati_A` and {eq}`eq_riccati_B` +Derive the Riccati difference equations {eq}`eq_riccati_a` and {eq}`eq_riccati_b` by substituting the conjectured bond price {eq}`eq_bondprice` into the pricing recursion {eq}`eq_bondrecur` and matching coefficients. @@ -424,8 +424,8 @@ lognormal moment generating function. We want to show that if $p_t(n) = \exp(\bar A_n + \bar B_n^\top z_t)$, then the recursion $p_t(n+1) = \mathbb{E}_t(m_{t+1}\, p_{t+1}(n))$ yields $p_t(n+1) = \exp(\bar A_{n+1} + \bar B_{n+1}^\top z_t)$ with -$\bar A_{n+1}$ and $\bar B_{n+1}$ given by {eq}`eq_riccati_A` and -{eq}`eq_riccati_B`. +$\bar A_{n+1}$ and $\bar B_{n+1}$ given by {eq}`eq_riccati_a` and +{eq}`eq_riccati_b`. From {eq}`eq_sdf` and {eq}`eq_bondprice`, @@ -463,7 +463,7 @@ $$ $$ Matching the constant and the coefficient on $z_t$ gives the Riccati -equations {eq}`eq_riccati_A` and {eq}`eq_riccati_B`. +equations {eq}`eq_riccati_a` and {eq}`eq_riccati_b`. Setting $n = 0$ with $p_t(1) = \exp(-r_t) = \exp(-\delta_0 - \delta_1^\top z_t)$ gives $\bar A_1 = -\delta_0$ and $\bar B_1 = -\delta_1$. @@ -496,7 +496,7 @@ where $A_n = -\bar A_n / n$ and $B_n = -\bar B_n / n$. This is the defining property of affine term structure models. 
-We now implement the bond pricing formulas {eq}`eq_riccati_A`, {eq}`eq_riccati_B`, +We now implement the bond pricing formulas {eq}`eq_riccati_a`, {eq}`eq_riccati_b`, and {eq}`eq_yield`. ```{code-cell} ipython3 @@ -545,8 +545,8 @@ the range of short rates the model generates via $r_t = \delta_0 + \delta_1 z_t$ C = np.array([[1.0]]) δ_0 = 0.01 # 1%/quarter ≈ 4% p.a. δ_1 = np.array([0.001]) -λ_0 = np.array([0.05]) -λ_z = np.array([[-0.01]]) # countercyclical +λ_0 = np.array([-0.05]) +λ_z = np.array([[-0.01]]) model_1f = create_affine_model(μ, φ, C, δ_0, δ_1, λ_0, λ_z) ``` @@ -556,6 +556,12 @@ model_1f = create_affine_model(μ, φ, C, δ_0, δ_1, λ_0, λ_z) We compute yield curves $y_t(n)$ across a range of short-rate states $z_t$. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Yield curves under the one-factor affine model + name: fig-yield-curves-1f +--- n_max_1f = 60 maturities_1f = np.arange(1, n_max_1f + 1) @@ -594,7 +600,6 @@ ax.axhline(y_inf, color='black', ls='--', lw=1.2, alpha=0.7, ax.set_xlabel("Maturity (quarters)") ax.set_ylabel("Yield (% per annum)") -ax.set_title("Yield Curves — One-Factor Affine Model") ax.legend(fontsize=10, loc='best') ax.set_xlim(1, n_max_1f) @@ -609,11 +614,11 @@ plt.tight_layout() plt.show() ``` -When the short rate is low, the yield curve curve is +When the short rate is low, the yield curve is upward-sloping, while when the short rate is high, it is downward-sloping. All three curves converge to the same long-run yield $y_\infty$ at long -maturities, and the long-run yield lies below the mean short rate +maturities, and the long-run yield lies above the mean short rate $\delta_0$. ````{exercise} @@ -631,12 +636,12 @@ y_\infty ``` where $\bar B_\infty = -(I - (\phi - C\lambda_z)^\top)^{-1} \delta_1$ -is the fixed point of the recursion {eq}`eq_riccati_B`. +is the fixed point of the recursion {eq}`eq_riccati_b`. -Then explain why $y_\infty < \delta_0$ under this parameterization. 
+Then explain why $y_\infty > \delta_0$ under this parameterization. *Hint:* Use {eq}`eq_yield` and the Riccati equations -{eq}`eq_riccati_A`--{eq}`eq_riccati_B`. For the inequality, consider +{eq}`eq_riccati_a`--{eq}`eq_riccati_b`. For the inequality, consider each subtracted term separately. ```` @@ -645,9 +650,7 @@ each subtracted term separately. ``` -**Derivation of $y_\infty$.** - -The recursion {eq}`eq_riccati_B` is a linear difference equation $\bar B_{n+1} = (\phi - C\lambda_z)^\top \bar B_n - \delta_1$. +The recursion {eq}`eq_riccati_b` is a linear difference equation $\bar B_{n+1} = (\phi - C\lambda_z)^\top \bar B_n - \delta_1$. When $\phi - C\lambda_z$ has eigenvalues inside the unit circle, $\bar B_n$ converges to $\bar B_\infty = -(I - (\phi - C\lambda_z)^\top)^{-1} \delta_1$. @@ -655,25 +658,17 @@ Since $\bar B_\infty$ is finite, $\bar B_n^\top z_t / n \to 0$ in {eq}`eq_yield` To find this limit, write $\bar A_n = \bar A_1 + \sum_{k=1}^{n-1}(\bar A_{k+1} - \bar A_k)$. -By {eq}`eq_riccati_A`, each increment depends on $\bar B_k$, which converges to $\bar B_\infty$, so the increment converges to $L \equiv \bar B_\infty^\top(\mu - C\lambda_0) + \tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty - \delta_0$. +By {eq}`eq_riccati_a`, each increment depends on $\bar B_k$, which converges to $\bar B_\infty$, so the increment converges to $L \equiv \bar B_\infty^\top(\mu - C\lambda_0) + \tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty - \delta_0$. Therefore $\bar A_n / n \to L$ and $y_\infty = -L$, giving {eq}`eq_y_inf`. -**Why $y_\infty < \delta_0$.** - -Both subtracted terms in {eq}`eq_y_inf` are positive. - -The quadratic term satisfies $\tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty = \tfrac{1}{2}\|C^\top \bar B_\infty\|^2 \geq 0$ always — a **convexity effect** from Jensen's inequality applied to the exponential bond-price formula. 
+To see why $y_\infty > \delta_0$, note that the two subtracted terms in {eq}`eq_y_inf` have opposite signs under this parameterization. -The linear term $\bar B_\infty^\top(\mu - C\lambda_0)$ is positive because both factors are negative. +The quadratic term $\tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty = \tfrac{1}{2}\|C^\top \bar B_\infty\|^2 \geq 0$ always — a **convexity effect** from Jensen's inequality that pushes $y_\infty$ below $\delta_0$. -$\bar B_\infty < 0$ since $\delta_1 > 0$: a higher state raises the short rate, so bond prices load negatively on the state. +The linear term $\bar B_\infty^\top(\mu - C\lambda_0)$ is negative because $\bar B_\infty < 0$ (since $\delta_1 > 0$) while $\mu - C\lambda_0 > 0$ (since $\lambda_0 < 0$). Subtracting this negative quantity raises $y_\infty$ above $\delta_0$ — a **risk-premium effect**: positive term premiums tilt the average yield curve upward. -$\mu - C\lambda_0 < 0$ since $\lambda_0 > 0$: positive risk prices shift the risk-neutral drift below the physical drift. - -This is a **risk-premium effect**: compensating investors for interest-rate risk lowers the long-run yield. - -Together, these two effects push $y_\infty$ below $\delta_0$. +Under this parameterization the risk-premium effect dominates the convexity effect, so $y_\infty > \delta_0$. ```{solution-end} ``` @@ -682,6 +677,12 @@ Together, these two effects push $y_\infty$ below $\delta_0$. 
Let's also simulate the short rate path: ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Simulated short rate path + name: fig-simulated-short-rate +--- T = 200 Z = simulate(model_1f, np.array([0.0]), T) short_rates = np.array([short_rate(model_1f, Z[t]) * 4 * 100 @@ -697,7 +698,6 @@ ax.fill_between(quarters, short_rates, r_bar_pct, alpha=0.08, color=line.get_color()) ax.set_xlabel("Quarter") ax.set_ylabel("Short rate (% p.a.)") -ax.set_title("Simulated Short Rate") ax.set_xlim(0, T) ax.legend(fontsize=11) plt.tight_layout() @@ -729,19 +729,17 @@ affects the short rate with a smaller loading, capturing the **slope** of the yield curve. The off-diagonal entry $\phi_{12} = -0.03$ allows the level factor to -respond to slope innovations. +respond to the current slope state $z_{2t}$. The short rate is $r_t = \delta_0 + \delta_1^\top z_t$ with $\delta_1 = (0.002,\; 0.001)^\top$, so both factors raise the short rate when positive, but the level factor has twice the impact. Risk prices are $\lambda_t = \lambda_0 + \lambda_z z_t$ with +$\lambda_0 = (-0.01,\; -0.005)^\top$ and $\lambda_z = \text{diag}(-0.005,\, -0.003)$. -The negative diagonal entries mean that risk prices rise when -the state is low — investors demand higher compensation in bad states. - -As discussed above, this makes $\phi - C\lambda_z$ have larger +The negative diagonal entries of $\lambda_z$ make $\phi - C\lambda_z$ have larger eigenvalues than $\phi$, so the state is more persistent under the risk-neutral measure and the yield curve is more sensitive to the current state at long horizons. @@ -754,7 +752,7 @@ current state at long horizons. 
C_2 = np.eye(2) δ_0_2 = 0.01 δ_1_2 = np.array([0.002, 0.001]) -λ_0_2 = np.array([0.01, 0.005]) +λ_0_2 = np.array([-0.01, -0.005]) λ_z_2 = np.array([[-0.005, 0.0], [ 0.0, -0.003]]) @@ -765,6 +763,12 @@ print(f"Eigenvalues of φ - Cλ_z: {eigvals(model_2f.φ_rn).real.round(4)}") ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Yield curves and factor loadings under the two-factor model + name: fig-yield-curves-2f +--- n_max_2f = 60 maturities_2f = np.arange(1, n_max_2f + 1) @@ -785,7 +789,6 @@ for label, z in states.items(): ax1.set_xlabel("Maturity (quarters)") ax1.set_ylabel("Yield (% p.a.)") -ax1.set_title("Yield Curves — Two-Factor Model") ax1.legend(fontsize=10) ax1.set_xlim(1, n_max_2f) @@ -800,7 +803,6 @@ ax2.plot(ns, B_n[:, 1], lw=2.2, ax2.axhline(0, color='black', lw=0.6) ax2.set_xlabel("Maturity (quarters)") ax2.set_ylabel(r"Yield loading $B_{n,k}$") -ax2.set_title("Factor Loadings on Yields") ax2.legend(fontsize=11) ax2.set_xlim(1, n_max_2f) @@ -817,25 +819,121 @@ plt.show() ## Risk premiums -A key object in the affine term structure model is the **term premium** — the extra -expected return on a long-term bond relative to rolling over short-term bonds. +A key object in the affine term structure model is the **term premium** — the +expected excess return on a long-term bond relative to rolling over short-term bonds. -For an $(n+1)$-period bond held for one period, the excess log return is -approximately +For an $(n+1)$-period bond held for one period, the shock loading is +$\alpha_n = C^\top \bar B_n$, so {eq}`eq_excess` gives $$ -\mathbb{E}_t\left[\log R_{t+1}^{(n+1)}\right] - r_t \;=\; -\bar B_n^\top C \lambda_t +\log \mathbb{E}_t R_{t+1}^{(n+1)} - r_t \;=\; \bar B_n^\top C \lambda_t $$ -That is, the term premium equals (minus) the product of the bond's exposure to -the shocks $(-\bar B_n^\top C)$ with the risk prices $\lambda_t$. 
+The term premium equals the inner product of the bond's shock exposure +$\bar B_n^\top C$ with the risk price vector $\lambda_t$. + +To understand the sign of the term premium, note that when $\delta_1 > 0$ +a positive shock $\varepsilon_{t+1}$ raises the short rate and lowers +long-bond prices, so the bond shock loading +$\alpha_n = C^\top \bar B_n$ is negative. + +A negative $\lambda_0$ then means the stochastic discount factor +$m_{t+1}$ loads positively on $\varepsilon_{t+1}$, i.e. the SDF is +high in states where interest rates rise and bond prices fall. + +This makes $\text{Cov}(m_{t+1}, R_{t+1}^{(n+1)}) < 0$, so long bonds +are risky and must offer a positive term premium to compensate +investors — algebraically, $\bar B_n < 0$ and $C\lambda_0 < 0$ combine +to give $\bar B_n^\top C \lambda_0 > 0$. + +```{exercise} +:label: arp_ex5 + +Derive the term premium formula above by computing the one-period holding +return on an $(n+1)$-period bond and identifying its shock loading. + +*Hint:* Use $R_{t+1}^{(n+1)} = p_{t+1}(n)/p_t(n+1)$ with +$\log p_t(n) = \bar A_n + \bar B_n^\top z_t$, substitute the state +dynamics {eq}`eq_var`, and apply the Riccati equations +{eq}`eq_riccati_a`--{eq}`eq_riccati_b` to simplify. +``` + +```{solution-start} arp_ex5 +:class: dropdown +``` + +The one-period holding return on an $(n+1)$-period bond is +$R_{t+1}^{(n+1)} = p_{t+1}(n)/p_t(n+1)$, so + +$$ +\log R_{t+1}^{(n+1)} = \bar A_n + \bar B_n^\top z_{t+1} - \bar A_{n+1} - \bar B_{n+1}^\top z_t +$$ + +Substituting $z_{t+1} = \mu + \phi z_t + C\varepsilon_{t+1}$ from {eq}`eq_var`: + +$$ += \underbrace{(\bar A_n + \bar B_n^\top \mu - \bar A_{n+1})}_{\text{constant}} + + \underbrace{(\bar B_n^\top \phi - \bar B_{n+1}^\top)}_{\text{loading on } z_t} z_t + + \underbrace{\bar B_n^\top C}_{\text{shock loading}}\, \varepsilon_{t+1} +$$ + +We now use the Riccati equations to simplify each piece. 
+ +For the constant piece, {eq}`eq_riccati_a` gives +$\bar A_{n+1} = \bar A_n + \bar B_n^\top(\mu - C\lambda_0) + \tfrac{1}{2}\bar B_n^\top CC^\top \bar B_n - \delta_0$, so + +$$ +\bar A_n + \bar B_n^\top \mu - \bar A_{n+1} + = \bar B_n^\top C\lambda_0 - \tfrac{1}{2}\bar B_n^\top CC^\top \bar B_n + \delta_0 +$$ + +For the $z_t$ coefficient, {eq}`eq_riccati_b` gives +$\bar B_{n+1}^\top = \bar B_n^\top(\phi - C\lambda_z) - \delta_1^\top$, so + +$$ +\bar B_n^\top \phi - \bar B_{n+1}^\top = \bar B_n^\top C\lambda_z + \delta_1^\top +$$ + +Combining the pieces: + +$$ +\log R_{t+1}^{(n+1)} + = \underbrace{(\delta_0 + \delta_1^\top z_t)}_{r_t} + + \bar B_n^\top C\underbrace{(\lambda_0 + \lambda_z z_t)}_{\lambda_t} + - \tfrac{1}{2}\bar B_n^\top CC^\top \bar B_n + + \bar B_n^\top C\,\varepsilon_{t+1} +$$ + +Writing $\alpha_n = C^\top \bar B_n$, this takes the generic return form {eq}`eq_return`: + +$$ +\log R_{t+1}^{(n+1)} + = \underbrace{(r_t + \alpha_n^\top \lambda_t)}_{\nu_t} + - \tfrac{1}{2}\alpha_n^\top \alpha_n + + \alpha_n^\top \varepsilon_{t+1} +$$ + +Since $\mathbb{E}_t R_{t+1}^{(n+1)} = \exp(\nu_t)$, we obtain + +$$ +\log \mathbb{E}_t R_{t+1}^{(n+1)} - r_t = \alpha_n^\top \lambda_t = \bar B_n^\top C \lambda_t +$$ + +```{solution-end} +``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Term premiums and factor decomposition under the two-factor model + name: fig-term-premiums-2f +--- def term_premiums(model, z, n_max): - """Approximate term premiums for maturities 1 to n_max.""" + """Compute term premiums for maturities 1 to n_max.""" A_bar, B_bar = bond_coefficients(model, n_max + 1) λ_t = risk_prices(model, z) - return np.array([-B_bar[n] @ model.C @ λ_t + return np.array([B_bar[n-1] @ model.C @ λ_t for n in range(1, n_max + 1)]) n_max_tp = 60 @@ -859,8 +957,6 @@ for label, z in z_states_tp.items(): ax1.axhline(0, color="black", lw=0.8, ls="--") ax1.set_xlabel("Maturity (quarters)") ax1.set_ylabel("Term premium (% p.a.)") -ax1.set_title("Term Premiums 
— Two Regimes\n" - r"($\lambda_z < 0$: higher premiums when rates are low)") ax1.legend(fontsize=9) ax1.set_xlim(1, n_max_tp) @@ -869,9 +965,9 @@ A_bar_d, B_bar_d = bond_coefficients(model_2f, n_max_tp + 1) λ_t = risk_prices(model_2f, z_decomp) C_lam = model_2f.C @ λ_t -tp_level = np.array([-B_bar_d[n, 0] * C_lam[0] +tp_level = np.array([B_bar_d[n-1, 0] * C_lam[0] for n in range(1, n_max_tp + 1)]) * 4 * 100 -tp_slope = np.array([-B_bar_d[n, 1] * C_lam[1] +tp_slope = np.array([B_bar_d[n-1, 1] * C_lam[1] for n in range(1, n_max_tp + 1)]) * 4 * 100 tp_total = tp_level + tp_slope @@ -883,7 +979,6 @@ ax2.plot(maturities_tp, tp_slope, lw=1.8, ls="--", ax2.axhline(0, color="black", lw=0.6, ls=":") ax2.set_xlabel("Maturity (quarters)") ax2.set_ylabel("Term premium (% p.a.)") -ax2.set_title("Factor Decomposition at z = [0, 0]") ax2.legend(fontsize=10) ax2.set_xlim(1, n_max_tp) @@ -924,7 +1019,7 @@ With the risk-price vector $\lambda_t = \lambda_0 + \lambda_z z_t$ from {eq}`eq_riskprices`, define the non-negative random variable ```{math} -:label: eq_RN_ratio +:label: eq_rn_ratio \frac{\xi^Q_{t+1}}{\xi^Q_t} = \exp\!\left(-\tfrac{1}{2}\lambda_t^\top\lambda_t @@ -968,7 +1063,7 @@ nature uses to generate the data. Our key asset pricing equation is $\mathbb{E}^P_t m_{t+1} R_{j,t+1} = 1$ for all returns $R_{j,t+1}$. -Using {eq}`eq_RN_ratio`, we can express the SDF {eq}`eq_sdf` as +Using {eq}`eq_rn_ratio`, we can express the SDF {eq}`eq_sdf` as $$ m_{t+1} = \frac{\xi^Q_{t+1}}{\xi^Q_t}\,\exp(-r_t) @@ -980,7 +1075,7 @@ $\mathbb{E}^P_t\bigl(\exp(-r_t)\, is equivalent to ```{math} -:label: eq_Qpricing +:label: eq_qpricing \mathbb{E}^Q_t R_{j,t+1} = \exp(r_t) ``` @@ -1034,7 +1129,7 @@ for n, mc in zip(maturities_check, mc_prices): ``` The analytical and Monte Carlo bond prices agree closely, validating the -Riccati recursion {eq}`eq_riccati_A`–{eq}`eq_riccati_B`. +Riccati recursion {eq}`eq_riccati_a`–{eq}`eq_riccati_b`. 
## Distorted beliefs @@ -1063,7 +1158,7 @@ To organize this evidence, let $\kappa_t = \kappa_0 + \kappa_z z_t$ and define the likelihood ratio ```{math} -:label: eq_Srat +:label: eq_srat \frac{\xi^S_{t+1}}{\xi^S_t} = \exp\!\left(-\tfrac{1}{2}\kappa_t^\top\kappa_t @@ -1111,7 +1206,7 @@ $$ where $r^\star_t$ is the short rate and $\lambda^\star_t$ is the agent's vector of risk prices. -Using {eq}`eq_Srat` to convert to the physical measure, the subjective +Using {eq}`eq_srat` to convert to the physical measure, the subjective pricing equation becomes $$ @@ -1157,10 +1252,21 @@ we see that what the econometrician interprets as $\lambda_t$ is actually $\lambda^\star_t + \kappa_t$. Because the econometrician's estimates partly reflect systematic -distortions in subjective beliefs, they overstate the representative -agent's true risk prices $\lambda^\star_t$. +distortions in subjective beliefs, they can overstate the representative +agent's true risk prices $\lambda^\star_t$ in this calibration. -### Numerical illustration +Below we construct a numerical example to illustrate this point. + +We start with the two-factor model from above, which we take as the true data-generating process. + +We then set the subjective parameters $\check\mu, \check\phi$ to match the evidence in +{cite:t}`piazzesi2015trend` that experts behave as if the level and slope of the yield curve are more persistent than under the physical measure. + +In particular, we use + +$$ +\check\phi = \begin{pmatrix} 0.985 & -0.025 \\ 0.00 & 0.94 \end{pmatrix} +$$ ```{code-cell} ipython3 φ_P = φ_2.copy() @@ -1168,12 +1274,12 @@ agent's true risk prices $\lambda^\star_t$. 
# Subjective parameters: experts believe factors are MORE persistent φ_S = np.array([[0.985, -0.025], [0.00, 0.94]]) -μ_S = np.array([-0.005, 0.0]) +μ_S = np.array([0.005, 0.0]) κ_z = np.linalg.solve(C_2, φ_P - φ_S) κ_0 = np.linalg.solve(C_2, μ_P - μ_S) -λ_star_0 = np.array([0.03, 0.015]) +λ_star_0 = np.array([-0.03, -0.015]) λ_star_z = np.array([[-0.006, 0.0], [0.0, -0.004]]) λ_hat_0 = λ_star_0 + κ_0 @@ -1181,6 +1287,12 @@ agent's true risk prices $\lambda^\star_t$. ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True vs. distorted-belief term premiums and overstatement ratio + name: fig-distorted-beliefs +--- model_true = create_affine_model( μ_2, φ_2, C_2, δ_0_2, δ_1_2, λ_star_0, λ_star_z) model_econ = create_affine_model( @@ -1206,7 +1318,6 @@ ax1.fill_between(maturities_db, tp_true, tp_econ, ax1.axhline(0, color="black", lw=0.8, ls=":") ax1.set_xlabel("Maturity (quarters)") ax1.set_ylabel("Term premium (% p.a.)") -ax1.set_title("True vs. Distorted-Belief Term Premiums") ax1.legend(fontsize=9.5) ax1.set_xlim(1, n_max_db) @@ -1219,7 +1330,6 @@ ax2.axhline(1, color="black", lw=0.8, ls="--", label="No distortion (ratio = 1)") ax2.set_xlabel("Maturity (quarters)") ax2.set_ylabel(r"$\hat{tp}\, /\, tp^\star$") -ax2.set_title("Overstatement Ratio from Ignoring Belief Bias") ax2.legend(fontsize=11) ax2.set_xlim(1, n_max_db) From d69949ce0a825d3d723dd0725a9372dbf1268e30 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Mar 2026 22:54:49 +1100 Subject: [PATCH 07/12] updates --- lectures/affine_risk_prices.md | 84 ++++++++++++++++++++++------------ 1 file changed, 56 insertions(+), 28 deletions(-) diff --git a/lectures/affine_risk_prices.md b/lectures/affine_risk_prices.md index 0c846cdbb..5f07fdac2 100644 --- a/lectures/affine_risk_prices.md +++ b/lectures/affine_risk_prices.md @@ -40,7 +40,7 @@ $$ where $r_t = \rho + \gamma\mu - \frac{1}{2}\sigma_c^2\gamma^2$. 
This model asserts that exposure to the random part of aggregate consumption growth, -$\sigma_c\varepsilon_{t+1}$, is the *only* priced risk — the sole source of discrepancies +$\sigma_c\varepsilon_{t+1}$, is the *only* priced risk, the sole source of discrepancies among expected returns across assets. Empirical difficulties with this specification (the equity premium puzzle, the @@ -59,11 +59,11 @@ Instead, it Key applications we study include: -1. *Pricing risky assets* — how risk prices and exposures determine excess returns. -1. *Affine term structure models* — bond yields as affine functions of a state vector +1. *Pricing risky assets*: how risk prices and exposures determine excess returns. +1. *Affine term structure models*: bond yields as affine functions of a state vector ({cite:t}`AngPiazzesi2003`). -1. *Risk-neutral probabilities* — a change-of-measure representation of the pricing equation. -1. *Distorted beliefs* — reinterpreting risk price estimates when agents hold systematically +1. *Risk-neutral probabilities*: a change-of-measure representation of the pricing equation. +1. *Distorted beliefs*: reinterpreting risk price estimates when agents hold systematically biased forecasts ({cite:t}`piazzesi2015trend`); see also {doc}`Risk Aversion or Mistaken Beliefs? `. We start with the following imports: @@ -103,8 +103,8 @@ Here * $\varepsilon_{t+1} \sim \mathcal{N}(0, I)$ is an i.i.d. $m \times 1$ random vector, * $z_t$ is an $m \times 1$ state vector. -Equation {eq}`eq_shortrate` says that the **short rate** $r_t$ — the net yield on a -one-period risk-free claim — is an affine function of the state $z_t$. +Equation {eq}`eq_shortrate` says that the **short rate** $r_t$, the net yield on a +one-period risk-free claim, is an affine function of the state $z_t$. 
*Component 2* is a vector of **risk prices** $\lambda_t$ and an associated stochastic discount factor $m_{t+1}$: @@ -255,7 +255,7 @@ The first equation confirms that $r_t$ is the net yield on a risk-free one-perio That is why $r_t$ is called **the short rate** in the exponential quadratic literature. The second equation says that the conditional standard deviation of the SDF -is approximately the magnitude of the vector of risk prices — a measure of overall +is approximately the magnitude of the vector of risk prices, a measure of overall **market price of risk**. ## Pricing risky assets @@ -411,7 +411,9 @@ by substituting the conjectured bond price {eq}`eq_bondprice` into the pricing recursion {eq}`eq_bondrecur` and matching coefficients. *Hint:* Substitute $p_{t+1}(n) = \exp(\bar A_n + \bar B_n^\top z_{t+1})$ and -$\log m_{t+1}$ from {eq}`eq_sdf` into {eq}`eq_bondrecur`. Use the state +$\log m_{t+1}$ from {eq}`eq_sdf` into {eq}`eq_bondrecur`. + +Use the state dynamics {eq}`eq_var` to express $z_{t+1}$ in terms of $z_t$ and $\varepsilon_{t+1}$, then evaluate the conditional expectation using the lognormal moment generating function. @@ -664,9 +666,13 @@ Therefore $\bar A_n / n \to L$ and $y_\infty = -L$, giving {eq}`eq_y_inf`. To see why $y_\infty > \delta_0$, note that the two subtracted terms in {eq}`eq_y_inf` have opposite signs under this parameterization. -The quadratic term $\tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty = \tfrac{1}{2}\|C^\top \bar B_\infty\|^2 \geq 0$ always — a **convexity effect** from Jensen's inequality that pushes $y_\infty$ below $\delta_0$. +The quadratic term $\tfrac{1}{2}\bar B_\infty^\top CC^\top \bar B_\infty = \tfrac{1}{2}\|C^\top \bar B_\infty\|^2 \geq 0$ always. + +This is a **convexity effect** from Jensen's inequality that pushes $y_\infty$ below $\delta_0$. 
+ +The linear term $\bar B_\infty^\top(\mu - C\lambda_0)$ is negative because $\bar B_\infty < 0$ (since $\delta_1 > 0$) while $\mu - C\lambda_0 > 0$ (since $\lambda_0 < 0$). Subtracting this negative quantity raises $y_\infty$ above $\delta_0$. -The linear term $\bar B_\infty^\top(\mu - C\lambda_0)$ is negative because $\bar B_\infty < 0$ (since $\delta_1 > 0$) while $\mu - C\lambda_0 > 0$ (since $\lambda_0 < 0$). Subtracting this negative quantity raises $y_\infty$ above $\delta_0$ — a **risk-premium effect**: positive term premiums tilt the average yield curve upward. +This is a **risk-premium effect**: positive term premiums tilt the average yield curve upward. Under this parameterization the risk-premium effect dominates the convexity effect, so $y_\infty > \delta_0$. @@ -762,6 +768,10 @@ print(f"Eigenvalues of φ: {eigvals(φ_2).real.round(4)}") print(f"Eigenvalues of φ - Cλ_z: {eigvals(model_2f.φ_rn).real.round(4)}") ``` +This confirms that the eigenvalues of $\phi - C\lambda_z$ are larger than those of $\phi$, so the state is more persistent under the risk-neutral measure. + +The following figure shows yield curves across different states of the world, as well as the factor loadings $B_{n,1}$ and $B_{n,2}$ that determine how yields load on the level and slope factors at each maturity + ```{code-cell} ipython3 --- mystnb: @@ -817,9 +827,11 @@ plt.tight_layout() plt.show() ``` +We can see that the level factor dominates at long maturities. + ## Risk premiums -A key object in the affine term structure model is the **term premium** — the +A key object in the affine term structure model is the **term premium**, the expected excess return on a long-term bond relative to rolling over short-term bonds. For an $(n+1)$-period bond held for one period, the shock loading is @@ -832,19 +844,31 @@ $$ The term premium equals the inner product of the bond's shock exposure $\bar B_n^\top C$ with the risk price vector $\lambda_t$. 
-To understand the sign of the term premium, note that when $\delta_1 > 0$ -a positive shock $\varepsilon_{t+1}$ raises the short rate and lowers -long-bond prices, so the bond shock loading +Because the term premium equals $\bar B_n^\top C \lambda_t$, its sign +depends on the *current* risk-price vector $\lambda_t$, which is +state-dependent whenever $\lambda_z \neq 0$. + +To see this more concretely, consider a state where $C\lambda_t$ is negative +componentwise (for example, $z_t = 0$ in our calibration below). + +When $\delta_1 > 0$, a positive shock $\varepsilon_{t+1}$ raises the +short rate and lowers long-bond prices, so the bond shock loading $\alpha_n = C^\top \bar B_n$ is negative. -A negative $\lambda_0$ then means the stochastic discount factor +A negative $C\lambda_t$ then means the stochastic discount factor $m_{t+1}$ loads positively on $\varepsilon_{t+1}$, i.e. the SDF is high in states where interest rates rise and bond prices fall. -This makes $\text{Cov}(m_{t+1}, R_{t+1}^{(n+1)}) < 0$, so long bonds -are risky and must offer a positive term premium to compensate -investors — algebraically, $\bar B_n < 0$ and $C\lambda_0 < 0$ combine -to give $\bar B_n^\top C \lambda_0 > 0$. +This makes $\text{Cov}_t(m_{t+1}, R_{t+1}^{(n+1)}) < 0$, so long bonds +are risky and carry a positive term premium. + +Algebraically, $\bar B_n < 0$ and $C\lambda_t < 0$ combine +to give $\bar B_n^\top C \lambda_t > 0$. + +In other states, however, $\lambda_t$ may change sign (e.g. the +first component flips in the low-rate regime of our two-state +calibration), and long-bond term premiums can become negative at +longer maturities. 
```{exercise} :label: arp_ex5 @@ -922,6 +946,8 @@ $$ ```{solution-end} ``` +The following figure plots term premiums across maturities for different states of the world, as well as the level and slope factor contributions to the term premium in the normal state + ```{code-cell} ipython3 --- mystnb: @@ -940,8 +966,8 @@ n_max_tp = 60 maturities_tp = np.arange(1, n_max_tp + 1) z_states_tp = { - "Low rate (z₁ < 0)": np.array([-3.0, 2.0]), - "High rate (z₁ > 0)": np.array([3.0, -2.0]), + "Low rate ($z_1 < 0$)": np.array([-3.0, 2.0]), + "High rate ($z_1 > 0$)": np.array([3.0, -2.0]), } fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5)) @@ -993,6 +1019,8 @@ plt.tight_layout() plt.show() ``` +We see that the term premium is positive at all maturities in the low-rate state, but becomes negative at longer maturities in the high-rate state. + ## Risk-neutral probabilities We return to the VAR and short-rate equations @@ -1257,7 +1285,7 @@ agent's true risk prices $\lambda^\star_t$ in this calibration. Below we construct a numerical example to illustrate this point. -We start with the two-factor model from above, which we take as the true data-generating process. +We keep the same physical state dynamics and short-rate specification as above, but choose a separate true risk-price process $(\lambda_t^\star)$ and a distorted-belief econometrician process $(\hat\lambda_t)$ to illustrate the decomposition. We then set the subjective parameters $\check\mu, \check\phi$ to match the evidence in {cite:t}`piazzesi2015trend` that experts behave as if the level and slope of the yield curve are more persistent than under the physical measure. @@ -1349,7 +1377,7 @@ than $\phi$), the rational-expectations econometrician attributes too much of the observed risk premium to risk aversion. 
Disentangling belief distortions from genuine risk prices requires additional -data — for example, the survey forecasts used by +data, for example, the survey forecasts used by {cite:t}`piazzesi2015trend`. Our {doc}`Risk Aversion or Mistaken Beliefs? ` lecture @@ -1362,14 +1390,14 @@ framework for studying asset prices. Key features are: -1. **Analytical tractability** — Bond prices are exponential affine in $z_t$; +1. **Analytical tractability:** Bond prices are exponential affine in $z_t$; expected returns decompose cleanly into a short rate plus a risk-price×exposure inner product. -2. **Empirical flexibility** — The free parameters $(\mu, \phi, C, \delta_0, \delta_1, \lambda_0, \lambda_z)$ +2. **Empirical flexibility:** The free parameters $(\mu, \phi, C, \delta_0, \delta_1, \lambda_0, \lambda_z)$ can be estimated by maximum likelihood (the {doc}`Kalman filter ` chapter describes the relevant methods) without imposing restrictions from a full general equilibrium model. -3. **Multiple risks** — The vector structure accommodates many sources of risk (monetary +3. **Multiple risks:** The vector structure accommodates many sources of risk (monetary policy, real activity, volatility, etc.). -4. **Belief distortions** — The framework naturally accommodates non-rational beliefs via +4. **Belief distortions:** The framework naturally accommodates non-rational beliefs via likelihood-ratio twists of the physical measure, as in {cite:t}`piazzesi2015trend`. 
From 7de6d1260a32311a8cd386ff5b920225e9543dac Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Mon, 16 Mar 2026 12:52:43 -0400 Subject: [PATCH 08/12] Tom's March 16 edits of certainty equivalence lectures --- lectures/_static/quant-econ.bib | 59 +++++++ lectures/theil_1.md | 265 ++++++++++++++++++++++++++++---- lectures/theil_2.md | 109 +++++++------ 3 files changed, 360 insertions(+), 73 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 83ea59517..0ba2eddd3 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -246,6 +246,35 @@ @article{Orcutt_Winokur_69 year = {1969} } + +@incollection{Hurwicz:1962, + address = {Stanford, CA}, + author = {Hurwicz, Leonid}, + booktitle = {Logic, Methodology and Philosophy of Science}, + date-added = {2014-12-26 17:45:57 +0000}, + date-modified = {2022-01-09 19:40:37 -0600}, + pages = {232-239}, + publisher = {Stanford University Press}, + title = {On the Structural Form of Interdependent Systems}, + year = {1962} +} + + +@article{Hurwicz:1966, + abstract = {Publisher Summary This chapter concentrates on the structural form of interdependent systems. A great deal of effort is devoted in econometrics and elsewhere to find the behavior pattern of an observed configuration. Such effort is justified on the grounds that the knowledge of the behavior pattern is needed for the purpose of giving explanation or prediction. The merits of this justification are also examined in the chapter. At this point, the chapter considers certain difficulties encountered in the process of looking for the behavior patterns. In certain fields, notably economics (but also— for example, electronic network theory), it deals with a set (configuration) of objects (components) that are interdependent in their behavior. 
For purposes of both theoretical analysis and empirical investigation of such situations, the phenomena are often described in the chapter (in idealized form) by means of a system of simultaneous equations. History alone is not enabled to determine the behavior pattern of the configuration; but this does not mean that the task is hopeless. The priori information is obtained from the axiom systems or theories that are believed to be relevant to the behavior pattern of the configuration.},
+  author = {Leonid Hurwicz},
+  doi = {http://dx.doi.org/10.1016/S0049-237X(09)70590-7},
+  editor = {Nagel, Ernest and Suppes, Patrick and Tarski, Alfred},
+  journal = {Logic, Methodology and Philosophy of Science: Proceedings of the 1960 International Congress},
+  pages = {232--239},
+  publisher = {Elsevier},
+  title = {On the Structural Form of Interdependent Systems},
+  volume = {44},
+  url = {http://www.sciencedirect.com/science/article/pii/S0049237X09705907},
+  year = {1966},
+}
+
+
 @article{hurwicz1950least,
   title = {Least squares bias in time series},
   author = {Hurwicz, Leonid},
@@ -1977,6 +2006,36 @@ @article{hopenhayn1992entry
   publisher = {JSTOR}
 }
 
+
+@book{bacsar2008h,
+  title={H-infinity optimal control and related minimax design problems: a dynamic game approach},
+  author={Ba{\c{s}}ar, Tamer and Bernhard, Pierre},
+  year={2008},
+  publisher={Springer Science \& Business Media}
+}
+
+@article{sargent1981interpreting,
+  title={Interpreting economic time series},
+  author={Sargent, Thomas J},
+  journal={Journal of Political Economy},
+  volume={89},
+  number={2},
+  pages={213--248},
+  year={1981},
+  publisher={The University of Chicago Press}
+}
+
+
+@inproceedings{lucas1976econometric,
+  title={Econometric policy evaluation: A critique},
+  author={Lucas, Jr., Robert E},
+  booktitle={Carnegie-Rochester Conference Series on Public Policy},
+  volume={1},
+  pages={19--46},
+  year={1976},
+  organization={North-Holland}
+}
+
 @article{HopenhaynRogerson1993,
 author = {Hopenhayn, Hugo A and
Rogerson, Richard}, journal = {Journal of Political Economy}, diff --git a/lectures/theil_1.md b/lectures/theil_1.md index 4ddbf71c2..70208eed5 100644 --- a/lectures/theil_1.md +++ b/lectures/theil_1.md @@ -9,7 +9,7 @@ kernelspec: name: python3 --- -(certainty_equiv_robustness)= +(certainty_equiv_theil1)= ```{raw} jupyter
@@ -40,12 +40,64 @@ tags: [hide-output]
 !pip install quantecon
 ```
 
+```{code-cell} ipython3
+import numpy as np
+import matplotlib.pyplot as plt
+from quantecon import LQ
+```
+
+## Overview
+
+
+Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note` established a celebrated
+*certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming
+problems.
+
+Their result justifies a convenient two-step algorithm:
+
+1. **Optimize** under perfect foresight (treat future exogenous variables as known).
+2. **Forecast** — substitute optimal forecasts for the unknown future values.
+
+The striking insight is that these two steps are completely separable.
+
+The decision rule that emerges from step 1 is *identical* to the decision rule for the original
+stochastic problem once optimal forecasts are substituted in step 2.
+
+
+The decision rule does not depend on the variance of the shocks, but the *level* of
+the optimal value function *does*.
+
+After describing the structure of the certainty equivalence property in detail, this lecture describes its role in rational expectations modeling.
+
+We do so by drawing heavily on the introduction to {cite}`lucas1981rational`.
+
+In addition to learning the certainty equivalence principle, this lecture describes troubles with pre-rational expectations econometric policy evaluation procedures described by {cite}`lucas1976econometric`.
+
+
+```{note}
+That volume collected early papers on rational expectations modeling and econometrics.
+```
 
-## The Central Problem of Empirical Economics
+## A Central Problem of Empirical Economics
 
-The papers collected in {cite}`lucas1981rational` address a single overarching question: given observations on an agent's behavior in a particular economic environment, what can we infer about how that behavior **would have differed** had the environment been altered? This is the problem of policy-invariant structural inference.
 
-The difficulty is immediate. Observations arise under one environment; we wish to predict behavior under another. Unless we understand *why* the agent behaves as he does—that is, unless we recover the deep objectives that rationalize observed decisions—estimated behavioral relationships are silent on this question.
+To set the stage, {cite}`lucas1981rational` stated the central question for empirical economics that had been posed by Leonid Hurwicz ({cite}`Hurwicz:1962`, {cite}`Hurwicz:1966`):
+
+ * Given observations on an agent's behavior in a particular economic environment, what can we infer about how that behavior **would have differed** had the environment been altered?
+
+```{note}
+Hurwicz formulates a notion of 'causality' as a context-specific concept that he casts in terms of a well posed decision problem.
+```
+
+ This is the problem of policy-invariant structural inference in the following setting.
+
+ * Observations emerged under one environment or 'regime'
+ * We want to predict behavior under another 'regime'
+ * Unless we understand *why* agents behaved as they did in the historical regime, i.e., their purposes, we can't predict their behavior under the constraints they face in the new regime.
+
+To confront the problem that Hurwicz had posed, {cite}`lucas1981rational` formulated
+the following decision framework.
+
 
 ---
 
@@ -56,7 +108,7 @@ Consider a single decision maker whose situation at date $t$ is fully described
 **The environment** $z_t \in S_1$ is selected by "nature" and evolves exogenously according to
 
 ```{math}
-:label: eq:z_transition
+:label: eq:z_transition_v3
 z_{t+1} = f(z_t,\, \epsilon_t),
 ```
 
@@ -65,28 +117,35 @@ where the innovations $\epsilon_t \in \mathcal{E}$ are i.i.d. draws from a fixed
 
 **The endogenous state** $x_t \in S_2$ is under partial control of the agent. Each period the agent selects an action $u_t \in U$. 
A fixed technology $g : S_1 \times S_2 \times U \to S_2$ governs the transition ```{math} -:label: eq:x_transition +:label: eq:x_transition_v3 x_{t+1} = g(z_t,\, x_t,\, u_t). ``` **The decision rule** $h : S_1 \times S_2 \to U$ maps the agent's current situation into an action: ```{math} -:label: eq:decision_rule +:label: eq:decision_rule_v3 u_t = h(z_t,\, x_t). ``` -The econometrician observes (some or all of) the process $\{z_t, x_t, u_t\}$, the joint motion of which is determined by {eq}`eq:z_transition`, {eq}`eq:x_transition`, and {eq}`eq:decision_rule`. +The econometrician observes (some or all of) the process $\{z_t, x_t, u_t\}$, the joint motion of which is determined by {eq}`eq:z_transition_v3`, {eq}`eq:x_transition_v3`, and {eq}`eq:decision_rule_v3`. --- -## The Lucas Critique: Why Estimated Rules Are Not Enough +## Estimated Rules Are Not Enough + +Suppose we have estimated $f$, $g$, and $h$ from a long time series generated under a fixed environment $f_0$. + +This gives us $h_0 = T(f_0)$, where $T$ is the (unknown) functional mapping environments into optimal decision rules. + +But this single estimate, however precise, **reveals nothing** about how $T(f)$ varies with $f$. + +Policy evaluation requires knowledge of the entire map $f \mapsto T(f)$. -Suppose we have estimated $f$, $g$, and $h$ from a long time series generated under a fixed environment $f_0$. This gives us $h_0 = T(f_0)$, where $T$ is the (unknown) functional mapping environments into optimal decision rules. But this single estimate, however precise, **reveals nothing** about how $T(f)$ varies with $f$. +Under an environment change $f_0 \to f_1$, agents will in general revise their decision rules $h_0 \to h_1 = T(f_1)$, rendering the estimated rule $h_0$ invalid for forecasting behavior under $f_1$. -Policy evaluation requires knowledge of the entire map $f \mapsto T(f)$. 
Under an environment change $f_0 \to f_1$, agents will in general revise their decision rules $h_0 \to h_1 = T(f_1)$, rendering the estimated rule $h_0$ invalid for forecasting behavior under $f_1$. -The only nonexperimental path forward is to recover the **return function** $V$ from which $h$ is derived as the solution to an optimization problem, and then re-solve that problem under the counterfactual environment $f_1$. +{cite}`lucas1976econometric` and the introduction to {cite}`lucas1981rational` conclude that the only nonexperimental path forward is to recover the **return function** $V$ from which $h$ is derived as the solution to an optimization problem, and then re-solve that problem under the counterfactual environment $f_1$. --- @@ -95,13 +154,15 @@ The only nonexperimental path forward is to recover the **return function** $V$ Assume the agent selects $h$ to maximize the expected discounted sum of current-period returns $V : S_1 \times S_2 \times U \to \mathbb{R}$: ```{math} -:label: eq:objective +:label: eq:objective_v3 E_0\!\left\{\sum_{t=0}^{\infty} \beta^t\, V(z_t,\, x_t,\, u_t)\right\}, \qquad 0 < \beta < 1, ``` -given initial conditions $(z_0, x_0)$, the environment $f$, and the technology $g$. Here $E_0\{\cdot\}$ denotes expectation conditional on $(z_0, x_0)$ with respect to the distribution of $\{z_1, z_2, \ldots\}$ induced by {eq}`eq:z_transition`. +given initial conditions $(z_0, x_0)$, the environment $f$, and the technology $g$. Here $E_0\{\cdot\}$ denotes expectation conditional on $(z_0, x_0)$ with respect to the distribution of $\{z_1, z_2, \ldots\}$ induced by {eq}`eq:z_transition_v3`. -In principle, knowledge of $V$ (together with $g$ and $f$) allows one to compute $h = T(f)$ theoretically and hence to trace out $T(f)$ for any counterfactual $f$. 
The empirical question is whether $V$ can itself be recovered from observations on $\{f, g, h\}$—a problem of structural identification that, at this level of generality, is formidably difficult.
+In principle, knowledge of $V$ (together with $g$ and $f$) allows one to compute $h = T(f)$ theoretically and hence to trace out $T(f)$ for any counterfactual $f$.
+
+The essential question is whether $V$ can itself be recovered from observations on $\{f, g, h\}$.
 
 :::{note}
 The decision rule is in general a functional $h = T(f, g, V)$. The dependence on $g$ and $V$ is suppressed in the main text but made explicit when needed.
@@ -109,25 +170,31 @@ The decision rule is in general a functional $h = T(f, g, V)$. The dependence on
 
 ---
 
-## A Linear-Quadratic Specialization and Certainty Equivalence
+## Linear-Quadratic DP Problems and Certainty Equivalence
+
+Progress beyond the level of generality of the previous section requires restricting the primitives.
+
+A productive restriction, exploited in the papers collected in {cite}`lucas1981rational`, imposes **quadratic** $V$ and **linear** $g$, which forces $h$ to be linear.
+
+As part of its computational tractability, this specialization delivers a striking structural result:
 
-Progress at the level of generality of Section 4 requires restricting the primitives. The most productive restriction, exploited in the bulk of the volume, imposes **quadratic** $V$ and **linear** $g$, which forces $h$ to be linear. Beyond computational tractability, this specialization delivers a striking structural result: the **certainty equivalence** theorem of Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note`.
+* the **certainty equivalence** theorem of Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note`.
 
-### The Composite Decomposition of $h$
+### Decomposition of $h$
 
 Under quadratic $V$ and linear $g$, the optimal decision rule $h$ decomposes into two components applied in sequence. 
**Step 1 — Forecasting.** Define the infinite sequence of optimal point forecasts of all current and future states of nature: ```{math} -:label: eq:forecast_sequence +:label: eq:forecast_sequence_v3 \tilde{z}_t \;=\; \bigl(z_t,\;\; {}_{t+1}z_t^e,\;\; {}_{t+2}z_t^e,\;\ldots\bigr) \;\in\; S_1^\infty, ``` where ${}_{t+j}z_t^e$ denotes the least-mean-squared-error forecast of $z_{t+j}$ formed at time $t$. The optimal forecast sequence is a (generally nonlinear) function of the current state: ```{math} -:label: eq:forecast_rule +:label: eq:forecast_rule_v3 \tilde{z}_t = h_2(z_t). ``` @@ -136,33 +203,37 @@ The function $h_2 : S_1 \to S_1^\infty$ depends entirely on the environment $(f, **Step 2 — Optimization.** Given the forecast sequence $\tilde{z}_t$, the optimal action is a **linear** function of $\tilde{z}_t$ and $x_t$: ```{math} -:label: eq:optimization_rule +:label: eq:optimization_rule_v3 u_t = h_1(\tilde{z}_t,\, x_t). ``` The function $h_1 : S_1^\infty \times S_2 \to U$ depends entirely on preferences $(V)$ and technology $(g)$ but **not** on the stochastic environment $(f, \Phi)$. -The full decision rule is therefore the **composite**: +The ultimate decision rule is therefore the **composite**: ```{math} -:label: eq:composite_rule +:label: eq:composite_rule_v3 \boxed{h(z_t, x_t) \;=\; h_1\!\bigl[h_2(z_t),\; x_t\bigr].} ``` ### The Separation Principle -{eq}`eq:composite_rule` embodies a clean **separation** of the two sources of dependence in $h$: +{eq}`eq:composite_rule_v3` embodies a clean **separation** of the two sources of dependence in $h$: | Component | Depends on | Independent of | |-----------|-----------|----------------| | $h_1$ (optimization) | $V$, $g$ | $f$, $\Phi$ | | $h_2$ (forecasting) | $f$, $\Phi$ | $V$, $g$ | -Since policy analysis concerns changes in $f$, and since $h_1$ is invariant to $f$, the policy analyst need only re-solve the forecasting problem $h_2 = S(f)$ under the new environment, keeping $h_1$ fixed. 
The relationship of original interest, $h = T(f)$, then follows directly from {eq}`eq:composite_rule`. +Since policy analysis concerns changes in $f$, and since $h_1$ is invariant to $f$, the policy analyst need only re-solve the forecasting problem $h_2 = S(f)$ under the new environment, keeping $h_1$ fixed. + +The relationship of original interest, $h = T(f)$, then follows directly from {eq}`eq:composite_rule_v3`. ### Certainty Equivalence and Perfect Foresight -The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** — i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. The stochasticity of the environment affects actions only through the forecast $\tilde{z}_t$; conditional on $\tilde{z}_t$, the optimization problem is deterministic. +The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** — i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. + +Randomness of the environment affects actions only through the forecast $\tilde{z}_t$; conditional on $\tilde{z}_t$, the optimization problem is deterministic. This means the LQ problem decouples into: @@ -172,21 +243,155 @@ This means the LQ problem decouples into: ### Cross-Equation Restrictions -A hallmark of the rational expectations hypothesis as it appears in this framework is that it ties together what would otherwise be free parameters in different equations. 
The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ — i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ — imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. These restrictions, rather than any conditions on distributed lags within a single equation, are the operative empirical content of rational expectations. +A hallmark of the rational expectations hypothesis as it appears in this framework is that it ties together what would otherwise be free parameters in different equations. + +The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ — i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ — imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. + +These restrictions, rather than any conditions on distributed lags within a single equation, are the operative empirical content of rational expectations. + +```{note} +This is the message of {cite}`lucas1976econometric` and {cite}`sargent1981interpreting`. +``` + +### Python: Demonstrating Certainty Equivalence + +The following code verifies the CE principle numerically. + +We consider a simple scalar LQ problem: + +$$y_{t+1} = a\, y_t + b\, u_t + \sigma\, \varepsilon_{t+1}, \qquad r(y_t, u_t) = -(q\, y_t^2 + r\, u_t^2)$$ + +and vary the noise standard deviation $\sigma$ across a wide range. + +The CE theorem predicts that: + +* the **policy gain** $F$ (the coefficient in $u_t = -F y_t$) is independent of $\sigma$, and +* the **value constant** $d$ (the additive term in $V(y) = -y' P y - d$) grows with $\sigma$. 
+ +```{code-cell} ipython3 +# ── Simple 1-D scalar LQ problem ─────────────────────────────────────────── +# y_{t+1} = a·y_t + b·u_t + σ·ε_{t+1}, r = −(q·y² + r·u²) + +a, b_coeff = 0.9, 1.0 +q_state, r_ctrl = 1.0, 1.0 +beta = 0.95 + +A = np.array([[a]]) +B = np.array([[b_coeff]]) +Q_mat = np.array([[q_state]]) +R_mat = np.array([[r_ctrl]]) + +sigma_vals = np.linspace(0.0, 3.0, 80) +F_vals, d_vals = [], [] + +for sigma in sigma_vals: + C = np.array([[sigma]]) + lq = LQ(Q_mat, R_mat, A, B, C=C, beta=beta) + P, F, d = lq.stationary_values() + F_vals.append(float(F[0, 0])) + d_vals.append(float(d)) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +axes[0].plot(sigma_vals, F_vals, lw=2) +axes[0].set_xlabel('Noise level $\\sigma$') +axes[0].set_ylabel('Policy gain $F$') +axes[0].set_title('CE: Policy does not depend on noise') +axes[0].set_ylim(0, 2 * max(F_vals) + 0.1) + +axes[1].plot(sigma_vals, d_vals, lw=2, color='darkorange') +axes[1].set_xlabel('Noise level $\\sigma$') +axes[1].set_ylabel('Value constant $d$') +axes[1].set_title('Noise lowers value but not the decision rule') + +plt.tight_layout() +plt.show() +``` + +As the plot confirms, $F$ (the policy gain) is **flat** across all noise levels, +while the value constant $d$ increases monotonically with $\sigma$. + +This is the CE principle in action: **uncertainty changes the value of the problem but not the optimal decision rule**. 
--- ## A Trouble with Ad Hoc Expectations -Prior practice, exemplified by the adaptive expectations mechanisms of Friedman {cite}`Friedman1956` and Cagan {cite}`Cagan`, directly postulated a particular form of {eq}`eq:forecast_rule`: +Prior practice, exemplified by the adaptive expectations mechanisms of Friedman {cite}`Friedman1956` and Cagan {cite}`Cagan`, directly postulated a particular form of {eq}`eq:forecast_rule_v3`: ```{math} -:label: eq:adaptive_expectations +:label: eq:adaptive_expectations_v3 \theta_t^e = \lambda \sum_{i=0}^{\infty} (1-\lambda)^i\, \theta_{t-i}, \qquad 0 < \lambda < 1, ``` treating the coefficient $\lambda$ as a free parameter to be estimated from data, with no reference to the underlying environment $f$. -The deficiency is not that {eq}`eq:adaptive_expectations` is a distributed lag — linear forecasting rules are perfectly acceptable simplifications. The deficiency is that the **coefficients** of the distributed lag are left unrestricted by theory. The mapping $h_2 = S(f)$ shows that optimal forecasting coefficients are *determined* by $f$: when $f$ changes, $h_2$ changes, and so does $h$. An estimated $\lambda$ calibrated under $f_0$ is therefore non-structural and will give incorrect predictions whenever $f$ is altered. This is the econometric content of the critique that Muth's paper delivers. +The deficiency is not that {eq}`eq:adaptive_expectations_v3` is a distributed lag — linear forecasting rules are perfectly acceptable simplifications. The deficiency is that the **coefficients** of the distributed lag are left unrestricted by theory. The mapping $h_2 = S(f)$ shows that optimal forecasting coefficients are *determined* by $f$: when $f$ changes, $h_2$ changes, and so does $h$. An estimated $\lambda$ calibrated under $f_0$ is therefore non-structural and will give incorrect predictions whenever $f$ is altered. This is the econometric content of the critique that Muth's paper delivers. 
Rational expectations equates the subjective distribution that agents use in forming $\tilde{z}_t$ to the objective distribution $f$ that actually generates the data, thereby closing the model and eliminating free parameters in $h_2$. + +--- + +## Exercises + +```{exercise-start} +:label: theil1_ex1 +``` + +**CE and noise variance.** + +Using the scalar LQ setup in the code cell above (with $a = 0.9$, $b = 1$, +$q = r = 1$, $\beta = 0.95$), verify numerically that the value constant $d$ +satisfies $d \propto \sigma^2$. + +*Hint:* From the CE analysis, the value constant satisfies +$d = \tfrac{\beta}{1-\beta}\,\mathrm{tr}(C' P C)$, +and since $C = \sigma$ in the scalar case, this gives +$d = \tfrac{\beta}{1-\beta}\, P\, \sigma^2$. +Confirm that a plot of $d$ against $\sigma^2$ is linear and compute the theoretical +slope $\tfrac{\beta}{1-\beta} P$. + +```{exercise-end} +``` + +```{solution-start} theil1_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +# Reuse F_vals and d_vals already computed above +sigma_sq_vals = sigma_vals ** 2 + +fig, ax = plt.subplots(figsize=(8, 5)) +ax.plot(sigma_sq_vals, d_vals, lw=2) +ax.set_xlabel('$\\sigma^2$') +ax.set_ylabel('Value constant $d$') +ax.set_title('Value constant is linear in noise variance (CE principle)') + +# Overlay linear fit +coeffs = np.polyfit(sigma_sq_vals, d_vals, 1) +ax.plot(sigma_sq_vals, np.polyval(coeffs, sigma_sq_vals), + 'r--', lw=1.5, label=f'Linear fit: slope = {coeffs[0]:.3f}') +ax.legend() +plt.tight_layout() +plt.show() + +# Theoretical slope: β/(1−β) × P +P_scalar = float(LQ(Q_mat, R_mat, A, B, C=np.zeros((1, 1)), + beta=beta).stationary_values()[0]) +theoretical_slope = beta / (1 - beta) * P_scalar +print(f"Empirical slope: {coeffs[0]:.4f}") +print(f"Theoretical slope β/(1−β)·P = {theoretical_slope:.4f}") +``` + +The slope is indeed $\tfrac{\beta}{1-\beta} P$, confirming the analytic formula. 
+The policy matrix $P$ is determined entirely by preferences and technology, not by the
+noise level — a direct consequence of the certainty equivalence principle.
+
+```{solution-end}
+```
+
+## Concluding remarks
+
+This sequel {doc}`certainty equivalence and model uncertainty <theil_2>` describes how to extend the certainty equivalence principle to
+a linear-quadratic setting in which a decision maker distrusts the transition dynamics specified in his baseline model.
diff --git a/lectures/theil_2.md b/lectures/theil_2.md
index dd858dc7a..a86a174b9 100644
--- a/lectures/theil_2.md
+++ b/lectures/theil_2.md
@@ -30,37 +30,21 @@ kernelspec:
 :depth: 2
 ```
 
-This lecture draws on {cite}`hansen2004certainty` and {cite}`HansenSargent2008`.
-
-In addition to what's in Anaconda, this lecture will need the following libraries:
-
-```{code-cell} ipython3
----
-tags: [hide-output]
----
-!pip install quantecon
-```
-
 ## Overview
+This is a sequel to {doc}`this lecture on certainty equivalence <theil_1>` that
+established an important *certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming
+problems.
 
-Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note` established a celebrated
-*certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming
-problems. Their result justifies a convenient two-step algorithm:
+The property justifies a two-step algorithm for computing optimal decision rules:
 
 1. **Optimize** under perfect foresight (treat future exogenous variables as known).
 2. **Forecast** — substitute optimal forecasts for the unknown future values.
 
-The striking insight is that these two steps are completely separable. The decision
-rule that emerges from step 1 is *identical* to the decision rule for the original
-stochastic problem once optimal forecasts are substituted in step 2. In particular,
-the decision rule does not depend on the variance of the shocks — only the *level* of
-the optimal value function does. 
-This lecture extends the classical result in two directions motivated by
+This lecture extends the certainty equivalence property in two directions motivated by
 {cite}`hansen2004certainty`:
 
 - **Model uncertainty and robustness.** What happens when the decision maker does not
@@ -80,10 +64,23 @@ application — with Python code using `quantecon`.
 
 * Linear transition laws and quadratic objectives (LQ framework).
 * Ordinary CE: optimal policy independent of noise variance.
-* Robust CE: distorted forecasts replace rational forecasts; policy changes with $\theta$.
+* Robust CE: distorted forecasts replace baseline model forecasts; policy function depends on $\theta$.
 * Permanent income application: Hall's martingale, precautionary savings under
   robustness, and observational equivalence between robustness and patience.
+
+
+This lecture draws on {cite}`hansen2004certainty` and {cite}`HansenSargent2008`.
+
+In addition to what's in Anaconda, this lecture will need the following libraries:
+
+```{code-cell} ipython3
+---
+tags: [hide-output]
+---
+!pip install quantecon
+```
+
+
 We begin with imports:
 
 ```{code-cell} ipython3
@@ -114,6 +111,7 @@ z_{t+1} = f(z_t,\, \epsilon_{t+1})
 ```
 
 and $\epsilon_{t+1}$ is an i.i.d. sequence with c.d.f. $\Phi$.
+
 The *endogenous* component $x_t$ obeys
 
 ```{math}
@@ -132,7 +130,9 @@ The decision maker maximises the discounted expected return
 ```
 
 choosing a control $u_t$ measurable with respect to the history $y^t \equiv
-(x^t, z^t)$. The solution is a stationary decision rule
+(x^t, z^t)$.
+
+The maximizer is a stationary decision rule
 
 ```{math}
 :label: eq:stationary_rule_o
@@ -152,7 +152,9 @@ steps.
 
 **Step 1 — Perfect-foresight control.** Solve the *nonstochastic* problem of maximising
 {eq}`eq:objective_o` subject to {eq}`eq:x_transition_o`, treating the future sequence
-$\mathbf{z}_t = (z_t, z_{t+1}, \ldots)$ as known. The solution is the
+$\mathbf{z}_t = (z_t, z_{t+1}, \ldots)$ as known. 
+ +The solution is the *feedback-feedforward* rule ```{math} @@ -211,13 +213,16 @@ Two key observations follow from the separation: An equivalent statement: the same decision rule $h$ emerges from the *nonstochastic* version of the problem obtained by setting all shocks to zero, -$z_{t+1} = f_1 z_t$. The presence of uncertainty *lowers the value* (larger $p$) +$z_{t+1} = f_1 z_t$. + +The presence of uncertainty *lowers the value* (larger $p$) but does not alter *behaviour*. ### Python: Demonstrating Certainty Equivalence -The following code verifies the CE principle numerically. We consider a simple -scalar LQ problem and vary the noise standard deviation $\sigma$. +The following code verifies the CE principle numerically. + +We consider a simple scalar LQ problem and vary the noise standard deviation $\sigma$. ```{code-cell} ipython3 # ── Simple 1-D scalar LQ problem ─────────────────────────────────────────── @@ -260,8 +265,9 @@ plt.show() ``` As the plot confirms, $F$ (the policy gain) is **flat** across all noise levels, -while the value constant $d$ increases monotonically with $\sigma$. This is the -CE principle in action. +while the value constant $d$ increases monotonically with $\sigma$. + +This is the CE principle in action. --- @@ -270,7 +276,9 @@ CE principle in action. ### Setup and the Multiplier Problem The decision maker in Simon and Theil's setting knows his model exactly — he has -no doubt about the transition law {eq}`eq:z_transition`. Now suppose he suspects that the true +no doubt about the transition law {eq}`eq:z_transition`. + +Now suppose he suspects that the true data-generating process is ```{math} @@ -279,7 +287,9 @@ z_{t+1} = f(z_t,\; \epsilon_{t+1} + w_{t+1}) ``` where $w_{t+1} = \omega_t(x^t, z^t)$ is a misspecification term chosen by an -adversarial "nature." The decision maker believes his approximating model is a +adversarial "nature." 
+ +The decision maker believes his approximating model is a good approximation in the sense that ```{math} @@ -302,7 +312,9 @@ To construct a *robust* decision rule the decision maker solves the \Big|\, y_0\right] ``` -where $\theta > 0$ penalises large distortions. A larger $\theta$ shrinks the +where $\theta > 0$ penalises large distortions. + +A larger $\theta$ shrinks the feasible misspecification set; as $\theta \to \infty$ the problem reduces to ordinary LQ. @@ -316,6 +328,7 @@ The Markov perfect equilibrium *conceals* a form of CE. To reveal it, Hansen an Sargent {cite}`HansenSargent2001` impose a **Stackelberg timing protocol**: at time 0, the *minimising* player commits once and for all to a plan $\{w_{t+1}\}$, after which the *maximising* player chooses $u_t$ sequentially. + This makes the minimiser the Stackelberg leader. To describe the leader's committed plan, introduce "big-letter" state variables @@ -340,14 +353,16 @@ Y_{t+1} = M Y_t + N \epsilon_{t+1}, \qquad w_{t+1} = W(Y_t). The maximising player then faces an *ordinary* dynamic programming problem subject to his own dynamics {eq}`eq:x_transition`, the distorted $z$-law {eq}`eq:distorted_law`, and the exogenous -process {eq}`eq:stackelberg_law`. His optimal rule takes the form +process {eq}`eq:stackelberg_law`. + +His optimal rule takes the form ```{math} :label: eq:max_rule u_t = \tilde{H}(x_t, z_t, Y_t). ``` -Başar and Bernhard (1995) and Hansen and Sargent (2004) establish that at +{cite}`bacsar2008h` and {cite}`hansen2008robustness` establish that at equilibrium (with "big $K$ = little $k$" imposed) this collapses to ```{math} @@ -552,7 +567,9 @@ between consumption $c_t$ and savings $x_t$ to maximise -\mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t (c_t - b)^2, \qquad \beta \in (0,1) ``` -where $b$ is a bliss level of consumption. Defining the *marginal utility +where $b$ is a bliss level of consumption. 
+ +Defining the *marginal utility of consumption* $\mu_{ct} \equiv b - c_t$ (the control), the budget constraint and endowment process are @@ -583,7 +600,7 @@ B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, C = \begin{bmatrix} 0 \\ c_d \end{bmatrix}. ``` -We calibrate to parameters estimated by Hansen, Sargent, and Tallarini (1999) (HST) +We calibrate to parameters estimated by Hansen, Sargent, and Tallarini (1999) {cite}`HST_1999` from post-WWII U.S. data: ```{code-cell} ipython3 @@ -684,9 +701,11 @@ approximating model — a form of **precautionary saving**. The observational equivalence formula {eq}`eq:oe_locus` (derived below) immediately gives the robust AR(1) coefficient: $\tilde{\varphi} = 1/(\tilde{\beta} R)$ -where $\tilde{\beta} = \tilde{\beta}(\sigma)$. The innovation scale $\tilde{\nu}$ +where $\tilde{\beta} = \tilde{\beta}(\sigma)$. + +The innovation scale $\tilde{\nu}$ follows from the robust permanent income formula with the distorted persistence; -Hansen and Sargent (2001) report $\tilde{\nu} \approx 8.0473$ for the HST +{cite}`HST_1999` report $\tilde{\nu} \approx 8.0473$ for their calibration. ```{code-cell} ipython3 @@ -794,10 +813,12 @@ print(f"β̃(σ̂ = {sigma_rs}) = {bt_check:.5f} (paper reports ≈ 0.9995) ✓ The plot confirms the paper's key finding: **activating a preference for robustness is observationally equivalent — for consumption and saving behaviour -— to increasing the discount factor**. However, as Hansen, Sargent, and -Tallarini (1999) and Hansen, Sargent, and Whiteman argue, the two -parametrisations do **not** imply the same asset prices, -because the robust model generates different state-prices through the +— to increasing the discount factor**. + +However, {cite}`HST_1999` show that the two +parametrisations do **not** imply the same asset prices. 
* This happens because the model in which the representative agent distrusts his baseline model generates different state-prices through the
a/lectures/risk_aversion_or_mistaken_beliefs.md +++ b/lectures/risk_aversion_or_mistaken_beliefs.md @@ -1607,8 +1607,8 @@ mystnb: name: fig-us-yields --- data = pd.read_csv( - 'https://raw.githubusercontent.com/QuantEcon/lecture-python.myst/main/lectures/' - '_static/lecture_specific/risk_aversion_or_mistaken_beliefs/fred_data.csv', + 'https://raw.githubusercontent.com/QuantEcon/lecture-python.myst/refs/heads/' + 'main/lectures/_static/lecture_specific/risk_aversion_or_mistaken_beliefs/fred_data.csv', parse_dates=['DATE'], index_col='DATE' ) diff --git a/lectures/theil_1.md b/lectures/theil_1.md index 70208eed5..caaa874ec 100644 --- a/lectures/theil_1.md +++ b/lectures/theil_1.md @@ -18,7 +18,7 @@ kernelspec:
``` -# Certainty Equivalence +# Certainty Equivalence ```{index} single: Certainty Equivalence; Robustness ``` @@ -46,62 +46,50 @@ import matplotlib.pyplot as plt from quantecon import LQ ``` -## Overview +## Overview - -Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note` established a celebrated -*certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming -problems. +{cite:t}`simon1956dynamic` and {cite:t}`theil1957note` established a celebrated *certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming problems. Their result justifies a convenient two-step algorithm: 1. **Optimize** under perfect foresight (treat future exogenous variables as known). 2. **Forecast** — substitute optimal forecasts for the unknown future values. -The striking insight is that these two steps are completely separable. - -The decision rule that emerges from step 1 is *identical* to the decision rule for the original -stochastic problem once optimal forecasts are substituted in step 2. +The striking insight is that these two steps are completely separable. +The decision rule that emerges from step 1 is *identical* to the decision rule for the original stochastic problem once optimal forecasts are substituted in step 2. -The decision rule does not depend on the variance of the shocks, but the *level* of -the optimal value function *does*. +The decision rule does not depend on the variance of the shocks, but the *level* of the optimal value function *does*. After describing the structure of the certainty equivalence property in detail, this lecture describes its role in rational expectations modeling. -We do so by drawing heavily on the introduction to {cite}`lucas1981rational`. - -In addition to learning the certainty equivalence principle, this lecture describes troubles with pre-rational expectations econometric policy evaluation procedures described by {cite}`lucas1976econometric`. 
+We do so by drawing heavily on the introduction to {cite}`lucas1981rational`. +In addition to learning the certainty equivalence principle, this lecture describes troubles with pre-rational expectations econometric policy evaluation procedures described by {cite}`lucas1976econometric`. ```{note} -That volume that volume collected early papers on rational expectations modeling and econometrics. +That volume collected early papers on rational expectations modeling and econometrics. ``` -## A Central Problem of Empirical Economics - +## A central problem of empirical economics -To set the stage, {cite}`lucas1981rational` stated the central question for empirical economics that had been posed by Leonid Hurwicz ({cite}`Hurwicz:1962`,{cite}`Hurwicz:1966`): +To set the stage, {cite:t}`lucas1981rational` stated the central question for empirical economics that had been posed by Leonid Hurwicz ({cite}`Hurwicz:1962`,{cite}`Hurwicz:1966`): - * Given observations on an agent's behavior in a particular economic environment, what can we infer about how that behavior **would have differed** had the environment been altered? + * Given observations on an agent's behavior in a particular economic environment, what can we infer about how that behavior *would have differed* had the environment been altered? ```{note} -Hurwicz formulates a notion of 'causality' as a context-specific concept that he casts in terms of a well posed decision problem. +Hurwicz formulates a notion of 'causality' as a context-specific concept that he casts in terms of a well posed decision problem. ``` - - This is the problem of policy-invariant structural inference in the following setting. - * Observations emerged under one environment or 'regime' - * We want to predict behavior under another 'regime' - * Unless we understand *why* agents behaves as they did in the historical regime, i.e., their purposes, we can't predict their behavior under the constraints they face in the new regime. 
+This is the problem of policy-invariant structural inference in the following setting. -To confront the problem that Hurwicz had posed, {cite}`lucas1981rational` formulated -the following decision framework. - + * Observations emerged under one environment or 'regime' + * We want to predict behavior under another 'regime' + * Unless we understand *why* agents behaved as they did in the historical regime, i.e., their purposes, we can't predict their behavior under the constraints they face in the new regime. ---- +To confront the problem that Hurwicz had posed, {cite:t}`lucas1981rational` formulated the following decision framework. -## A Formal Setup +## A formal setup Consider a single decision maker whose situation at date $t$ is fully described by two state variables $(x_t, z_t)$. @@ -109,12 +97,18 @@ Consider a single decision maker whose situation at date $t$ is fully described ```{math} :label: eq:z_transition_v3 -z_{t+1} = f(z_t,\, \epsilon_t), +z_{t+1} = f(z_t,\, \epsilon_{t+1}), ``` -where the innovations $\epsilon_t \in \mathcal{E}$ are i.i.d. draws from a fixed c.d.f. $\Phi(\cdot) : \mathcal{E} \to [0,1]$. The function $f : S_1 \times \mathcal{E} \to S_1$ is called the **decision maker's environment**. +where the innovations $\{\epsilon_t\}$ are i.i.d. draws from a fixed c.d.f. $\Phi(\cdot) : \mathcal{E} \to [0,1]$. + +The function $f : S_1 \times \mathcal{E} \to S_1$ is called the **decision maker's environment**. -**The endogenous state** $x_t \in S_2$ is under partial control of the agent. Each period the agent selects an action $u_t \in U$. A fixed technology $g : S_1 \times S_2 \times U \to S_2$ governs the transition +**The endogenous state** $x_t \in S_2$ is under partial control of the agent. + +Each period the agent selects an action $u_t \in U$. + +A fixed technology $g : S_1 \times S_2 \times U \to S_2$ governs the transition ```{math} :label: eq:x_transition_v3 @@ -130,26 +124,23 @@ u_t = h(z_t,\, x_t). 
The econometrician observes (some or all of) the process $\{z_t, x_t, u_t\}$, the joint motion of which is determined by {eq}`eq:z_transition_v3`, {eq}`eq:x_transition_v3`, and {eq}`eq:decision_rule_v3`. ---- -## Estimated Rules Are Not Enough +## Estimated rules are not enough -Suppose we have estimated $f$, $g$, and $h$ from a long time series generated under a fixed environment $f_0$. +Suppose we have estimated $f$, $g$, and $h$ from a long time series generated under a fixed environment $f_0$. -This gives us $h_0 = T(f_0)$, where $T$ is the (unknown) functional mapping environments into optimal decision rules. +This gives us $h_0 = T(f_0)$, where $T$ is the (unknown) functional mapping environments into optimal decision rules. -But this single estimate, however precise, **reveals nothing** about how $T(f)$ varies with $f$. +But this single estimate, however precise, *reveals nothing* about how $T(f)$ varies with $f$. Policy evaluation requires knowledge of the entire map $f \mapsto T(f)$. Under an environment change $f_0 \to f_1$, agents will in general revise their decision rules $h_0 \to h_1 = T(f_1)$, rendering the estimated rule $h_0$ invalid for forecasting behavior under $f_1$. +{cite:t}`lucas1976econometric` and the introduction to {cite}`lucas1981rational` conclude that the only nonexperimental path forward is to recover the **return function** $V$ from which $h$ is derived as the solution to an optimization problem, and then re-solve that problem under the counterfactual environment $f_1$. -{cite}`lucas1976econometric` and the introduction to {cite}`lucas1981rational` conclude that the only nonexperimental path forward is to recover the **return function** $V$ from which $h$ is derived as the solution to an optimization problem, and then re-solve that problem under the counterfactual environment $f_1$. 
- ---- -## An Optimization Problem +## An optimization problem Assume the agent selects $h$ to maximize the expected discounted sum of current-period returns $V : S_1 \times S_2 \times U \to \mathbb{R}$: @@ -158,29 +149,31 @@ Assume the agent selects $h$ to maximize the expected discounted sum of current- E_0\!\left\{\sum_{t=0}^{\infty} \beta^t\, V(z_t,\, x_t,\, u_t)\right\}, \qquad 0 < \beta < 1, ``` -given initial conditions $(z_0, x_0)$, the environment $f$, and the technology $g$. Here $E_0\{\cdot\}$ denotes expectation conditional on $(z_0, x_0)$ with respect to the distribution of $\{z_1, z_2, \ldots\}$ induced by {eq}`eq:z_transition_v3`. +given initial conditions $(z_0, x_0)$, the environment $f$, and the technology $g$. + +Here $E_0\{\cdot\}$ denotes expectation conditional on $(z_0, x_0)$ with respect to the distribution of $\{z_1, z_2, \ldots\}$ induced by {eq}`eq:z_transition_v3`. -In principle, knowledge of $V$ (together with $g$ and $f$) allows one to compute $h = T(f)$ theoretically and hence to trace out $T(f)$ for any counterfactual $f$. +In principle, knowledge of $V$ (together with $g$ and $f$) allows one to compute $h = T(f)$ theoretically and hence to trace out $T(f)$ for any counterfactual $f$. The essential question is whether $V$ can itself be recovered from observations on $\{f, g, h\}$. -:::{note} -The decision rule is in general a functional $h = T(f, g, V)$. The dependence on $g$ and $V$ is suppressed in the main text but made explicit when needed. -::: +```{note} +The decision rule is in general a functional $h = T(f, g, V)$. +The dependence on $g$ and $V$ is suppressed in the main text but made explicit when needed. +``` ---- -## A Linear-Quadratic DP problems and Certainty Equivalence +## A linear-quadratic DP problem and certainty equivalence -Progress beyond the level of generality of the previous section requires restricting the primitives. 
+Progress beyond the level of generality of the previous section requires restricting the primitives. -A productive restriction, exploited in the papers collected in {cite}`lucas1981rational`, imposes **quadratic** $V$ and **linear** $g$, which forces $h$ to be linear. +A productive restriction, exploited in the papers collected in {cite}`lucas1981rational`, imposes *quadratic* $V$ and *linear* $g$, which forces $h$ to be linear. As part of its computational tractability, this specialization delivers a striking structural result: -* the **certainty equivalence** theorem of Simon {cite}`simon1956dynamic` and Theil {cite}`theil1957note`. +* the **certainty equivalence** theorem of {cite:t}`simon1956dynamic` and {cite:t}`theil1957note`. -### Decomposition of $h$ +### Decomposition of $h$ Under quadratic $V$ and linear $g$, the optimal decision rule $h$ decomposes into two components applied in sequence. @@ -191,7 +184,9 @@ Under quadratic $V$ and linear $g$, the optimal decision rule $h$ decomposes int \tilde{z}_t \;=\; \bigl(z_t,\;\; {}_{t+1}z_t^e,\;\; {}_{t+2}z_t^e,\;\ldots\bigr) \;\in\; S_1^\infty, ``` -where ${}_{t+j}z_t^e$ denotes the least-mean-squared-error forecast of $z_{t+j}$ formed at time $t$. The optimal forecast sequence is a (generally nonlinear) function of the current state: +where ${}_{t+j}z_t^e$ denotes the least-mean-squared-error forecast of $z_{t+j}$ formed at time $t$. 
+ +The optimal forecast sequence is a (generally nonlinear) function of the current state: ```{math} :label: eq:forecast_rule_v3 @@ -216,7 +211,7 @@ The ultimate decision rule is therefore the **composite**: \boxed{h(z_t, x_t) \;=\; h_1\!\bigl[h_2(z_t),\; x_t\bigr].} ``` -### The Separation Principle +### The separation principle {eq}`eq:composite_rule_v3` embodies a clean **separation** of the two sources of dependence in $h$: @@ -229,32 +224,30 @@ Since policy analysis concerns changes in $f$, and since $h_1$ is invariant to $ The relationship of original interest, $h = T(f)$, then follows directly from {eq}`eq:composite_rule_v3`. -### Certainty Equivalence and Perfect Foresight +### Certainty equivalence and perfect foresight -The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** — i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. +The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** — i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. Randomness of the environment affects actions only through the forecast $\tilde{z}_t$; conditional on $\tilde{z}_t$, the optimization problem is deterministic. This means the LQ problem decouples into: - * **Dynamic optimization under perfect foresight** — solve for $h_1$ from $(V, g)$ by treating $\tilde{z}_t$ as known. This is a standard deterministic LQ regulator problem and is independent of the environment $(f, \Phi)$. 
+ * **Dynamic optimization under perfect foresight** — solve for $h_1$ from $(V, g)$ by treating $\tilde{z}_t$ as known, yielding a standard deterministic LQ regulator problem independent of the environment $(f, \Phi)$. - * **Optimal linear prediction** — solve for $h_2 = S(f)$ from $(f, \Phi)$ using least-squares forecasting theory. If $f$ is itself linear, $h_2$ is also linear and reduces to a standard Kalman/Wiener prediction formula. + * **Optimal linear prediction** — solve for $h_2 = S(f)$ from $(f, \Phi)$ using least-squares forecasting theory, which reduces to a standard Kalman/Wiener prediction formula when $f$ is itself linear. -### Cross-Equation Restrictions +### Cross-equation restrictions A hallmark of the rational expectations hypothesis as it appears in this framework is that it ties together what would otherwise be free parameters in different equations. -The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ — i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ — imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. +The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ — i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ — imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. These restrictions, rather than any conditions on distributed lags within a single equation, are the operative empirical content of rational expectations. ```{note} -This is the message of {cite}`lucas1976econometric` and {cite}`sargent1981interpreting`. +This is the message of {cite}`lucas1976econometric` and {cite}`sargent1981interpreting`. ``` -### Python: Demonstrating Certainty Equivalence - The following code verifies the CE principle numerically. 
We consider a simple scalar LQ problem: @@ -266,58 +259,57 @@ and vary the noise standard deviation $\sigma$ across a wide range. The CE theorem predicts that: * the **policy gain** $F$ (the coefficient in $u_t = -F y_t$) is independent of $\sigma$, and -* the **value constant** $d$ (the additive term in $V(y) = -y' P y - d$) grows with $\sigma$. +* the **value constant** $d$ (the additive term in $V(y) = -y^\top P y - d$) grows with $\sigma$. ```{code-cell} ipython3 -# ── Simple 1-D scalar LQ problem ─────────────────────────────────────────── -# y_{t+1} = a·y_t + b·u_t + σ·ε_{t+1}, r = −(q·y² + r·u²) - +--- +mystnb: + figure: + caption: "CE: policy does not depend on noise" + name: fig-ce-policy-noise +--- a, b_coeff = 0.9, 1.0 -q_state, r_ctrl = 1.0, 1.0 -beta = 0.95 +q, r = 1.0, 1.0 +β = 0.95 A = np.array([[a]]) B = np.array([[b_coeff]]) -Q_mat = np.array([[q_state]]) -R_mat = np.array([[r_ctrl]]) +R_mat = np.array([[q]]) # state cost +Q_mat = np.array([[r]]) # control cost -sigma_vals = np.linspace(0.0, 3.0, 80) +σ_vals = np.linspace(0.0, 3.0, 80) F_vals, d_vals = [], [] -for sigma in sigma_vals: - C = np.array([[sigma]]) - lq = LQ(Q_mat, R_mat, A, B, C=C, beta=beta) +for σ in σ_vals: + C = np.array([[σ]]) + lq = LQ(Q_mat, R_mat, A, B, C=C, beta=β) P, F, d = lq.stationary_values() F_vals.append(float(F[0, 0])) d_vals.append(float(d)) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -axes[0].plot(sigma_vals, F_vals, lw=2) -axes[0].set_xlabel('Noise level $\\sigma$') -axes[0].set_ylabel('Policy gain $F$') -axes[0].set_title('CE: Policy does not depend on noise') +axes[0].plot(σ_vals, F_vals, lw=2) +axes[0].set_xlabel('noise level $\\sigma$') +axes[0].set_ylabel('policy gain $F$') axes[0].set_ylim(0, 2 * max(F_vals) + 0.1) -axes[1].plot(sigma_vals, d_vals, lw=2, color='darkorange') -axes[1].set_xlabel('Noise level $\\sigma$') -axes[1].set_ylabel('Value constant $d$') -axes[1].set_title('Noise lowers value but not the decision rule') +axes[1].plot(σ_vals, d_vals, 
lw=2, color='darkorange') +axes[1].set_xlabel('noise level $\\sigma$') +axes[1].set_ylabel('value constant $d$') plt.tight_layout() plt.show() ``` -As the plot confirms, $F$ (the policy gain) is **flat** across all noise levels, -while the value constant $d$ increases monotonically with $\sigma$. +As the plot confirms, $F$ (the policy gain) is *flat* across all noise levels, while the value constant $d$ increases monotonically with $\sigma$. -This is the CE principle in action: **uncertainty changes the value of the problem but not the optimal decision rule**. +This is the CE principle in action: uncertainty changes the value of the problem but not the optimal decision rule. ---- -## A Trouble with Ad Hoc Expectations +## A trouble with ad hoc expectations -Prior practice, exemplified by the adaptive expectations mechanisms of Friedman {cite}`Friedman1956` and Cagan {cite}`Cagan`, directly postulated a particular form of {eq}`eq:forecast_rule_v3`: +Prior practice, exemplified by the adaptive expectations mechanisms of {cite:t}`Friedman1956` and {cite:t}`Cagan`, directly postulated a particular form of {eq}`eq:forecast_rule_v3`: ```{math} :label: eq:adaptive_expectations_v3 @@ -326,11 +318,18 @@ Prior practice, exemplified by the adaptive expectations mechanisms of Friedman treating the coefficient $\lambda$ as a free parameter to be estimated from data, with no reference to the underlying environment $f$. -The deficiency is not that {eq}`eq:adaptive_expectations_v3` is a distributed lag — linear forecasting rules are perfectly acceptable simplifications. The deficiency is that the **coefficients** of the distributed lag are left unrestricted by theory. The mapping $h_2 = S(f)$ shows that optimal forecasting coefficients are *determined* by $f$: when $f$ changes, $h_2$ changes, and so does $h$. An estimated $\lambda$ calibrated under $f_0$ is therefore non-structural and will give incorrect predictions whenever $f$ is altered. 
This is the econometric content of the critique that Muth's paper delivers. +The deficiency is not that {eq}`eq:adaptive_expectations_v3` is a distributed lag — linear forecasting rules are perfectly acceptable simplifications. + +The deficiency is that the **coefficients** of the distributed lag are left unrestricted by theory. + +The mapping $h_2 = S(f)$ shows that optimal forecasting coefficients are *determined* by $f$: when $f$ changes, $h_2$ changes, and so does $h$. + +An estimated $\lambda$ calibrated under $f_0$ is therefore non-structural and will give incorrect predictions whenever $f$ is altered. + +This is the econometric content of the critique delivered by {cite:t}`Muth1960`. Rational expectations equates the subjective distribution that agents use in forming $\tilde{z}_t$ to the objective distribution $f$ that actually generates the data, thereby closing the model and eliminating free parameters in $h_2$. ---- ## Exercises @@ -338,16 +337,15 @@ Rational expectations equates the subjective distribution that agents use in for :label: theil1_ex1 ``` -**CE and noise variance.** - Using the scalar LQ setup in the code cell above (with $a = 0.9$, $b = 1$, $q = r = 1$, $\beta = 0.95$), verify numerically that the value constant $d$ satisfies $d \propto \sigma^2$. *Hint:* From the CE analysis, the value constant satisfies -$d = \tfrac{\beta}{1-\beta}\,\mathrm{tr}(C' P C)$, +$d = \tfrac{\beta}{1-\beta}\,\mathrm{tr}(C^\top P C)$, and since $C = \sigma$ in the scalar case, this gives $d = \tfrac{\beta}{1-\beta}\, P\, \sigma^2$. + Confirm that a plot of $d$ against $\sigma^2$ is linear and compute the theoretical slope $\tfrac{\beta}{1-\beta} P$. @@ -359,39 +357,35 @@ slope $\tfrac{\beta}{1-\beta} P$. 
``` ```{code-cell} ipython3 -# Reuse F_vals and d_vals already computed above -sigma_sq_vals = sigma_vals ** 2 +σ_sq_vals = σ_vals ** 2 fig, ax = plt.subplots(figsize=(8, 5)) -ax.plot(sigma_sq_vals, d_vals, lw=2) +ax.plot(σ_sq_vals, d_vals, lw=2) ax.set_xlabel('$\\sigma^2$') -ax.set_ylabel('Value constant $d$') +ax.set_ylabel('value constant $d$') ax.set_title('Value constant is linear in noise variance (CE principle)') -# Overlay linear fit -coeffs = np.polyfit(sigma_sq_vals, d_vals, 1) -ax.plot(sigma_sq_vals, np.polyval(coeffs, sigma_sq_vals), - 'r--', lw=1.5, label=f'Linear fit: slope = {coeffs[0]:.3f}') +coeffs = np.polyfit(σ_sq_vals, d_vals, 1) +ax.plot(σ_sq_vals, np.polyval(coeffs, σ_sq_vals), + 'r--', lw=2, label=f'Linear fit: slope = {coeffs[0]:.3f}') ax.legend() plt.tight_layout() plt.show() -# Theoretical slope: β/(1−β) × P P_scalar = float(LQ(Q_mat, R_mat, A, B, C=np.zeros((1, 1)), - beta=beta).stationary_values()[0]) -theoretical_slope = beta / (1 - beta) * P_scalar + beta=β).stationary_values()[0].item()) +theoretical_slope = β / (1 - β) * P_scalar print(f"Empirical slope: {coeffs[0]:.4f}") -print(f"Theoretical slope β/(1−β)·P = {theoretical_slope:.4f}") +print(f"Theoretical slope β/(1-β)*P = {theoretical_slope:.4f}") ``` The slope is indeed $\tfrac{\beta}{1-\beta} P$, confirming the analytic formula. -The policy matrix $P$ is determined entirely by preferences and technology, not by the -noise level — a direct consequence of the certainty equivalence principle. + +The value matrix $P$ is determined entirely by preferences and technology, not by the noise level — a direct consequence of the certainty equivalence principle. ```{solution-end} ``` ## Concluding remarks -This sequel {doc}`certainty equivalence and model uncertainty ` describes how to extend the certainty equivalence principle to -linear-quadratic setting in which a decision distrusts the transition dynamics specified in his baseline model. 
+This sequel [certainty equivalence and model uncertainty](theil_2) describes how to extend the certainty equivalence principle to a linear-quadratic setting in which a decision maker distrusts the transition dynamics specified in his baseline model. diff --git a/lectures/theil_2.md b/lectures/theil_2.md index a86a174b9..9e8f267c3 100644 --- a/lectures/theil_2.md +++ b/lectures/theil_2.md @@ -35,41 +35,42 @@ kernelspec: ## Overview -This is a sequel to {doc}`this lecture on certainty equivalence ` that described -established an important *certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming -problems. +This is a sequel to [this lecture on certainty equivalence](theil_1) that established an important *certainty equivalence* (CE) property for linear-quadratic (LQ) dynamic programming +problems. -The property justifies a two-step algorithm for computing optimal decision rules: +The property justifies a two-step algorithm for computing optimal decision rules: -1. **Optimize** under perfect foresight (treat future exogenous variables as known). -2. **Forecast** — substitute optimal forecasts for the unknown future values. +1. *Optimize* under perfect foresight (treat future exogenous variables as known). +2. *Forecast* — substitute optimal forecasts for the unknown future values. -This lecture extends the certainty equivalence property in two directions motivated by +This lecture extends the certainty equivalence property in two directions motivated by {cite}`hansen2004certainty`: -- **Model uncertainty and robustness.** What happens when the decision maker does not +- *Model uncertainty and robustness.* What happens when the decision maker does not trust his model? A remarkable version of CE survives, but now the "forecasting" step uses a *distorted* probability distribution that the decision maker deliberately tilts against himself in order to achieve robustness. 
-- **Risk-sensitive preferences.** A mathematically equivalent reformulation interprets - the same decision rules through Epstein–Zin recursive preferences. The robustness +- *Risk-sensitive preferences.* A mathematically equivalent reformulation interprets + the same decision rules through recursive risk-sensitive preferences. + + The robustness parameter $\theta$ and the risk-sensitivity parameter $\sigma$ are linked by $\theta = -\sigma^{-1}$. We illustrate all three settings — ordinary CE, robust CE, and the permanent income application — with Python code using `quantecon`. -### Model Features +### Model features * Linear transition laws and quadratic objectives (LQ framework). * Ordinary CE: optimal policy independent of noise variance. -* Robust CE: distorted forecasts replace model baseline model forecasts; policy funciton depends on $\theta$. +* Robust CE: distorted forecasts replace baseline model forecasts; policy function depends on $\theta$. * Permanent income application: Hall's martingale, precautionary savings under robustness, and observational equivalence between robustness and patience. -This lecture draws on {cite}`hansen2004certainty` and {cite}`HansenSargent2008`. +This lecture draws on {cite}`hansen2004certainty` and {cite}`HansenSargent2008`. In addition to what's in Anaconda, this lecture will need the following libraries: @@ -81,202 +82,104 @@ tags: [hide-output] ``` -We begin with imports: +We use the following imports: ```{code-cell} ipython3 import numpy as np import matplotlib.pyplot as plt -from scipy.linalg import solve from quantecon import LQ, RBLQ ``` ---- - -## Ordinary Certainty Equivalence -### Notation and Setup +## Recap: ordinary certainty equivalence -Let $y_t$ denote the state vector, partitioned as - -```{math} -:label: eq:state_partition_o -y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix} -``` +The {ref}`companion lecture ` established the CE +property in detail. 
Here we collect only the elements needed for the +robustness extension below. -where $z_t$ is an *exogenous* component with transition law +The state vector $y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix}$ has an +exogenous component $z_t$ with transition law ```{math} -:label: eq:z_transition_o +:label: eq:z_transition_o z_{t+1} = f(z_t,\, \epsilon_{t+1}) ``` -and $\epsilon_{t+1}$ is an i.i.d. sequence with c.d.f. $\Phi$. - -The *endogenous* component $x_t$ obeys - -```{math} -:label: eq:x_transition_o -x_{t+1} = g(x_t,\, z_t,\, u_t) -``` - -where $u_t$ is the decision maker's control. - -The decision maker maximises the discounted expected return - -```{math} -:label: eq:objective_o -\mathbb{E}\!\left[\sum_{t=0}^{\infty} \beta^t\, r(y_t, u_t)\,\Big|\, y^0\right], -\qquad \beta \in (0,1) -``` - -choosing a control $u_t$ measurable with respect to the history $y^t \equiv -(x^t, z^t)$. - -The maximizer is a stationary decision rule - -```{math} -:label: eq:stationary_rule_o -u_t = h(x_t, z_t). -``` - -Throughout, we maintain the following assumption from Simon and Theil: - -> **Assumption 1.** The return function $r(y,u) = -y'Qy - u'Ru$ is quadratic -> ($Q, R \succeq 0$); $f$ and $g$ are both linear; and $\Phi$ is multivariate -> Gaussian with mean zero. - -### The Two-Step Algorithm - -Under Assumption 1, the stochastic optimisation problem separates into two independent -steps. - -**Step 1 — Perfect-foresight control.** Solve the *nonstochastic* problem of -maximising {eq}`eq:objective_o` subject to {eq}`eq:x_transition_o`, treating the future sequence -$\mathbf{z}_t = (z_t, z_{t+1}, \ldots)$ as known. - -The solution is the -*feedback-feedforward* rule - -```{math} -:label: eq:ff_rule_o -u_t = h_1(x_t,\, \mathbf{z}_t). -``` - -The function $h_1$ depends only on $r$ and $g$ (i.e., only on $Q$, $R$, and the -matrices of the $x$-transition law). It does **not** require knowledge of the -noise process $f$ or $\Phi$. Under Assumption 1, $h_1$ is a linear function. 
- -**Step 2 — Optimal forecasting.** Using $f$ and $\Phi$ in {eq}`eq:z_transition_o`, -iterate the linear law of motion forward: - -```{math} -:label: eq:forecast_expansion_o -\mathbf{z}_t = h_2 \cdot z_t\; +\; h_3 \cdot \epsilon_{t+1}^{\infty}. -``` - -Since the shocks are i.i.d. with mean zero, - -```{math} -:label: eq:optimal_forecast_o -\mathbb{E}[\mathbf{z}_t \mid z^t] = h_2 \cdot z_t. -``` - -**The CE principle.** Substitute {eq}`eq:optimal_forecast_o` for $\mathbf{z}_t$ in {eq}`eq:ff_rule_o` and impose $z^t = z_t$ to get the CE decision rule: - -```{math} -:label: eq:ce_rule -u_t = h_1(x_t,\; h_2 \cdot z_t) \;=\; h(x_t,\, z_t). -``` - -Each of $h_1$, $h_2$, and $h$ is a linear function. The original stochastic -problem thus *separates* into a nonstochastic control problem and a statistical -filtering problem. - -### Value Function and Volatility - -The optimal value function takes the quadratic form +and an endogenous component $x_t$ obeying ```{math} -:label: eq:value_fn_o -V(y_0) = -y_0' P\, y_0 - p. +:label: eq:x_transition_o +x_{t+1} = g(x_t,\, z_t,\, u_t). ``` -Two key observations follow from the separation: +Under the LQ assumption (quadratic return $r(y,u) = -y^\top Qy - u^\top Ru$, +linear $f$ and $g$, Gaussian shocks), the optimal decision rule $h$ decomposes +as $u_t = h_1(x_t,\, h_2 \cdot z_t)$ where $h_1$ solves a nonstochastic +control problem and $h_2$ solves an optimal forecasting problem. -- The matrix $P$ is the fixed point of an operator $T(P; r, g, f_1)$ that involves - only the *persistence* matrix $f_1$ (from $z_{t+1} = f_1 z_t + f_2 \epsilon_{t+1}$), - **not** the volatility matrix $f_2$. Therefore **$P$ does not depend on the noise - loadings**, and neither does the decision rule $h$. +The optimal value function is $V(y_0) = -y_0^\top P\, y_0 - p$ where, +writing $z_{t+1} = f_1 z_t + f_2 \epsilon_{t+1}$: -- The scalar constant $p$ equals $\beta/(1-\beta)\,\mathrm{tr}(f_2' P f_2)$, so - **$p$ grows with volatility**. 
+- $P$ is the fixed point of an operator $T(P; r, g, f_1)$ that does *not* + involve the volatility matrix $f_2$, so neither $P$ nor the decision rule + $h$ depends on the noise loadings. -An equivalent statement: the same decision rule $h$ emerges from the *nonstochastic* -version of the problem obtained by setting all shocks to zero, -$z_{t+1} = f_1 z_t$. +- The constant $p = \beta/(1-\beta)\,\mathrm{tr}(f_2^\top P f_2)$ grows with + volatility. -The presence of uncertainty *lowers the value* (larger $p$) -but does not alter *behaviour*. +Uncertainty lowers the value (larger $p$) but does not alter behaviour. -### Python: Demonstrating Certainty Equivalence - -The following code verifies the CE principle numerically. - -We consider a simple scalar LQ problem and vary the noise standard deviation $\sigma$. +The following code sets up a scalar LQ problem and confirms that the policy +gain $F$ is invariant to the noise level $\sigma$ while $d$ grows with it. ```{code-cell} ipython3 -# ── Simple 1-D scalar LQ problem ─────────────────────────────────────────── -# y_{t+1} = a·y_t + b·u_t + σ·ε_{t+1}, r = −(q·y² + r·u²) - +--- +mystnb: + figure: + caption: CE principle — policy vs. 
value + name: fig-ce-policy-value +--- a, b_coeff = 0.9, 1.0 -q_state, r_ctrl = 1.0, 1.0 -beta = 0.95 +q, r = 1.0, 1.0 +β = 0.95 A = np.array([[a]]) B = np.array([[b_coeff]]) -Q_mat = np.array([[q_state]]) -R_mat = np.array([[r_ctrl]]) +Q_mat = np.array([[q]]) # state cost +R_mat = np.array([[r]]) # control cost -sigma_vals = np.linspace(0.0, 3.0, 80) +σ_vals = np.linspace(0.0, 3.0, 80) F_vals, d_vals = [], [] -for sigma in sigma_vals: - C = np.array([[sigma]]) - lq = LQ(Q_mat, R_mat, A, B, C=C, beta=beta) +for σ in σ_vals: + C = np.array([[σ]]) + lq = LQ(R_mat, Q_mat, A, B, C=C, beta=β) P, F, d = lq.stationary_values() F_vals.append(float(F[0, 0])) d_vals.append(float(d)) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -axes[0].plot(sigma_vals, F_vals, lw=2) -axes[0].set_xlabel('Noise level $\\sigma$') -axes[0].set_ylabel('Policy gain $F$') -axes[0].set_title('CE: Policy does not depend on noise') +axes[0].plot(σ_vals, F_vals, lw=2) +axes[0].set_xlabel('noise level $\\sigma$') +axes[0].set_ylabel('policy gain $F$') axes[0].set_ylim(0, 2 * max(F_vals) + 0.1) -axes[1].plot(sigma_vals, d_vals, lw=2, color='darkorange') -axes[1].set_xlabel('Noise level $\\sigma$') -axes[1].set_ylabel('Value constant $d$') -axes[1].set_title('Noise lowers value but not the decision rule') +axes[1].plot(σ_vals, d_vals, lw=2, color='darkorange') +axes[1].set_xlabel('noise level $\\sigma$') +axes[1].set_ylabel('value constant $d$') plt.tight_layout() plt.show() ``` -As the plot confirms, $F$ (the policy gain) is **flat** across all noise levels, -while the value constant $d$ increases monotonically with $\sigma$. - -This is the CE principle in action. - ---- -## Model Uncertainty and Robustness +## Model uncertainty and robustness -### Setup and the Multiplier Problem +### Setup and the multiplier problem The decision maker in Simon and Theil's setting knows his model exactly — he has -no doubt about the transition law {eq}`eq:z_transition`. 
+no doubt about the transition law {eq}`eq:z_transition_o`. Now suppose he suspects that the true data-generating process is @@ -287,14 +190,14 @@ z_{t+1} = f(z_t,\; \epsilon_{t+1} + w_{t+1}) ``` where $w_{t+1} = \omega_t(x^t, z^t)$ is a misspecification term chosen by an -adversarial "nature." +adversarial "nature." The decision maker believes his approximating model is a good approximation in the sense that ```{math} :label: eq:misspec_budget -\hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t\, w_{t+1}' w_{t+1} +\hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t\, w_{t+1}^\top w_{t+1} \,\Big|\, y_0\right] \leq \eta_0, ``` @@ -308,11 +211,11 @@ To construct a *robust* decision rule the decision maker solves the :label: eq:multiplier \min_{\{w_{t+1}\}}\, \max_{\{u_t\}}\; \hat{\mathbb{E}}\!\left[\sum_{t=0}^{\infty} \beta^t - \Bigl\{r(y_t, u_t) + \theta\beta\, w_{t+1}' w_{t+1}\Bigr\}\, + \Bigl\{r(y_t, u_t) + \theta\beta\, w_{t+1}^\top w_{t+1}\Bigr\}\, \Big|\, y_0\right] ``` -where $\theta > 0$ penalises large distortions. +where $\theta > 0$ penalises large distortions. A larger $\theta$ shrinks the feasible misspecification set; as $\theta \to \infty$ the problem reduces to @@ -322,10 +225,11 @@ The Markov perfect equilibrium of {eq}`eq:multiplier` delivers a *robust* rule $u_t = h(x_t, z_t)$ together with a worst-case distortion process $w_{t+1} = W(x_t, z_t)$. -### Stackelberg Timing and the Modified CE +### Stackelberg timing and the modified CE + +The Markov perfect equilibrium *conceals* a form of CE. -The Markov perfect equilibrium *conceals* a form of CE. To reveal it, Hansen and -Sargent {cite}`HansenSargent2001` impose a **Stackelberg timing protocol**: at +To reveal it, {cite:t}`HansenSargent2001` impose a **Stackelberg timing protocol**: at time 0, the *minimising* player commits once and for all to a plan $\{w_{t+1}\}$, after which the *maximising* player chooses $u_t$ sequentially. 
@@ -352,8 +256,8 @@ Y_{t+1} = M Y_t + N \epsilon_{t+1}, \qquad w_{t+1} = W(Y_t). ``` The maximising player then faces an *ordinary* dynamic programming problem subject -to his own dynamics {eq}`eq:x_transition`, the distorted $z$-law {eq}`eq:distorted_law`, and the exogenous -process {eq}`eq:stackelberg_law`. +to his own dynamics {eq}`eq:x_transition_o`, the distorted $z$-law {eq}`eq:distorted_law`, and the exogenous +process {eq}`eq:stackelberg_law`. His optimal rule takes the form @@ -362,7 +266,7 @@ His optimal rule takes the form u_t = \tilde{H}(x_t, z_t, Y_t). ``` -{cite}`bacsar2008h` and {cite}`hansen2008robustness` establish that at +{cite:t}`bacsar2008h` and {cite:t}`hansen2008robustness` establish that at equilibrium (with "big $K$ = little $k$" imposed) this collapses to ```{math} @@ -372,21 +276,23 @@ equilibrium (with "big $K$ = little $k$" imposed) this collapses to the *same* rule as the Markov perfect equilibrium of {eq}`eq:multiplier`. -### Modified Separation Principle +### Modified separation principle -The Stackelberg timing permits an Euler-equation approach. The two-step algorithm -becomes: +The Stackelberg timing permits an Euler-equation approach. -**Step 1** (unchanged). Solve the same nonstochastic control problem as before: +The two-step algorithm becomes: + +The first step is unchanged: solve the same nonstochastic control problem as before, +with $\mathbf{z}_t = (z_t, z_{t+1}, \ldots)$ treated as known, giving $u_t = h_1(x_t, \mathbf{z}_t)$. -**Step 2** (modified). Form forecasts using the *distorted* law of motion +The second step is modified: form forecasts using the *distorted* law of motion {eq}`eq:stackelberg_law`. By the linearity and Gaussianity of the system, ```{math} :label: eq:distorted_forecast \hat{\mathbb{E}}[\mathbf{z}_t \mid z^t, Y^t] - = \tilde{h}_2 \begin{bmatrix} z_t \\ Y_t \end{bmatrix} + = \hat{h}_2 \begin{bmatrix} z_t \\ Y_t \end{bmatrix} ``` where $\hat{\mathbb{E}}$ uses the distorted model. 
@@ -398,52 +304,53 @@ Substituting {eq}`eq:distorted_forecast` into $h_1$ and imposing $Y_t = y_t$ giv u_t = h_1\!\left(x_t,\; \hat{h}_2 \cdot y_t\right) = h(x_t, z_t). ``` -This is the modified CE: **step 1 is identical to the non-robust case**; only +This is the modified CE: *step 1 is identical to the non-robust case*; only step 2 changes, using distorted rather than rational forecasts. -### Python: How Robustness Changes the Policy +In contrast to ordinary CE, the robust policy *does* change as $\theta$ varies. -In contrast to ordinary CE, the robust policy **does** change as $\theta$ varies. As $\theta \to \infty$ (no robustness) the robust policy converges to the standard LQ policy. ```{code-cell} ipython3 -# ── Robust LQ: same 1-D problem, varying θ ────────────────────────────────── -sigma_fixed = 1.0 -C_fixed = np.array([[sigma_fixed]]) +--- +mystnb: + figure: + caption: Robust policy varies with θ + name: fig-robust-policy-theta +--- +σ_fixed = 1.0 +C_fixed = np.array([[σ_fixed]]) -# Standard (non-robust) benchmark -lq_std = LQ(Q_mat, R_mat, A, B, C=C_fixed, beta=beta) +lq_std = LQ(R_mat, Q_mat, A, B, C=C_fixed, beta=β) P_std, F_std_arr, d_std = lq_std.stationary_values() F_standard = float(F_std_arr[0, 0]) P_standard = float(P_std[0, 0]) -theta_vals = np.linspace(2.0, 30.0, 120) # theta must exceed 1/(2P) ≈ 0.4; use ≥ 2 +θ_vals = np.linspace(2.0, 30.0, 120) # restrict attention to a numerically stable range F_rob_vals, P_rob_vals = [], [] -for theta in theta_vals: - rblq = RBLQ(Q_mat, R_mat, A, B, C_fixed, beta, theta) +for θ in θ_vals: + rblq = RBLQ(R_mat, Q_mat, A, B, C_fixed, β, θ) F_rob, K_rob, P_rob = rblq.robust_rule() F_rob_vals.append(float(F_rob[0, 0])) P_rob_vals.append(float(P_rob[0, 0])) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -axes[0].plot(theta_vals, F_rob_vals, lw=2, label='Robust $F(\\theta)$') -axes[0].axhline(F_standard, color='r', linestyle='--', lw=1.5, +axes[0].plot(θ_vals, F_rob_vals, lw=2, label='Robust $F(\\theta)$') 
+axes[0].axhline(F_standard, color='r', linestyle='--', lw=2, label=f'Standard LQ ($F = {F_standard:.3f}$)') -axes[0].set_xlabel('Robustness parameter $\\theta$') -axes[0].set_ylabel('Policy gain $F$') -axes[0].set_title('Robustness changes the policy') +axes[0].set_xlabel('robustness parameter $\\theta$') +axes[0].set_ylabel('policy gain $F$') axes[0].legend() -axes[1].plot(theta_vals, P_rob_vals, lw=2, color='purple', +axes[1].plot(θ_vals, P_rob_vals, lw=2, color='purple', label='Robust $P(\\theta)$') -axes[1].axhline(P_standard, color='r', linestyle='--', lw=1.5, +axes[1].axhline(P_standard, color='r', linestyle='--', lw=2, label=f'Standard LQ ($P = {P_standard:.3f}$)') -axes[1].set_xlabel('Robustness parameter $\\theta$') -axes[1].set_ylabel('Value matrix $P$') -axes[1].set_title('Robustness also changes the value matrix') +axes[1].set_xlabel('robustness parameter $\\theta$') +axes[1].set_ylabel('value matrix $P$') axes[1].legend() plt.tight_layout() @@ -454,22 +361,22 @@ Observe that for small $\theta$ (strong preference for robustness) both $F$ and $P$ deviate substantially from their non-robust counterparts, converging to the standard values as $\theta \to \infty$. -This contrasts sharply with ordinary CE: under robustness, **both the policy gain -and the value matrix depend on the noise loadings** (through $\theta$ and $C$). +This contrasts sharply with ordinary CE: under robustness, *both the policy gain +and the value matrix depend on the robustness parameter $\theta$ and the +noise-loading matrix $C$*. ---- -## Value Function Under Robustness +## Value function under robustness Under a preference for robustness, the optimised value of {eq}`eq:multiplier` is again quadratic, ```{math} :label: eq:robust_value -V(y_0) = -y_0' P\, y_0 - p, +V(y_0) = -y_0^\top P\, y_0 - p, ``` -but now *both* $P$ **and** $p$ depend on the volatility parameter $f_2$. +but now *both* $P$ *and* $p$ depend on the volatility parameter $f_2$. 
Specifically, $P$ is the fixed point of the composite operator $T \circ \mathcal{D}$ where $T$ is the same Bellman operator as in the non-robust case and @@ -489,22 +396,23 @@ p = p(P;\, f_2,\, \beta,\, \theta). Despite $P$ now depending on $f_2$, a form of CE still prevails: the same decision rule {eq}`eq:robust_ce_rule` also emerges from the *nonstochastic* game that -maximises {eq}`eq:multiplier` subject to {eq}`eq:x_transition` and +maximises {eq}`eq:multiplier` subject to {eq}`eq:x_transition_o` and ```{math} :label: eq:nonstoch_z z_{t+1} = f(z_t,\, w_{t+1}), ``` -i.e., setting $\epsilon_{t+1} \equiv 0$. The presence of randomness lowers the -value (the constant $p$) but does not change the decision rule. +i.e., setting $\epsilon_{t+1} \equiv 0$. ---- +The presence of randomness lowers the value (the constant $p$) but does not change the decision rule. -## Risk-Sensitive Preferences -Building on Jacobson (1973) and Whittle (1990), Hansen and Sargent (1995) showed that +## Risk-sensitive preferences + +Building on {cite:t}`Jacobson_73` and {cite:t}`Whittle_1990`, {cite:t}`hansen2004certainty` showed that the same decision rules can be reinterpreted through **risk-sensitive preferences**. + Suppose the decision maker *fully trusts* his model ```{math} @@ -529,35 +437,40 @@ where the *risk-adjusted* continuation operator is ``` When $\sigma = 0$, L'Hôpital's rule recovers the standard expectation operator. + When $\sigma < 0$, $\mathcal{R}_t$ penalises right-tail risk in the continuation utility $U_{t+1}$. For a candidate quadratic continuation value -$U_{t+1}^e = -y_{t+1}' \Omega\, y_{t+1} - \rho$, evaluating $\mathcal{R}_t$ -via the log-moment-generating function of the Gaussian distribution yields +$U_{t+1}^e = -y_{t+1}^\top \Omega\, y_{t+1} - \rho$, let +$\hat{y}_{t+1} \equiv A y_t + B u_t$ denote the conditional mean of $y_{t+1}$. 
+Evaluating $\mathcal{R}_t$ via the log-moment-generating function of the +Gaussian distribution yields ```{math} :label: eq:rs_eval \mathcal{R}_t U_{t+1}^e - = -y_t' \hat{A}_t' \mathcal{D}(\Omega)\, \hat{A}_t\, y_t - \hat{\rho} + = -\hat{y}_{t+1}^\top \mathcal{D}(\Omega)\, \hat{y}_{t+1} - \hat{\rho} ``` -where $\mathcal{D}$ is the **same** distortion operator as in {eq}`eq:distortion_op` -with $\theta = -\sigma^{-1}$. Consequently, the risk-sensitive Bellman equation +where $\mathcal{D}$ is the *same* distortion operator as in {eq}`eq:distortion_op` +with $\theta = -\sigma^{-1}$, and $\hat{\rho}$ is the corresponding scalar +adjustment term. + +Consequently, the risk-sensitive Bellman equation has the *same* fixed point $P$ as the robust control problem, and therefore the -**same decision rule** $u_t = -F y_t$. +*same decision rule* $u_t = -F y_t$. > **Key equivalence:** robust control with parameter $\theta$ and risk-sensitive > control with parameter $\sigma = -\theta^{-1}$ produce identical decision rules. ---- -## Application: Permanent Income Model +## Application: permanent income model We now illustrate all of the above in a concrete linear-quadratic permanent income model. -### Model Setup +### Model setup A consumer receives an exogenous endowment process $\{z_t\}$ and allocates it between consumption $c_t$ and savings $x_t$ to maximise @@ -567,10 +480,10 @@ between consumption $c_t$ and savings $x_t$ to maximise -\mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t (c_t - b)^2, \qquad \beta \in (0,1) ``` -where $b$ is a bliss level of consumption. +where $b$ is a bliss level of consumption. 
-Defining the *marginal utility -of consumption* $\mu_{ct} \equiv b - c_t$ (the control), the budget constraint +Defining the **marginal utility +of consumption** $\mu_{ct} \equiv b - c_t$ (the control), the budget constraint and endowment process are ```{math} @@ -586,7 +499,9 @@ z_{t+1} = \mu_d(1-\rho) + \rho\, z_t + c_d(\epsilon_{t+1} + w_{t+1}) where $R > 1$ is the gross return on savings, $|\rho| < 1$, and $w_{t+1}$ is an optional shock-mean distortion representing model misspecification. -Setting $w_{t+1} \equiv 0$ and taking $Q = 0$ (return depends only on the +After absorbing the constants $-b$ and $\mu_d(1-\rho)$ by augmenting the state +vector, or equivalently by working with deviations from steady state, setting +$w_{t+1} \equiv 0$ and taking $Q = 0$ (return depends only on the control $\mu_{ct}$) and $R_{\text{ctrl}} = 1$ puts this in the standard LQ form ```{math} @@ -600,37 +515,32 @@ B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, C = \begin{bmatrix} 0 \\ c_d \end{bmatrix}. ``` -We calibrate to parameters estimated by Hansen, Sargent, and Tallarini (1999) {cite}`HST_1999` +In the numerical code below we add a negligible `1e-8 I` regularisation to the +state-cost matrix to keep the Riccati computation well conditioned in Hall's +unit-root case $\beta R = 1$. + +We calibrate to parameters estimated by {cite:t}`HST_1999` from post-WWII U.S. 
data: ```{code-cell} ipython3 -# ── HST calibration ───────────────────────────────────────────────────────── -beta_hat = 0.9971 -R_rate = 1.0 / beta_hat # so that β·R = 1 (Hall's case) -rho = 0.9992 -c_d = 5.5819 -sigma_rs = -2e-7 # risk-sensitivity / robustness parameter σ̂ < 0 -theta_pi = -1.0 / sigma_rs # robustness parameter θ = −1/σ̂ = 5×10⁶ - -# LQ matrices (state = [x_t, z_t], control = μ_ct = b − c_t) +β_hat = 0.9971 +R_rate = 1.0 / β_hat # β*R = 1 (Hall's case) +ρ = 0.9992 +c_d = 5.5819 +σ_rs = -2e-7 # σ_hat < 0 +θ_pi = -1.0 / σ_rs # θ = -1/σ_hat + A_pi = np.array([[R_rate, 1.0], - [0.0, rho]]) + [0.0, ρ]]) B_pi = np.array([[1.0], [0.0]]) C_pi = np.array([[0.0], [c_d]]) -# Return = −μ_ct²: no state penalty, unit control penalty. -# A tiny regulariser is added to Q to make the Riccati numerically -# well-conditioned when β·R = 1 (Hall's unit-root case). -Q_pi = 1e-8 * np.eye(2) # economically negligible regularisation +Q_pi = 1e-8 * np.eye(2) # regularise for β*R = 1 R_pi = np.array([[1.0]]) - -print("A ="); print(A_pi) -print("B ="); print(B_pi) -print("C ="); print(C_pi) ``` -### Without Robustness: Hall's Martingale +### Without robustness: Hall's martingale Setting $\sigma = 0$ (no preference for robustness), the consumer's Euler equation is @@ -645,7 +555,9 @@ $\mathbb{E}_t[\mu_{c,t+1}] = \mu_{ct}$, i.e., the **marginal utility of consumption is a martingale** — equivalently, consumption follows a random walk. The optimal policy is $\mu_{ct} = -F y_t$ where, from the solved-forward -Euler equation, $F = [(R-1),\ (R-1)/(R - \rho)]$. The resulting closed-loop +Euler equation, $F = [(R-1),\ (R-1)/(R - \rho)]$. 
+ +The resulting closed-loop projection onto the one-dimensional direction of $\mu_{ct}$ gives the scalar AR(1) representation @@ -655,29 +567,22 @@ AR(1) representation ``` ```{code-cell} ipython3 -# ── Standard consumer: analytical Euler equation (Hall's βR = 1) ───────────── -# Optimal policy from permanent income theory (solved-forward Euler equation): -# μ_ct = −(R−1)·x_t − (R−1)/(R−ρ)·z_t -F_pi = np.array([[(R_rate - 1.0), (R_rate - 1.0) / (R_rate - rho)]]) +F_pi = np.array([[(R_rate - 1.0), (R_rate - 1.0) / (R_rate - ρ)]]) A_cl_std = A_pi - B_pi @ F_pi -# AR(1) law of motion for μ_c = −F·y under the optimal policy: -# φ_std = 1/(βR) = 1 (Hall's martingale, βR = 1) -# ν_std = (R−1)·c_d / (R − ρ) (permanent income innovation formula) -phi_std = 1.0 / (beta_hat * R_rate) # = 1.0 exactly when βR = 1 -nu_std = (R_rate - 1.0) * c_d / (R_rate - rho) +φ_std = 1.0 / (β_hat * R_rate) +ν_std = (R_rate - 1.0) * c_d / (R_rate - ρ) -print(f"Standard consumer (Hall's βR = 1):") -print(f" Policy F = {F_pi}") -print(f" AR(1) coeff φ = {phi_std:.6f} (= 1, martingale)") -print(f" Innov. scale ν = {nu_std:.4f} (paper reports ≈ 4.3825)") +print(f"φ = {φ_std:.6f}, ν = {ν_std:.4f}") ``` -### With Robustness: Precautionary Savings +### With robustness: precautionary savings Under a preference for robustness ($\sigma < 0$, $\theta < \infty$), the consumer uses distorted forecasts $\hat{\mathbb{E}}_t[\cdot]$ evaluated under the -worst-case model. The consumption rule takes the certainty-equivalent form +worst-case model. + +The consumption rule takes the certainty-equivalent form ```{math} :label: eq:robust_consumption @@ -686,8 +591,10 @@ worst-case model. The consumption rule takes the certainty-equivalent form \sum_{j=0}^{\infty} R^{-j}(z_{t+j} - b)\right]\right) ``` -where $h_1$ — the first step of the CE algorithm — is **identical** to the -non-robust case. Only the expectations operator changes. 
+where $h_1$ — the first step of the CE algorithm — is *identical* to the +non-robust case. + +Only the expectations operator changes. The resulting AR(1) dynamics for $\mu_{ct}$ become: @@ -705,73 +612,70 @@ where $\tilde{\beta} = \tilde{\beta}(\sigma)$. The innovation scale $\tilde{\nu}$ follows from the robust permanent income formula with the distorted persistence; -{cite}`HST_1999` report $\tilde{\nu} \approx 8.0473$ for their +{cite:t}`HST_1999` report $\tilde{\nu} \approx 8.0473$ for their calibration. ```{code-cell} ipython3 -# ── Robust consumer: use observational equivalence to get φ̃ analytically ───── -def beta_tilde(sigma, beta_hat_val, alpha_sq_val): - """Observational-equivalence locus: β̃(σ) that matches robust (σ,β̂) consumption.""" - denom = 2.0 * (1.0 + sigma * alpha_sq_val) - numer = beta_hat_val * (1.0 + beta_hat_val) - disc = 1.0 - 4.0 * beta_hat_val * (1.0 + sigma * alpha_sq_val) / \ - (1.0 + beta_hat_val) ** 2 +def beta_tilde(σ, β_hat_val, α_sq_val): + """Observational-equivalence locus: β_tilde(σ).""" + denom = 2.0 * (1.0 + σ * α_sq_val) + numer = β_hat_val * (1.0 + β_hat_val) + disc = 1.0 - 4.0 * β_hat_val * (1.0 + σ * α_sq_val) / \ + (1.0 + β_hat_val) ** 2 return (numer / denom) * (1.0 + np.sqrt(np.maximum(disc, 0.0))) -alpha_sq = nu_std ** 2 # α² = ν² (squared innovation loading) -bt = beta_tilde(sigma_rs, beta_hat, alpha_sq) -phi_rob = 1.0 / (bt * R_rate) # φ̃ = 1/(β̃R) < 1 (mean-reverting!) -nu_rob = 8.0473 # from HST (1999) via Hansen–Sargent (2001) +ν_rob = 8.0473 +α_sq = ν_rob ** 2 +bt = beta_tilde(σ_rs, β_hat, α_sq) +φ_rob = 1.0 / (bt * R_rate) -print(f"Robust consumer (σ = {sigma_rs}):") -print(f" Equiv. discount factor β̃ = {bt:.5f} (paper: ≈ 0.9995)") -print(f" AR(1) coeff φ̃ = {phi_rob:.4f} (paper: ≈ 0.9976 → mean-reverting)") -print(f" Innov. 
scale ν̃ = {nu_rob:.4f} (paper: ≈ 8.0473)") +print(f"β_tilde = {bt:.5f}, φ_tilde = {φ_rob:.4f}, ν_tilde = {ν_rob:.4f}") ``` ```{code-cell} ipython3 -# ── Simulate and compare: standard vs robust consumption paths ──────────────── +--- +mystnb: + figure: + caption: Standard vs robust consumption paths + name: fig-std-vs-robust-paths +--- np.random.seed(42) T_sim = 100 -def simulate_ar1(phi, nu, T, mu0=0.0): - """Simulate μ_{c,t} from AR(1): μ_{t+1} = φ·μ_t + ν·ε_{t+1}.""" - path = np.empty(T) +def simulate_ar1(φ, ν, shocks, mu0=0.0): + path = np.empty(len(shocks) + 1) path[0] = mu0 - for t in range(1, T): - path[t] = phi * path[t-1] + nu * np.random.randn() + for t, ε in enumerate(shocks, start=1): + path[t] = φ * path[t-1] + ν * ε return path -# Initialise at a value away from zero to illustrate drift / mean-reversion +shock_path = np.random.randn(T_sim - 1) mu0_init = 10.0 -mu_std_path = simulate_ar1(phi_std, nu_std, T_sim, mu0=mu0_init) -mu_rob_path = simulate_ar1(phi_rob, nu_rob, T_sim, mu0=mu0_init) +mu_std_path = simulate_ar1(φ_std, ν_std, shock_path, mu0=mu0_init) +mu_rob_path = simulate_ar1(φ_rob, ν_rob, shock_path, mu0=mu0_init) fig, axes = plt.subplots(2, 1, figsize=(11, 6), sharex=True) t_grid = np.arange(T_sim) -axes[0].plot(t_grid, mu_std_path, lw=1.8, label=f'$\\mu_{{ct}}$ (standard, $\\varphi={phi_std:.4f}$)') +axes[0].plot(t_grid, mu_std_path, lw=2, label=f'$\\mu_{{ct}}$ (standard, $\\varphi={φ_std:.4f}$)') axes[0].axhline(0, color='k', lw=0.8, linestyle='--') axes[0].set_ylabel('$\\mu_{ct}$') -axes[0].set_title('Standard consumer: random walk ($\\varphi = 1$, no mean-reversion)') axes[0].legend(loc='upper right') -axes[1].plot(t_grid, mu_rob_path, lw=1.8, color='darkorange', - label=f'$\\mu_{{ct}}$ (robust, $\\tilde{{\\varphi}}={phi_rob:.4f}$)') +axes[1].plot(t_grid, mu_rob_path, lw=2, color='darkorange', + label=f'$\\mu_{{ct}}$ (robust, $\\tilde{{\\varphi}}={φ_rob:.4f}$)') axes[1].axhline(0, color='k', lw=0.8, linestyle='--') 
-axes[1].set_xlabel('Period $t$') +axes[1].set_xlabel('period $t$') axes[1].set_ylabel('$\\mu_{ct}$') -axes[1].set_title( - f'Robust consumer: mean-reverting ($\\tilde{{\\varphi}} < 1$) → precautionary saving') axes[1].legend(loc='upper right') plt.tight_layout() plt.show() ``` -### Observational Equivalence: Robustness Acts Like Patience +### Observational equivalence: robustness acts like patience -A key insight of {cite}`HansenSargent2001` is that, in the permanent income model, +A key insight of {cite:t}`HansenSargent2001` is that, in the permanent income model, a preference for robustness ($\sigma < 0$) is *observationally equivalent* to an increase in the discount factor from $\hat{\beta}$ to a larger value $\tilde{\beta}(\sigma)$, with $\sigma$ set back to zero. @@ -785,43 +689,45 @@ The equivalence locus is given by \left[1 + \sqrt{1 - \frac{4\hat{\beta}(1+\sigma\alpha^2)}{(1+\hat{\beta})^2}}\right] ``` -where $\alpha^2 = \nu^2$ is the squared innovation loading on $\mu_{ct}$ computed -from the standard ($\sigma = 0$) problem. +where $\alpha^2 = \tilde{\nu}^2$ is the squared innovation loading in the +robust AR(1) representation {eq}`eq:robust_ar1`. 
```{code-cell} ipython3 -# ── Observational-equivalence locus plot ───────────────────────────────────── -sigma_range = np.linspace(-3e-7, 0.0, 200) -bt_vals = [beta_tilde(s, beta_hat, alpha_sq) for s in sigma_range] -bt_check = beta_tilde(sigma_rs, beta_hat, alpha_sq) +--- +mystnb: + figure: + caption: Observational equivalence locus + name: fig-oe-locus +--- +σ_range = np.linspace(-3e-7, 0.0, 200) +bt_vals = [beta_tilde(s, β_hat, α_sq) for s in σ_range] +bt_check = beta_tilde(σ_rs, β_hat, α_sq) fig, ax = plt.subplots(figsize=(9, 5)) -ax.plot(-sigma_range * 1e7, bt_vals, lw=2, color='steelblue', +ax.plot(-σ_range * 1e7, bt_vals, lw=2, color='steelblue', label='$\\tilde{\\beta}(\\sigma)$') -ax.axhline(beta_hat, color='r', linestyle='--', lw=1.5, - label=f'$\\hat{{\\beta}} = {beta_hat}$') -ax.scatter([-sigma_rs * 1e7], [bt_check], zorder=5, color='darkorange', s=80, +ax.axhline(β_hat, color='r', linestyle='--', lw=2, + label=f'$\\hat{{\\beta}} = {β_hat}$') +ax.scatter([-σ_rs * 1e7], [bt_check], zorder=5, color='darkorange', s=80, label=f'$(\\hat{{\\sigma}},\\, \\tilde{{\\beta}}) ' - f'= ({sigma_rs:.0e},\\, {bt_check:.4f})$') -ax.set_xlabel('Risk sensitivity $-\\sigma$ (×$10^{-7}$)') -ax.set_ylabel('Observationally equivalent discount factor $\\tilde{\\beta}$') -ax.set_title('Robustness acts like increased patience in permanent income model') + f'= ({σ_rs:.0e},\\, {bt_check:.4f})$') +ax.set_xlabel('risk sensitivity $-\\sigma$ ($\\times 10^{-7}$)') +ax.set_ylabel('observationally equivalent discount factor $\\tilde{\\beta}$') ax.legend() plt.tight_layout() plt.show() -print(f"β̃(σ̂ = {sigma_rs}) = {bt_check:.5f} (paper reports ≈ 0.9995) ✓") ``` -The plot confirms the paper's key finding: **activating a preference for +The plot confirms the paper's key finding: *activating a preference for robustness is observationally equivalent — for consumption and saving behaviour -— to increasing the discount factor**. +— to increasing the discount factor*. 
-However, {cite}`HST_1999` show that the two -parametrisations do **not** imply the same asset prices. +However, {cite:t}`HST_1999` show that the two +parametrisations do *not* imply the same asset prices. -* this happens because the model in which the representative agent distrusts his model generates different state-prices through the +This happens because a preference for robustness generates different state-prices through the $\mathcal{D}(P)$ matrix that enters the stochastic discount factor. ---- ## Summary @@ -835,73 +741,17 @@ The table below condenses the main results: In all three cases, the decision maker can be described as following a two-step procedure: first solve a nonstochastic control problem, then form -beliefs. +beliefs. The difference is in which beliefs are formed in the second step. ---- ## Exercises -```{exercise-start} -:label: ce_ex1 -``` - -**CE and noise variance.** - -Using the scalar LQ setup in the first code cell (with $a = 0.9$, $b = 1$, -$q = r = 1$, $\beta = 0.95$), verify numerically that the value constant $d$ -satisfies $d \propto \sigma^2$ for large $\sigma$. - -*Hint:* From the CE analysis, $p = \tfrac{\beta}{1-\beta}\,\mathrm{tr}(C' P C)$ -and $C = \sigma$ in the scalar case, so $p = \tfrac{\beta}{1-\beta} P\, \sigma^2$. -Confirm that a plot of $d$ against $\sigma^2$ is linear. 
- -```{exercise-end} -``` - -```{solution-start} ce_ex1 -:class: dropdown -``` - -```{code-cell} ipython3 -# Reuse F_vals and d_vals already computed above -sigma_sq_vals = sigma_vals ** 2 - -fig, ax = plt.subplots(figsize=(8, 5)) -ax.plot(sigma_sq_vals, d_vals, lw=2) -ax.set_xlabel('$\\sigma^2$') -ax.set_ylabel('Value constant $d$') -ax.set_title('Value constant is linear in noise variance (CE principle)') - -# Overlay linear fit -coeffs = np.polyfit(sigma_sq_vals, d_vals, 1) -ax.plot(sigma_sq_vals, np.polyval(coeffs, sigma_sq_vals), - 'r--', lw=1.5, label=f'Linear fit: slope = {coeffs[0]:.3f}') -ax.legend() -plt.tight_layout() -plt.show() - -# Theoretical slope: β/(1−β) × P -P_scalar = float(LQ(Q_mat, R_mat, A, B, C=np.zeros((1, 1)), - beta=beta).stationary_values()[0]) -theoretical_slope = beta / (1 - beta) * P_scalar -print(f"Empirical slope: {coeffs[0]:.4f}") -print(f"Theoretical slope β/(1−β)·P = {theoretical_slope:.4f}") -``` - -The slope is indeed $\tfrac{\beta}{1-\beta} P \approx 19 \times P$, confirming the -analytic formula. - -```{solution-end} -``` - ```{exercise-start} :label: ce_ex2 ``` -**Convergence of robust policy to standard policy.** - Show numerically that as $\theta \to \infty$ the robust policy $F(\theta)$ converges to the standard LQ policy $F_{\text{std}}$ and that the rate of convergence is of order $1/\theta$. Plot $|F(\theta) - F_{\text{std}}|$ against $1/\theta$ on a @@ -915,24 +765,23 @@ log–log scale. 
``` ```{code-cell} ipython3 -theta_large = np.logspace(0.5, 3.0, 100) # θ from ~3 to 1000 (must exceed criticality) -gap_vals = [] +θ_large = np.logspace(0.5, 3.0, 100) +gap_vals = [] -for theta in theta_large: - rblq = RBLQ(Q_mat, R_mat, A, B, C_fixed, beta, theta) +for θ in θ_large: + rblq = RBLQ(R_mat, Q_mat, A, B, C_fixed, β, θ) F_r, _, _ = rblq.robust_rule() gap_vals.append(abs(float(F_r[0, 0]) - F_standard)) fig, ax = plt.subplots(figsize=(8, 5)) -ax.loglog(1.0 / theta_large, gap_vals, lw=2) +ax.loglog(1.0 / θ_large, gap_vals, lw=2) ax.set_xlabel('$1/\\theta$') ax.set_ylabel('$|F(\\theta) - F_{\\mathrm{std}}|$') ax.set_title('Robust policy converges to standard LQ at rate $1/\\theta$') -# Overlay slope-1 reference line -x_ref = 1.0 / theta_large +x_ref = 1.0 / θ_large ax.loglog(x_ref, x_ref * gap_vals[0] / x_ref[0], - 'r--', lw=1.5, label='Slope 1 reference') + 'r--', lw=2, label='Slope 1 reference') ax.legend() plt.tight_layout() plt.show() @@ -948,13 +797,17 @@ convergence. :label: ce_ex3 ``` -**Observational equivalence verification.** +Pick three values $\sigma_i < 0$ and verify numerically that the robust +permanent income model with $(\sigma_i, \hat{\beta})$ produces the same +policy matrix $F$ as a suitably chosen non-robust model with +$(0, \tilde{\beta}_i)$. + +To find $\tilde{\beta}_i$, extract the AR(1) coefficient $\varphi_i$ for +$\mu_{ct}$ from the robust closed-loop dynamics and set +$\tilde{\beta}_i = 1/(\varphi_i R)$. -Choose three pairs $(\sigma_i, \beta_i)$ on the observational equivalence locus -{eq}`eq:oe_locus` (i.e., set $\sigma_i < 0$ and compute the matching $\tilde{\beta}_i$). -For each pair, solve the corresponding LQ problem and verify that the AR(1) -coefficient $\varphi$ for $\mu_{ct}$ is the same across all three pairs (to -numerical precision), while the $P$ matrices differ. +Show that $\tilde{\beta}_i > \hat{\beta}$ in every case, confirming that +robustness acts like increased patience. 
```{exercise-end} ``` @@ -963,36 +816,57 @@ numerical precision), while the $P$ matrices differ. :class: dropdown ``` +For each $\sigma_i$ we solve the robust problem with `RBLQ` and extract the +AR(1) coefficient $\varphi$ for $\mu_{ct}$ from the closed-loop dynamics +$A_{\text{cl}} = A - B F_{\text{rob}}$. + +If $F$ is a left eigenvector of $A_{\text{cl}}$ with eigenvalue $\varphi$, +then $\mu_{ct} = -F y_t$ satisfies +$\mu_{c,t+1} = \varphi\, \mu_{ct} + \nu\, \epsilon_{t+1}$. + +Setting $\tilde{\beta} = 1/(\varphi R)$ and solving a standard (non-robust) +LQ problem with discount factor $\tilde{\beta}$ should reproduce $F$. + ```{code-cell} ipython3 -# Three σ values and their observationally-equivalent βs -sigma_trio = np.array([-1e-7, -2e-7, -3e-7]) -beta_trio = np.array([beta_tilde(s, beta_hat, alpha_sq) for s in sigma_trio]) - -print("Observationally equivalent (σ, β̃) pairs:") -for s, b in zip(sigma_trio, beta_trio): - print(f" σ = {s:.1e} → β̃ = {b:.6f}") - -# By the OE formula, φ_robust(σ) = 1/(β̃(σ)·R) and -# φ_standard(β̃) = 1/(β̃·R) — so they must be equal by construction. -# The key additional point from the paper: P matrices differ even though φ matches. -print("\nAR(1) coefficient φ for each (σ, β̃) pair:") -for s, b in zip(sigma_trio, beta_trio): - phi_r = 1.0 / (b * R_rate) # robust: φ = 1/(β̃R) - phi_s = 1.0 / (b * R_rate) # standard with β̃: same formula by OE - print(f" σ = {s:.1e}, β̃ = {b:.6f}: φ_robust = φ_standard = {phi_r:.6f} ✓") - -print("\nNote: although φ is the same, the P matrices (and hence asset prices)") -print("differ between the (σ, β̂) and (0, β̃) specifications. This is the") -print("key distinguishing implication for risk premia in Hansen-Sargent-Tallarini.") -``` - -The AR(1) coefficients $\varphi$ are identical across the two representations -in each pair by construction of the observational equivalence formula — the -equivalence holds for consumption and saving *quantities*. 
However, the -$\mathcal{D}(P)$ matrices differ across $(\hat\sigma, \hat\beta)$ and -$(0, \tilde\beta)$ pairs; it is this matrix that encodes the stochastic discount -factor used in asset pricing. Thus, although saving plans look the same, equity -premia differ. +σ_trio = np.array([-5e-8, -1e-7, -2e-7]) + +for s in σ_trio: + # Robust model: (σ, β_hat) + θ_val = -1.0 / s + rblq = RBLQ(R_pi, Q_pi, A_pi, B_pi, C_pi, β_hat, θ_val) + F_rob, K_rob, P_rob = rblq.robust_rule() + + # Extract φ from closed-loop under the approximating model + A_cl = A_pi - B_pi @ F_rob + φ_rob = float((F_rob @ A_cl)[0, 1] / F_rob[0, 1]) + + # Implied discount factor + bt = 1.0 / (φ_rob * R_rate) + + # Non-robust model with β_tilde + lq_nr = LQ(R_pi, Q_pi, A_pi, B_pi, C=C_pi, beta=bt) + P_nr, F_nr, d_nr = lq_nr.stationary_values() + + print(f"σ = {s:.1e}, θ = {θ_val:.1e}, β̃ = {bt:.6f} (> β̂ = {β_hat})") + print(f" φ_rob = {φ_rob:.8f}") + print(f" F_robust = [{F_rob[0,0]:.6f}, {F_rob[0,1]:.6f}]") + print(f" F_non-rob = [{F_nr[0,0]:.6f}, {F_nr[0,1]:.6f}]") + print(f" |F_rob - F_nr| = {np.max(np.abs(F_rob - F_nr)):.2e}") + print(f" K (worst-case distortion): [{K_rob[0,0]:.2e}, {K_rob[0,1]:.2e}]") + print() +``` + +The policy matrices $F$ match to high precision, confirming observational +equivalence for consumption and saving decisions. + +In every case $\tilde{\beta} > \hat{\beta}$: a preference for robustness +makes the agent behave as if he were more patient. + +The non-zero worst-case distortion $K$ in the robust model has no analogue in +the non-robust model. + +As {cite:t}`HST_1999` show, this is why the two parametrisations imply +different asset prices even though saving plans coincide. 
```{solution-end} ``` From d844203e4fc0346b999f39572e99b111cadf8379 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 17 Mar 2026 15:07:08 +1100 Subject: [PATCH 11/12] minor updates --- lectures/theil_2.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lectures/theil_2.md b/lectures/theil_2.md index 9e8f267c3..e73722a51 100644 --- a/lectures/theil_2.md +++ b/lectures/theil_2.md @@ -94,7 +94,9 @@ from quantecon import LQ, RBLQ ## Recap: ordinary certainty equivalence The {ref}`companion lecture ` established the CE -property in detail. Here we collect only the elements needed for the +property in detail. + +Here we collect only the elements needed for the robustness extension below. The state vector $y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix}$ has an From ebf71a68bdd13bd7a3a9cebd77cfcd0fa846a729 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 17 Mar 2026 15:09:17 +1100 Subject: [PATCH 12/12] update --- lectures/theil_2.md | 1 + 1 file changed, 1 insertion(+) diff --git a/lectures/theil_2.md b/lectures/theil_2.md index e73722a51..cd844df39 100644 --- a/lectures/theil_2.md +++ b/lectures/theil_2.md @@ -446,6 +446,7 @@ utility $U_{t+1}$. For a candidate quadratic continuation value $U_{t+1}^e = -y_{t+1}^\top \Omega\, y_{t+1} - \rho$, let $\hat{y}_{t+1} \equiv A y_t + B u_t$ denote the conditional mean of $y_{t+1}$. + Evaluating $\mathcal{R}_t$ via the log-moment-generating function of the Gaussian distribution yields