Enforce mode analysis: current limits too generous, tuning recommendations

## Summary

Analysis of enforce mode effectiveness based on 18 days of query_guard observe-mode data (May 30 – Jun 16, 2026) from binoidcbd.com production. With current limits, enforce mode would have killed **2 out of 3,499 slow queries** on the worst day (Jun 8) and saved **23 seconds out of 32,218 seconds** of slow query DB time (0.07%). The limits are too generous for the actual query duration distribution. Tuning recommendations below would increase effectiveness to an estimated 2,000-3,000s/day saved.

Prepared by Claude Code.

---

## What Enforce Mode Does

When switched from `observe` to `enforce`, query_guard sets `SET SESSION MAX_EXECUTION_TIME` on every MySQL connection, with per-context ceilings:

| Context | Limit | What runs here |
|---|---|---|
| `wp_cli` | Unlimited | CLI operations |
| `action_scheduler` | Unlimited | FB sync, subscriptions, background jobs |
| `wp_cron` | 10s | Cron tasks |
| `admin_ajax` | 10s | CS agent searches (AJAX), order note loading |
| `checkout` | 60s | Customer checkout flow |
| `rest_api` | 30s | Goaffpro, WooCommerce REST API |
| `wp_admin` | 45s | Admin page loads (non-AJAX) |
| `frontend` | 30s | Customer-facing page loads |

When a query exceeds its context's limit, MySQL kills it (error 3024). For admin searches specifically, the plugin shows a "Search timed out" notice instead of "no results found," breaking the retry loop where CS agents re-search and re-trigger the same slow query.

## Why Current Limits Don't Help

The damage is not from individual extreme queries. It is from **volume** — thousands of 5-10 second queries running concurrently and saturating the MySQL thread pool. The enforce-mode limits are set high enough that almost nothing gets killed.

### Jun 8 (Peak Crisis — 3,499 Slow Queries, 32,218s Total DB Time)

| Context | Slow queries | DB time | Would be killed | DB time saved |
|---|---|---|---|---|
| `rest_api` | 3,055 | 26,361s | 1 (at 36s) | 6s |
| `wp_admin` | 205 | 4,497s | 1 (at 62s) | 17s |
| `admin_ajax` | 223 | 1,262s | 0 | 0s |
| `action_scheduler` | 16 | 98s | 0 (unlimited) | 0s |
| **Total** | **3,499** | **32,218s** | **2 (0.06%)** | **23s (0.07%)** |

### Jun 14 (Post-Crisis Baseline — 367 Slow Queries)

Zero queries exceeded any enforce-mode limit.

### All 18 Days (May 30 – Jun 16)

| Date | Slow queries | Would be killed | Breakdown |
|---|---|---|---|
| May 30 | 299 | 8 | ajax=2, cron=1, admin=2, rest=2, front=1 |
| May 31 | 243 | 0 | — |
| Jun 1 | 296 | 1 | ajax=1 |
| Jun 2 | 347 | 0 | — |
| Jun 3 | 417 | 0 | — |
| Jun 4 | 441 | 0 | — |
| Jun 5 | 358 | 0 | — |
| Jun 6 | 217 | 0 | — |
| Jun 7 | 492 | 0 | — |
| Jun 8 | 3,499 | 2 | admin=1, rest=1 |
| Jun 9 | 2,828 | 25 | admin=20, rest=5 |
| Jun 10 | 2,885 | 19 | admin=16, rest=3 |
| Jun 11 | 3,216 | 20 | ajax=2, admin=17, rest=1 |
| Jun 12 | 2,840 | 23 | admin=21, rest=2 |
| Jun 13 | 2,794 | 7 | ajax=3, admin=1, rest=3 |
| Jun 14 | 367 | 0 | — |
| Jun 15 | 626 | 2 | ajax=1, admin=1 |
| Jun 16 | 196 | 12 | admin=1, rest=10, front=1 |

Across all 18 days, ~97 queries would have been killed out of ~21,000 total slow queries (~0.5%).

## Why the Limits Miss

The actual query durations sit comfortably under their context ceilings:

- **admin_ajax** (10s limit): Queries average 5.6s, max ~9.5s on a normal day. Nothing gets killed.
- **REST API** (30s limit): Goaffpro queries average 8.6s, max ~10s normally. Nothing gets killed.
- **wp_admin** (45s limit): CS admin searches average 17-22s, max 40s normally. Nearly all pass through — and these consume the most DB time (3,334s/day at baseline).
- **action_scheduler** (unlimited): FB sync queries (up to 50s) and subscription queries (up to 117s) are never constrained.

## Risk Assessment

**Safe to turn on.** Specific risks evaluated:

| Concern | Status |
|---|---|
| Customer checkout | Protected — 60s ceiling, detected before admin_ajax |
| WP-CLI operations | Unlimited — no impact |
| Action Scheduler jobs | Unlimited — no impact |
| WPE connection rotation | Handled — plugin re-applies timeout on connection change |
| Legitimate long admin operations | Low risk — bulk exports/edits rarely exceed 45s |
| Per-request overhead | Negligible — one `SET SESSION` query per request |

**What it does provide:**
- Safety net against truly runaway queries (60s+ admin, 30s+ REST)
- Admin search "timed out" notice for the handful of 45s+ searches (breaks CS retry loop on those specific cases)
- `query_killed` event logging for visibility

## Root Cause Comparison

| Fix | Estimated daily savings | Enforce mode equivalent |
|---|---|---|
| FB sync cleanup (396K orphaned rows) | ~715s/day | Enforce saves 0s (AS is unlimited) |
| Admin postmeta search fix | ~3,300s/day | Enforce saves ~0-17s (most under 45s) |
| Goaffpro REST volume (already resolved) | ~26,000s/day at peak | Enforce saved 6s on worst day |
| **Enforce mode (all contexts)** | **~0-23s/day** | — |

## Tuning Recommendations

The current limits were chosen conservatively. Based on 18 days of observed query durations, the following changes would make enforce mode substantially more effective without breaking legitimate operations.

### 1. Lower `wp_admin` from 45s to 20s

The wp_admin context is dominated by CS admin postmeta `LIKE '%term%'` searches. On a baseline day, these average 17s with a median of 18s. A 20s ceiling would:
- Kill the top ~45% of admin searches that currently run 20-40s
- Save an estimated ~1,500-2,000s of DB time per day at baseline
- Still allow the majority of admin page loads (non-search pages complete in <5s)
- The "Search timed out" notice already handles the UX — CS agents see a clear message instead of blank results

Risk: legitimate admin operations (WooCommerce exports, bulk edits) that exceed 20s would be killed. These are rare in the logs but should be monitored after the change. Specific operations can be exempted via the `hypercart_query_guard_limit_ms` filter.

### 2. Lower `rest_api` from 30s to 15s

REST API slow queries average 7-9s. A 15s ceiling would:
- Catch any future Goaffpro-style REST abuse much earlier (the Jun 8 crisis queries averaged 8.6s but would pile up to 36s under load — a 15s cap would have killed them once contention inflated durations)
- Still allow normal WooCommerce REST operations (product syncs, order lookups) which complete well under 15s
- The checkout endpoint is detected separately (60s ceiling), so customer transactions are unaffected

### 3. Cap `action_scheduler` at 30s instead of unlimited

The unlimited AS ceiling is the largest gap. FB sync queries run up to 50s, and subscription renewal queries have been observed at 117s. A 30s ceiling would:
- Kill the extreme tail of background jobs that hold threads during load spikes
- Still allow normal AS operations (most complete in 5-8s)
- The AS throttle system (batch_size, time_limit, deferrals) provides separate concurrency control, so a per-query ceiling adds defense-in-depth without conflicting

Risk: some legitimate long-running background jobs (large batch imports, subscription processing) could be killed. The AS runner retries failed actions, so killed jobs would be re-attempted on the next batch cycle. Monitor `query_killed` events with `context: action_scheduler` after enabling.

### 4. Consider a dedicated `admin_search` context

The current context detection routes all admin AJAX through `admin_ajax` (10s) and all admin page loads through `wp_admin` (45s). But admin searches on the orders/products list tables are the specific high-cost operation. A dedicated `admin_search` context (detected by the presence of `$_GET['s']` on admin list table pages) could apply a tighter ceiling (e.g., 15s) to searches specifically, while leaving other wp_admin operations at a more generous limit.

### Summary of Proposed Limits

| Context | Current | Proposed | Rationale |
|---|---|---|---|
| `wp_cli` | Unlimited | Unlimited | No change |
| `action_scheduler` | Unlimited | 30s | Cap extreme tail, AS retries handle failures |
| `wp_cron` | 10s | 10s | No change |
| `admin_ajax` | 10s | 10s | No change — already tight |
| `checkout` | 60s | 60s | No change — protects revenue |
| `rest_api` | 30s | 15s | Catches abuse earlier, normal ops well under 15s |
| `wp_admin` | 45s | 20s | Kills top ~45% of admin search scans |
| `frontend` | 30s | 30s | No change |

With these tighter limits, enforce mode would have killed an estimated ~100-150 queries/day at baseline (vs. ~0 currently) and ~500-800/day during the Jun 8-13 crisis (vs. ~16/day), saving an estimated 2,000-3,000s of DB time daily instead of 0-23s.

## Conclusion

Enforce mode with the current limits is a per-query duration guard that is too generous to catch the actual problem. Turning it on adds a reasonable safety net at negligible cost, but should not be expected to improve performance without tightening the limits.

With the proposed tuning above, enforce mode becomes a meaningful contributor — not a replacement for root-cause fixes (FB sync cleanup, admin search optimization), but a useful complement that limits the blast radius of the queries those fixes target.

## AS Throttle Mode: Detailed Analysis

The Action Scheduler throttle is a **separate system** from the main enforce mode, with its own mode setting (`as_throttle_mode`) and its own enforce behavior. It is currently running at `test_observe` (capability probe only — logs what it would do but takes no action).

### What AS Throttle Enforce Does

The AS throttle monitors two signals every time Action Scheduler runs a batch:

| Signal | Elevated threshold | Critical threshold | Exit threshold |
|---|---|---|---|
| `threads_running` (MySQL) | >= 5 | >= 15 | <= 3 (elevated), <= 10 (critical) |
| `queue_depth` (overdue AS actions) | >= 100 | >= 500 | <= 50 (elevated), <= 250 (critical) |

The higher of the two signals wins. When elevated or critical, enforce mode applies:

**Batch constraints (reduce concurrency):**

| Level | batch_size | time_limit | concurrent_batches |
|---|---|---|---|
| Normal | 25 (WC default) | 30s (WC default) | Unlimited |
| Elevated | 5 | 15s | 1 |
| Critical | 1 | 10s | 1 |

**Action deferrals (delay low-priority work):**

| Priority tier | Elevated delay | Critical delay | Example hooks |
|---|---|---|---|
| Critical | 0 (run immediately) | 0 (run immediately) | `nofraud_*`, `woocommerce_payment_*` |
| High | 0 (run immediately) | 300s (5 min) | `woocommerce_scheduled_subscription_*`, webhooks |
| Normal | 300s (5 min) | 900s (15 min) | `woocommerce_run_*`, `action_scheduler_*` |
| Deferrable | 900s (15 min) | 3600s (1 hr) | `facebook_for_woocommerce_*`, `shipstation_*`, `klaviyo_*` |

Recurring actions are never deferred (to avoid breaking recurrence chains). Each action can be deferred at most 5 times before forced execution.

### The Queue Pileup Problem

**This is the primary concern.** The data shows that AS throttle enforce would create a feedback loop that prevents recovery.

On Jun 8 (crisis day), the critical load level was triggered **2,602 times**. Of those:
- **2,585 were queue-driven** (queue_depth >= 500)
- **Only 1 was thread-driven** (threads_running >= 15)

This means the queue depth — not actual MySQL thread pressure — was almost entirely responsible for triggering critical mode. Once the queue backs up past 500, the following cycle occurs:

```
Queue hits 500 overdue actions
  → Critical level triggered
    → Batch size reduced to 1 action per batch
      → AS drains queue ~25x slower
        → Queue grows faster than it drains
          → Queue stays above 500
            → Critical level persists indefinitely
```

**Observed queue depths confirm this:**

| Day | Max queue depth | Max threads_running | % of AS runs at critical |
|---|---|---|---|
| Jun 1 (baseline) | 1,129 | 25 | 1.6% |
| Jun 7 (healthy) | 47 | 10 | 0% |
| Jun 8 (crisis) | 6,524 | 37 | 54% |
| Jun 9 | 3,277 | 12 | 27% |
| Jun 13 | 3,932 | 100 | 34% |
| Jun 14 (recovered) | 64 | 8 | 0% |
| Jun 15 | 1,436 | 17 | 5% |

The queue is **already piling up without throttling** — 6,524 overdue actions on Jun 8. Throttling the drain rate to 1 action/batch would make this dramatically worse. At that rate, the queue would take over 100 hours to drain even if no new actions were added.

The deferral system compounds this: deferred actions get rescheduled 5-60 minutes into the future. When load drops back to normal, all those deferred actions come due simultaneously, creating a secondary surge that can re-trigger elevated/critical and restart the cycle.

### Why Threads vs. Queue Matters

The two signals measure fundamentally different things:

- **threads_running >= 15** means the MySQL server is genuinely overloaded right now. Throttling AS is correct — stop adding work to a saturated system.
- **queue_depth >= 500** means AS has a backlog. Throttling makes the backlog worse. The queue is a *symptom* (actions take too long because the DB is slow), not the *cause*.

On baseline days (Jun 1), elevated was triggered by threads_running 391 times vs. queue_depth 63 times — a healthier ratio. On crisis days, queue_depth completely dominates the signal.

### Recommendation: Do Not Enable AS Throttle Enforce Yet

**Short answer:** With the current thresholds, AS throttle enforce would make crisis events worse by preventing queue drainage while the queue_depth signal keeps the system locked in critical mode.

**If AS throttle enforce is desired in the future**, the following changes would mitigate the feedback loop:

1. **Remove or raise the queue_depth thresholds.** Queue depth is a lagging indicator — by the time the queue hits 500, the damage is done and throttling can't help. Either disable queue_depth as a signal entirely, or raise the critical threshold to 2,000+ so it only fires during genuine runaway scenarios.

2. **Rely on threads_running as the primary signal.** This directly measures MySQL pressure and is the correct trigger for throttling. The existing thresholds (5 elevated, 15 critical) are reasonable.

3. **Soften the critical batch constraints.** batch_size=1 is too aggressive — it reduces throughput 25x. A critical batch_size of 5 (same as current elevated) with time_limit=10s would reduce throughput ~5x while still providing meaningful relief.

4. **Start with observe mode (not enforce).** Move from `test_observe` to `observe` first. Observe mode logs what it *would* do without actually throttling. This provides the data to validate whether the throttle decisions are correct before enabling enforce.

### Priority: Fix Root Causes First

The AS throttle is a compensating control for when the DB is overloaded. It cannot fix the underlying overload. The root-cause fixes (FB sync cleanup, postmeta search optimization) would reduce the average AS action execution time, which would:
- Prevent queue backup in the first place (actions complete faster → queue stays small)
- Reduce threads_running naturally (fewer long queries → fewer concurrent threads)
- Make the throttle unnecessary for most scenarios

Once the root causes are addressed, the throttle's value shifts from "crisis management" to "safety net" — and at that point, the threads_running signal alone (with queue_depth removed) would be sufficient.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enforce mode analysis: current limits too generous, tuning recommendations #42

Summary

What Enforce Mode Does

Why Current Limits Don't Help

Jun 8 (Peak Crisis — 3,499 Slow Queries, 32,218s Total DB Time)

Jun 14 (Post-Crisis Baseline — 367 Slow Queries)

All 18 Days (May 30 – Jun 16)

Why the Limits Miss

Risk Assessment

Root Cause Comparison

Tuning Recommendations

1. Lower `wp_admin` from 45s to 20s

2. Lower `rest_api` from 30s to 15s

3. Cap `action_scheduler` at 30s instead of unlimited

4. Consider a dedicated `admin_search` context

Summary of Proposed Limits

Conclusion

AS Throttle Mode: Detailed Analysis

What AS Throttle Enforce Does

The Queue Pileup Problem

Why Threads vs. Queue Matters

Recommendation: Do Not Enable AS Throttle Enforce Yet

Priority: Fix Root Causes First

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Context	Limit	What runs here
`wp_cli`	Unlimited	CLI operations
`action_scheduler`	Unlimited	FB sync, subscriptions, background jobs
`wp_cron`	10s	Cron tasks
`admin_ajax`	10s	CS agent searches (AJAX), order note loading
`checkout`	60s	Customer checkout flow
`rest_api`	30s	Goaffpro, WooCommerce REST API
`wp_admin`	45s	Admin page loads (non-AJAX)
`frontend`	30s	Customer-facing page loads

Context	Slow queries	DB time	Would be killed	DB time saved
`rest_api`	3,055	26,361s	1 (at 36s)	6s
`wp_admin`	205	4,497s	1 (at 62s)	17s
`admin_ajax`	223	1,262s	0	0s
`action_scheduler`	16	98s	0 (unlimited)	0s
Total	3,499	32,218s	2 (0.06%)	23s (0.07%)

Date	Slow queries	Would be killed	Breakdown
May 30	299	8	ajax=2, cron=1, admin=2, rest=2, front=1
May 31	243	0	—
Jun 1	296	1	ajax=1
Jun 2	347	0	—
Jun 3	417	0	—
Jun 4	441	0	—
Jun 5	358	0	—
Jun 6	217	0	—
Jun 7	492	0	—
Jun 8	3,499	2	admin=1, rest=1
Jun 9	2,828	25	admin=20, rest=5
Jun 10	2,885	19	admin=16, rest=3
Jun 11	3,216	20	ajax=2, admin=17, rest=1
Jun 12	2,840	23	admin=21, rest=2
Jun 13	2,794	7	ajax=3, admin=1, rest=3
Jun 14	367	0	—
Jun 15	626	2	ajax=1, admin=1
Jun 16	196	12	admin=1, rest=10, front=1

Concern	Status
Customer checkout	Protected — 60s ceiling, detected before admin_ajax
WP-CLI operations	Unlimited — no impact
Action Scheduler jobs	Unlimited — no impact
WPE connection rotation	Handled — plugin re-applies timeout on connection change
Legitimate long admin operations	Low risk — bulk exports/edits rarely exceed 45s
Per-request overhead	Negligible — one `SET SESSION` query per request

Fix	Estimated daily savings	Enforce mode equivalent
FB sync cleanup (396K orphaned rows)	~715s/day	Enforce saves 0s (AS is unlimited)
Admin postmeta search fix	~3,300s/day	Enforce saves ~0-17s (most under 45s)
Goaffpro REST volume (already resolved)	~26,000s/day at peak	Enforce saved 6s on worst day
Enforce mode (all contexts)	~0-23s/day	—

Signal	Elevated threshold	Critical threshold	Exit threshold
`threads_running` (MySQL)	>= 5	>= 15	<= 3 (elevated), <= 10 (critical)
`queue_depth` (overdue AS actions)	>= 100	>= 500	<= 50 (elevated), <= 250 (critical)

Level	batch_size	time_limit	concurrent_batches
Normal	25 (WC default)	30s (WC default)	Unlimited
Elevated	5	15s	1
Critical	1	10s	1

Priority tier	Elevated delay	Critical delay	Example hooks
Critical	0 (run immediately)	0 (run immediately)	`nofraud_`, `woocommerce_payment_`
High	0 (run immediately)	300s (5 min)	`woocommerce_scheduled_subscription_*`, webhooks
Normal	300s (5 min)	900s (15 min)	`woocommerce_run_`, `action_scheduler_`
Deferrable	900s (15 min)	3600s (1 hr)	`facebook_for_woocommerce_`, `shipstation_`, `klaviyo_*`

Day	Max queue depth	Max threads_running	% of AS runs at critical
Jun 1 (baseline)	1,129	25	1.6%
Jun 7 (healthy)	47	10	0%
Jun 8 (crisis)	6,524	37	54%
Jun 9	3,277	12	27%
Jun 13	3,932	100	34%
Jun 14 (recovered)	64	8	0%
Jun 15	1,436	17	5%

Uh oh!

Enforce mode analysis: current limits too generous, tuning recommendations #42

Description

Summary

What Enforce Mode Does

Why Current Limits Don't Help

Jun 8 (Peak Crisis — 3,499 Slow Queries, 32,218s Total DB Time)

Jun 14 (Post-Crisis Baseline — 367 Slow Queries)

All 18 Days (May 30 – Jun 16)

Why the Limits Miss

Risk Assessment

Root Cause Comparison

Tuning Recommendations

1. Lower wp_admin from 45s to 20s

2. Lower rest_api from 30s to 15s

3. Cap action_scheduler at 30s instead of unlimited

4. Consider a dedicated admin_search context

Summary of Proposed Limits

Conclusion

AS Throttle Mode: Detailed Analysis

What AS Throttle Enforce Does

The Queue Pileup Problem

Why Threads vs. Queue Matters

Recommendation: Do Not Enable AS Throttle Enforce Yet

Priority: Fix Root Causes First

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. Lower `wp_admin` from 45s to 20s

2. Lower `rest_api` from 30s to 15s

3. Cap `action_scheduler` at 30s instead of unlimited

4. Consider a dedicated `admin_search` context