Title
Background
Set of representative questions:
#345 (comment)
#411 (comment)
https://github.com/CodeForPhilly/balancer-main/tree/develop/evaluation
Current State
Acceptance Criteria
Approach
Start with error analysis, not infrastructure. Spend 30 minutes manually reviewing 20-50 LLM outputs whenever you make significant changes. Use one domain expert who understands your users as your quality decision maker (a “benevolent dictator”).
OpenAI API Dashboard has total duration and cost metrics
References
Risks and Rollback
Risks
| Risk |
Severity |
Mitigation |
Assistant endpoint behavior changes from the refactor (request path now flows through run_assistant). |
Medium |
The `(response_output_text, |
| final_response_id)` response contract is unchanged; service-level unit tests pass. Residual: OpenAI and the DB are mocked in tests, so the live retrieval |
|
|
| path is not exercised in CI — spot-check a real query after deploy. |
|
|
Missing input validation: an omitted/blank message is not rejected — it becomes the literal string "None" (str(None)) in the model input, producing |
|
|
| confusing output. |
Low |
Known and deferred; tracked by a TODO in views.py. No crash or data impact. Fix is to return 400 on omitted/blank message (+ |
| add 400 to the schema). |
|
|
Eval tooling (eval_assistant.py, review.ipynb) has first-run fragility — sys.path insert depth and the pandas dependency. |
Low |
Offline-only: not |
in the production request path, so zero runtime impact. The pandas import is deferred into main() so it cannot affect the app or test collection; the |
|
|
sys.path caveat is documented inline. |
|
|
| Reduced test count — low-signal glue/framework tests were removed. |
Low |
Real-logic coverage (tool loop, retrieval dispatch, orchestration decisions) is |
| retained. View HTTP-contract coverage is deferred to a future DB-backed integration test (TODO). |
|
|
Full-suite pytest surfaces pre-existing pgvector test-DB errors. |
Low |
Not introduced by this branch; the assistant tests are DB-free and pass when |
scoped (pytest api/views/assistant/). |
|
|
Rollback
- No database migrations are introduced in this branch — the changes are code, tests, and offline tooling only. Rollback is therefore code-only: no
schema or data reversal required.
- Revert the merge commit:
git revert -m 1 <merge_sha>.
- The eval script and notebook are independent of the endpoint and can be removed on their own without touching the request path.
- The pre-refactor monolithic view remains in git history if a targeted hotfix is preferred over a full revert.
Two things to confirm before you publish so the doc stays honest:
- "No migrations" — quick check: git diff develop --name-only | grep migrations should be empty. I'm confident it is, but verify since the claim anchors the
rollback story.
- The input-validation risk is the one a reviewer is most likely to push on. If you'd rather not ship a known-gap, that's the cue to implement the 400 now
instead of deferring — say the word and I'll apply the guard + schema update we discussed.
Screenshots / Recordings
Related PR
Title
Background
Set of representative questions:
#345 (comment)
#411 (comment)
https://github.com/CodeForPhilly/balancer-main/tree/develop/evaluation
Current State
Acceptance Criteria
Approach
Start with error analysis, not infrastructure. Spend 30 minutes manually reviewing 20-50 LLM outputs whenever you make significant changes. Use one domain expert who understands your users as your quality decision maker (a “benevolent dictator”).
OpenAI API Dashboard has total duration and cost metrics
References
Risks and Rollback
Risks
run_assistant).messageis not rejected — it becomes the literal string"None"(str(None)) in the model input, producingviews.py. No crash or data impact. Fix is to return 400 on omitted/blankmessage(+eval_assistant.py,review.ipynb) has first-run fragility —sys.pathinsert depth and thepandasdependency.pandasimport is deferred intomain()so it cannot affect the app or test collection; thesys.pathcaveat is documented inline.pytestsurfaces pre-existing pgvector test-DB errors.pytest api/views/assistant/).Rollback
schema or data reversal required.
git revert -m 1 <merge_sha>.Two things to confirm before you publish so the doc stays honest:
rollback story.
instead of deferring — say the word and I'll apply the guard + schema update we discussed.
Screenshots / Recordings
Related PR