-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Problems
1. Decimal formatting
Numeric results are returned with excessive decimal places when they should be rendered as percentages. For example 0.75 should display as 75%. This affects columns like pass rate, completion rate, retention probability, etc.
2. Missing / inadequate schema context
The LLM prompt-analyzer does not have reliable schema context, so it guesses column names and table structures. This causes it to query the wrong tables or use non-existent columns.
The schema context needs to include:
- All available tables and their columns with types
- Sample values / value ranges for categorical columns (e.g.
grade IN ('A','B','C','D','F','W','I')) - Which columns are FERPA-sensitive and must never be returned (e.g.
student_guid,Student_GUID, PII fields) - Notes on which table to use for which question type (e.g. course-level DFWI →
course_enrollments, student predictions →student_level_with_predictions)
3. Nonsensical query results
For the prompt "What are the top 10 courses with the highest DFW rates?" the engine returned student-level program-of-study data instead of course-level DFWI rates:
Program of Study Year 1 avg completion rate avg passing rate total students
120401.00 0.75 0.76 190
150000.00 0.75 0.76 186
510000.00 0.75 0.75 1042
...
Expected result: rows from course_enrollments grouped by course_prefix + course_number with a computed DFWI rate. The LLM queried course_predictions (student-level) and used CIP codes as the "course" identifier.
Root cause: the prompt-analyzer doesn't know course_enrollments exists or how to compute DFWI from the grade column.
Acceptance Criteria
- Numeric results that represent rates/probabilities (values between 0–1) are auto-formatted as percentages in the results table
-
prompt-analyzer.tsincludes a detailed schema block listing all tables, columns, FERPA-restricted fields, and query guidance - The prompt
"What are the top 10 courses with the highest DFW rates?"returns rows fromcourse_enrollmentswith correct DFWI rates - FERPA-sensitive columns (
student_guid, name fields, DOB, SSN proxies) are excluded from the schema prompt and blocked from SELECT output