Skip to content

fix: NQL engine improvements — decimal formatting, schema context, and query accuracy #90

@William-Hill

Description

@William-Hill

Problems

1. Decimal formatting

Numeric results are returned with excessive decimal places when they should be rendered as percentages. For example 0.75 should display as 75%. This affects columns like pass rate, completion rate, retention probability, etc.

2. Missing / inadequate schema context

The LLM prompt-analyzer does not have reliable schema context, so it guesses column names and table structures. This causes it to query the wrong tables or use non-existent columns.

The schema context needs to include:

  • All available tables and their columns with types
  • Sample values / value ranges for categorical columns (e.g. grade IN ('A','B','C','D','F','W','I'))
  • Which columns are FERPA-sensitive and must never be returned (e.g. student_guid, Student_GUID, PII fields)
  • Notes on which table to use for which question type (e.g. course-level DFWI → course_enrollments, student predictions → student_level_with_predictions)

3. Nonsensical query results

For the prompt "What are the top 10 courses with the highest DFW rates?" the engine returned student-level program-of-study data instead of course-level DFWI rates:

Program of Study Year 1  avg completion rate  avg passing rate  total students
120401.00                0.75                 0.76              190
150000.00                0.75                 0.76              186
510000.00                0.75                 0.75              1042
...

Expected result: rows from course_enrollments grouped by course_prefix + course_number with a computed DFWI rate. The LLM queried course_predictions (student-level) and used CIP codes as the "course" identifier.

Root cause: the prompt-analyzer doesn't know course_enrollments exists or how to compute DFWI from the grade column.

Acceptance Criteria

  • Numeric results that represent rates/probabilities (values between 0–1) are auto-formatted as percentages in the results table
  • prompt-analyzer.ts includes a detailed schema block listing all tables, columns, FERPA-restricted fields, and query guidance
  • The prompt "What are the top 10 courses with the highest DFW rates?" returns rows from course_enrollments with correct DFWI rates
  • FERPA-sensitive columns (student_guid, name fields, DOB, SSN proxies) are excluded from the schema prompt and blocked from SELECT output

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions