Evaluate timeout records #135

Open · katstankiewicz wants to merge 4 commits into main from evaluate-timeout-records

katstankiewicz commented Jun 4, 2026

  • Set a maximum number of attempts (max_timeout_attempts) for conversations that reach the conversation timeout
  • Default to 1, because a timeout is a model error and we want model stats (e.g. task_completion, conciseness) computed on that run
  • This sets a hard cap on conversation length that models must adhere to
  • User behaviour and speech fidelity must still be 1.0 for a run to be evaluated
    • If max_timeout_attempts > 1, cache any previous run that reached the timeout with user_behaviour == 1.0, so that once max_timeout_attempts is reached we evaluate that cached run instead of trying again
      • e.g. with max_timeout_attempts == 2, the first run times out with user behaviour == 1.0 and the second run also times out but with user behaviour == 0. In this case, don't run again; evaluate the first run
  • Add conversation_finished_on_time as a diagnostic metric to track how many conversations finished on time
  • Save timeout_accepted in result.json to flag conversations that timed out but were still evaluated
  • Update the user behaviour fidelity metric to handle four conversation end modes: error, timeout, inactivity timeout, and user ended call
  • Change the default conversation timeout to 10 minutes (600 s)
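The attempt logic described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: EvalConfig, RunResult, is_evaluable, select_run and conversation_timeout_s are hypothetical names, and only max_timeout_attempts comes from the description. The sketch also assumes that runs ending before the timeout but failing the user behaviour or speech fidelity check are simply retried, that the most recent evaluable timed-out run is the one kept in the cache, and that nothing is evaluated if the timeout budget runs out without any evaluable run.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class EvalConfig:
    # Hypothetical config object; only max_timeout_attempts is named in the PR.
    max_timeout_attempts: int = 1          # default 1: no retry after a timeout
    conversation_timeout_s: float = 600.0  # new default: 10 minutes


@dataclass
class RunResult:
    # Hypothetical per-run summary.
    timed_out: bool
    user_behaviour: float
    speech_fidelity: float


def is_evaluable(run: RunResult) -> bool:
    # User behaviour and speech fidelity must both be 1.0 to get evaluated.
    return run.user_behaviour == 1.0 and run.speech_fidelity == 1.0


def select_run(
    runs: Iterable[RunResult], config: EvalConfig
) -> Optional[Tuple[RunResult, bool]]:
    """Return (run_to_evaluate, timeout_accepted), or None if nothing is evaluable.

    `runs` is consumed lazily, so each next item stands in for a new attempt.
    """
    cached_timeout: Optional[RunResult] = None
    timeout_attempts = 0
    for run in runs:
        if not run.timed_out:
            if is_evaluable(run):
                return run, False
            continue  # assumption: non-timeout failures are retried as before
        timeout_attempts += 1
        if is_evaluable(run):
            cached_timeout = run  # assumption: keep the latest evaluable timeout
        if timeout_attempts >= config.max_timeout_attempts:
            # Budget spent: evaluate the cached timed-out run if there is one.
            if cached_timeout is not None:
                return cached_timeout, True
            return None
    return None


if __name__ == "__main__":
    # Example from the description: max_timeout_attempts == 2, the first run
    # times out with user behaviour 1.0, the second times out with user behaviour 0.
    runs = [
        RunResult(timed_out=True, user_behaviour=1.0, speech_fidelity=1.0),
        RunResult(timed_out=True, user_behaviour=0.0, speech_fidelity=1.0),
    ]
    chosen, accepted = select_run(runs, EvalConfig(max_timeout_attempts=2))
    print(chosen is runs[0], accepted)  # True True
```

With the default of 1, the first timed-out run is evaluated straight away (if it passes the fidelity checks), which matches the intent of scoring the model on the run where it failed to finish in time.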
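The four end modes and the two new result fields could be modelled along these lines. Again this is only a sketch: ConversationEndMode, build_result, finished_on_time_rate, the end_mode key and the evaluated flag are hypothetical, and only conversation_finished_on_time, timeout_accepted and result.json come from the description. It assumes that only the hard conversation timeout counts as not finishing on time, so an error, an inactivity timeout or a user-ended call all count as on time.

```python
import json
from enum import Enum
from typing import Iterable


class ConversationEndMode(str, Enum):
    # The four end modes the user behaviour fidelity metric distinguishes.
    ERROR = "error"
    TIMEOUT = "timeout"
    INACTIVITY_TIMEOUT = "inactivity_timeout"
    USER_ENDED_CALL = "user_ended_call"


def build_result(end_mode: ConversationEndMode, evaluated: bool) -> dict:
    # Hypothetical subset of result.json.
    timed_out = end_mode is ConversationEndMode.TIMEOUT
    return {
        "end_mode": end_mode.value,
        # Assumption: only the hard conversation timeout counts as "late".
        "conversation_finished_on_time": not timed_out,
        # True only when the conversation timed out but was still evaluated.
        "timeout_accepted": timed_out and evaluated,
    }


def finished_on_time_rate(results: Iterable[dict]) -> float:
    # Diagnostic: fraction of conversations that finished before the timeout.
    results = list(results)
    if not results:
        return 0.0
    return sum(r["conversation_finished_on_time"] for r in results) / len(results)


if __name__ == "__main__":
    results = [
        build_result(ConversationEndMode.USER_ENDED_CALL, evaluated=True),
        build_result(ConversationEndMode.TIMEOUT, evaluated=True),
    ]
    print(json.dumps(results[1]))
    print(finished_on_time_rate(results))  # 0.5
```

Deriving timeout_accepted from the end mode, rather than passing it in directly, keeps the flag from ever being set on a conversation that did not time out.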
