
Neural Chatbot - Current Status & Issues

What We've Discovered

The seq2seq model has a fundamental generation problem:

  • Training loss DOES decrease properly (0.7 → 0.0022)
  • Weights ARE updating correctly via backprop
  • The encoder IS working (it learns useful representations)
  • BUT decoder generation gets stuck in a loop, repeating one word

Root Cause

During training: the decoder learns from full answer sequences (teacher forcing).
During inference: the decoder must predict one token at a time without ground truth.

This causes an exposure bias problem:

  • The model was trained only on correct (ground-truth) previous tokens
  • During inference, it only sees its own predictions
  • If it predicts wrongly once, the error compounds
  • It gets stuck in a degenerate loop (repeating "chatbot" forever)
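To make the mismatch concrete, here is a minimal PyTorch-style sketch of the two regimes. The `decoder`, `enc_state`, and token-id arguments are illustrative assumptions, not the repo's actual code: the decoder is assumed to take a previous-token tensor and a hidden state and return (logits, new hidden state).

```python
import torch

def train_step(decoder, enc_state, target_ids):
    """Teacher forcing: the decoder always conditions on the ground-truth previous token."""
    hidden = enc_state
    loss = 0.0
    for t in range(1, target_ids.size(0)):
        # input is the TRUE token t-1, no matter what the model would have predicted
        logits, hidden = decoder(target_ids[t - 1].view(1), hidden)
        loss = loss + torch.nn.functional.cross_entropy(logits, target_ids[t].view(1))
    return loss

def greedy_decode(decoder, enc_state, sos_id, eos_id, max_len=20):
    """Inference: the decoder only sees its own previous prediction, so one early
    mistake can lock it into repeating the same word until max_len is reached."""
    hidden = enc_state
    token = torch.tensor([sos_id])
    output = []
    for _ in range(max_len):
        logits, hidden = decoder(token, hidden)
        token = logits.argmax(dim=-1)   # feed back the model's own guess
        if token.item() == eos_id:
            break
        output.append(token.item())
    return output
```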

Why a Small Training Set Makes It Worse

With only 16 examples:

  • Model quickly memorizes training data
  • Decoder learns "safe" words that appear frequently
  • No diversity to learn proper generation
  • Overfits to repeating patterns

Solutions That Would Work

  1. Scheduled Sampling - Gradually expose the model to its own predictions during training (sketched below)
  2. Beam Search - Keep several candidate hypotheses at decode time instead of a single greedy choice (sketched below)
  3. Attention Mechanism - Let the decoder attend to the encoder outputs at every step
  4. More training data - 1000+ examples so the model can learn diverse responses
  5. Use Pre-trained Models - Fine-tune GPT- or BERT-style models instead of training from scratch
  6. Retrieval + Ranking - Find similar Q&A pairs and rank candidate responses
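Below is a minimal sketch of solution 1 (scheduled sampling), reusing the same assumed decoder interface as the sketch above; the function and argument names are illustrative, not taken from the repo.

```python
import random
import torch

def train_step_scheduled(decoder, enc_state, target_ids, sampling_prob):
    """Like the teacher-forced step above, but with probability `sampling_prob`
    the decoder is fed its own previous prediction instead of the true token."""
    hidden = enc_state
    prev = target_ids[0].view(1)              # start-of-sequence token
    loss = 0.0
    for t in range(1, target_ids.size(0)):
        logits, hidden = decoder(prev, hidden)
        loss = loss + torch.nn.functional.cross_entropy(logits, target_ids[t].view(1))
        if random.random() < sampling_prob:
            prev = logits.argmax(dim=-1)      # model's own prediction
        else:
            prev = target_ids[t].view(1)      # ground truth (teacher forcing)
    return loss
```

A common schedule starts sampling_prob at 0 (pure teacher forcing) and ramps it up over the epochs, so the model is gradually exposed to its own mistakes.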
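And a minimal sketch of solution 2 (beam search) under the same assumptions; real implementations usually also add length normalization so longer hypotheses are not unfairly penalized.

```python
import torch

def beam_search(decoder, enc_state, sos_id, eos_id, beam_width=3, max_len=20):
    """Keep the `beam_width` highest-scoring partial sequences instead of
    committing to one greedy token per step."""
    # Each beam entry: (cumulative log-probability, token sequence, decoder hidden state)
    beams = [(0.0, [sos_id], enc_state)]
    for _ in range(max_len):
        candidates = []
        for score, seq, hidden in beams:
            if seq[-1] == eos_id:             # finished hypothesis, keep as-is
                candidates.append((score, seq, hidden))
                continue
            logits, new_hidden = decoder(torch.tensor([seq[-1]]), hidden)
            log_probs = torch.log_softmax(logits, dim=-1).squeeze(0)
            top_lp, top_ids = log_probs.topk(beam_width)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((score + lp, seq + [idx], new_hidden))
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
    return beams[0][1]                        # tokens of the best hypothesis
```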

Conclusion

This demonstrates a REAL neural chatbot implementation, but with the realistic limitations of a tiny dataset and a naive decoding strategy. The architecture is correct, but it needs:

  • Better training strategy
  • Better decoding algorithm
  • More/better data