This project creates a comprehensive, searchable website documenting student experiences using Large Language Models (LLMs) to solve CS182 homework problems. The website analyzes Special Participation A (non-coding problems) and Special Participation B (coding problems) submissions from Ed Discussion.
✅ 200+ Student Submissions - Complete documentation of LLM interactions
✅ 13+ LLMs Analyzed - DeepSeek, Gemini, Grok, Mistral, ChatGPT, Claude, and more
✅ Advanced Search & Filtering - By student, LLM, homework, keywords
✅ Insights Dashboard - Strengths, weaknesses, and behavior patterns for each LLM
✅ Student Attribution - Full credit with links to chat logs, docs, and repos
✅ Modern UI - Responsive design with UC Berkeley branding
This project addresses the extra credit requirements for Special Participation A & B:
> "For each of these two cases, we would like to have a summary of insights on how each of the different LLMs behave and common issues. What insights were gained from..."
- ✅ Comprehensive Website - Static site ready to drop into eecs182.org
- ✅ LLM Behavior Analysis - Automated extraction of strengths/weaknesses/patterns
- ✅ Searchable Database - Filter by keyword, student name, LLM model
- ✅ Student Credits - All submissions attributed with external links
- ✅ Insights Summary - Common themes and comparative analysis
- 110 submissions from unique students
- Top LLMs: DeepSeek (16 posts), Mistral (10), Gemini (9), Grok (8)
- Most Tested: HW3, HW4, HW2, HW0
- Key Insights:
- Most LLMs can "one-shot" straightforward problems
- Explanations vary significantly in quality
- Hallucinations observed across multiple models
- Prompt engineering significantly impacts results
- 90 submissions from unique students
- Top LLMs: Gemini (14 posts), DeepSeek (8), Grok (8), Mistral (7)
- Most Tested: HW4, HW3, HW2
- Key Insights:
- Better at providing step-by-step coding solutions
- Often questions its own solutions (especially Claude)
- Struggles with complex multi-part problems
- Iteration improves accuracy
```
.
├── parse_participation_posts.py   # Parse Ed JSON files
├── analyze_insights.py            # Extract LLM behavior patterns
├── website_data/                  # Parsed JSON data
│   ├── participation_a.json
│   ├── participation_b.json
│   ├── insights_a.json
│   ├── insights_b.json
│   └── statistics.json
├── website/                       # Complete website (READY TO DEPLOY)
│   ├── index.html                 # Main page
│   ├── styles.css                 # UC Berkeley themed styles
│   ├── app.js                     # Interactive JavaScript
│   ├── data/                      # Data files (copied from website_data)
│   └── README.md                  # Deployment instructions
├── ed_posts/                      # Raw Ed Discussion data
│   └── detailed_posts/            # 558 JSON files
└── PROJECT_README.md              # This file
```
```bash
cd website
python3 -m http.server 8000
```

Then open http://localhost:8000 in your browser.
If new Ed posts are added:
```bash
# 1. Download new posts (if needed)
python download_ed_final.py

# 2. Re-parse and analyze
python parse_participation_posts.py
python analyze_insights.py

# 3. Copy updated data
cp website_data/*.json website/data/

# 4. Website automatically reflects new data
```

- ✅ Correct solutions for straightforward problems
- ✅ Helpful for understanding concepts
- ✅ Detailed step-by-step explanations
- ✅ Fast iteration and refinement
- ✅ Can handle multiple problem types
- ⚠️ Hallucinations (making up facts/formulas)
- ⚠️ Errors on complex, multi-part problems
- ⚠️ Overly verbose responses
- ⚠️ Confusion on ambiguous questions
- ⚠️ Requires careful prompt engineering
- ⚠️ May skip crucial reasoning steps
- Strengths: Internal reasoning, correct answers, handles math well
- Weaknesses: Doesn't explain reasoning well, one-shot approach, skips steps
- Pattern: Captures details in internal reasoning but fails to communicate them
- Strengths: Good explanations, helpful, accurate
- Weaknesses: Hallucinations, errors on complex problems
- Pattern: Strong general-purpose assistant, struggles with edge cases
- Strengths: Correct answers (70-80%), learns from feedback
- Weaknesses: Very verbose, loses focus, acts preemptively
- Pattern: Eager to help but sometimes overextends
- Strengths: Questions itself, detailed explanations, helpful
- Weaknesses: Can be verbose, occasional confusion
- Pattern: Most self-reflective model, helps users catch mistakes
- Strengths: Provides explanations, correct solutions
- Weaknesses: Hallucinations, errors on complex problems
- Pattern: Solid for standard problems, less reliable on edge cases
- Total submissions for A and B
- Number of contributing students
- Number of LLMs analyzed
- Quick navigation to all sections
- Common themes across all submissions
- LLM behavior analysis
- Comparative strengths/weaknesses
- Statistical overview
- Side-by-side comparison of different models
- Tabbed interface for A vs B
- Post count and average views
- Strengths, weaknesses, and patterns listed
- Search: Full-text search across all content
- Filter by Type: Participation A or B
- Filter by LLM: Any model tested
- Filter by Homework: Specific assignments
- Student Names: Find specific student work
- Keywords: Find specific topics/issues
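The filtering above is performed client-side in `app.js`; the same logic can be sketched in Python over the parsed JSON records. The field names (`type`, `llm`, `homework`, `content`) are assumptions about the schema, not the actual one:

```python
# Sketch of the filter logic the website applies to parsed posts.
# Record field names are illustrative assumptions.
import json

def matches(post, *, part_type=None, llm=None, homework=None, query=None):
    """Return True if a post record satisfies every given filter."""
    if part_type and post.get("type") != part_type:
        return False
    if llm and post.get("llm") != llm:
        return False
    if homework and post.get("homework") != homework:
        return False
    if query and query.lower() not in json.dumps(post).lower():
        return False
    return True

posts = [
    {"type": "A", "llm": "DeepSeek", "homework": "HW3", "student": "Alice",
     "content": "DeepSeek one-shot the convexity proof."},
    {"type": "B", "llm": "Gemini", "homework": "HW4", "student": "Bob",
     "content": "Gemini hallucinated a NumPy function."},
]

# Full-text search combined with a type filter.
hits = [p for p in posts if matches(p, part_type="B", query="hallucinat")]
```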
- Complete post content (expandable)
- Student attribution with links
- LLM used and homework tested
- Links to chat transcripts, Google Docs, GitHub
- View counts and engagement
- Staff comments and endorsements
- Categorized by insight type
All 200 submissions are fully attributed to students with:
- Student names prominently displayed
- Links to their external resources:
- Chat transcripts (ChatGPT, Claude, DeepSeek links)
- Google Drive annotated documents
- GitHub repositories
- Personal websites (if provided)
- View counts showing popularity
- Staff endorsements highlighted
Students can edit their original Ed posts to add personal links, and the website can be regenerated to include them.
1. **Ed Discussion Download** (`download_ed_final.py`)
   - Downloads all 558 posts from Ed Discussion
   - Stores each as an individual JSON file with complete metadata
2. **Post Parsing** (`parse_participation_posts.py`)
   - Extracts Special Participation A and B posts (200 total)
   - Detects LLM models from content
   - Identifies homework assignments
   - Extracts all external links
   - Categorizes insights (hallucinations, errors, explanations, etc.)
3. **Insight Analysis** (`analyze_insights.py`)
   - Analyzes each LLM's behavior across all posts
   - Extracts common strengths and weaknesses
   - Identifies patterns in problem-solving approaches
   - Generates statistical summaries
   - Creates comparative analysis
4. **Website Generation**
   - Static HTML/CSS/JavaScript
   - No build process required
   - Pure client-side rendering
   - Fast, searchable, responsive
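The statistical-summary step can be sketched as follows; the record fields and output keys are assumptions for illustration, not the actual `statistics.json` schema:

```python
# Hedged sketch of how analyze_insights.py might compute the
# per-LLM and per-homework counts written to statistics.json.
import json
from collections import Counter

posts = [
    {"type": "A", "llm": "DeepSeek", "homework": "HW3"},
    {"type": "A", "llm": "DeepSeek", "homework": "HW4"},
    {"type": "B", "llm": "Gemini", "homework": "HW4"},
]

stats = {
    "total_posts": len(posts),
    "posts_per_llm": dict(Counter(p["llm"] for p in posts)),
    "posts_per_homework": dict(Counter(p["homework"] for p in posts)),
}

print(json.dumps(stats, indent=2))
```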
- ✅ Automated: LLM detection and insight extraction via NLP patterns
- ✅ Scalable: Can handle thousands of posts
- ✅ Maintainable: Simple update process for new posts
- ✅ Fast: Static site loads instantly
- ✅ Searchable: Client-side search is extremely fast
- ✅ Deployable: Works anywhere (no server needed)
```bash
# Copy the website folder to the server
scp -r website/ user@eecs182.org:/var/www/html/llm-participation/

# Or via git
git add website/
git commit -m "Add LLM participation website"
git push
```

Access at: https://eecs182.org/llm-participation/
- GitHub Pages: Free, automatic deployment
- Netlify: Drag-and-drop deployment
- Vercel: One-command deployment
- Any static host: Works everywhere
See `website/README.md` for detailed deployment instructions.
Uses regex patterns to identify LLM models from titles and content:
- ChatGPT (4, 4o, 5, 5.1, o1, o3)
- Claude (3, 3.5, Sonnet, Opus)
- Gemini (1.5, 2, 2.5, 3)
- DeepSeek
- Grok
- Mistral
- Llama variants
- And more...
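A minimal sketch of this regex-based detection is below; the exact patterns in `parse_participation_posts.py` may differ, and these are simplified assumptions that cover only a few of the model families listed above:

```python
# Simplified sketch of regex-based LLM detection from post text.
# Patterns are illustrative, not the project's actual ones.
import re

LLM_PATTERNS = {
    "ChatGPT": r"\b(chat\s?gpt|gpt-?[45](\.\d)?|o[13])\b",
    "Claude": r"\bclaude(\s?3(\.5)?|\s?(sonnet|opus))?\b",
    "Gemini": r"\bgemini(\s?[123](\.\d)?)?\b",
    "DeepSeek": r"\bdeep\s?seek\b",
    "Grok": r"\bgrok\b",
    "Mistral": r"\bmistral\b",
}

def detect_llms(text):
    """Return the set of model names whose pattern matches the text."""
    text = text.lower()
    return {name for name, pat in LLM_PATTERNS.items() if re.search(pat, text)}
```

A post mentioning several models is attributed to all of them, which matches the multi-model comparisons some students posted.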
Automatically categorizes based on keyword analysis:
- Hallucinations: "hallucination", "made up", "fabricated"
- Errors: "error", "mistake", "wrong", "incorrect"
- Explanations: "explained", "reasoning", "understand"
- Problem-solving: "one-shot", "iterative", "step-by-step"
- Prompt Engineering: "prompt", "rephrased", "engineered"
- And more...
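The keyword buckets above can be sketched as a simple lookup; the real script likely uses more categories and keywords than this minimal version:

```python
# Minimal sketch of keyword-based insight categorization,
# using the keyword lists quoted above.
INSIGHT_KEYWORDS = {
    "hallucinations": ["hallucination", "made up", "fabricated"],
    "errors": ["error", "mistake", "wrong", "incorrect"],
    "explanations": ["explained", "reasoning", "understand"],
    "problem_solving": ["one-shot", "iterative", "step-by-step"],
    "prompt_engineering": ["prompt", "rephrased", "engineered"],
}

def categorize(text):
    """Return every insight category whose keywords appear in the text."""
    text = text.lower()
    return sorted(cat for cat, words in INSIGHT_KEYWORDS.items()
                  if any(w in text for w in words))
```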
Analyzes aggregate content across all posts for each LLM:
- Positive patterns → Strengths
- Negative patterns → Weaknesses
- Behavioral patterns → Common behaviors
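This aggregation step can be sketched as a polarity mapping over categorized posts; which categories count as positive or negative is an assumption here, not taken from `analyze_insights.py`:

```python
# Sketch: fold (llm, category) pairs into per-LLM strength /
# weakness / pattern profiles. Polarity sets are assumptions.
from collections import defaultdict

POSITIVE = {"explanations", "correct_solutions"}
NEGATIVE = {"hallucinations", "errors"}

def aggregate(pairs):
    """pairs: iterable of (llm_name, insight_category) tuples."""
    profile = defaultdict(lambda: {"strengths": set(),
                                   "weaknesses": set(),
                                   "patterns": set()})
    for llm, cat in pairs:
        if cat in POSITIVE:
            profile[llm]["strengths"].add(cat)
        elif cat in NEGATIVE:
            profile[llm]["weaknesses"].add(cat)
        else:
            profile[llm]["patterns"].add(cat)
    return profile

prof = aggregate([("Gemini", "explanations"), ("Gemini", "hallucinations"),
                  ("Claude", "problem_solving")])
```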
- Learn from peers' experiences with different LLMs
- Understand which models work best for different problem types
- Discover effective prompting strategies
- Find detailed examples of LLM interactions
- Track common issues across different LLMs
- Identify which models are most/least effective
- Understand student learning patterns
- Build better AI-assisted learning tools
- Dataset of 200 student-LLM interactions
- Comparative analysis across multiple models
- Real-world homework problem-solving scenarios
- Longitudinal data across semester
Possible additions (not implemented):
- Sentiment analysis of student experiences
- Time-series analysis (model improvements over semester)
- Topic modeling for common themes
- Citation graph (which posts reference others)
- Export to CSV/PDF for offline analysis
- Integration with course gradebook
- Anonymous submission viewing option
- Course: CS182: Deep Learning
- Website: https://eecs182.org
- Ed Discussion: https://edstem.org/us/courses/84647
- Data Source: Ed Discussion course 84647
- Student Work: Remains property of respective authors
- Website Code: Available for educational use
- Purpose: CS182 extra credit documentation
This project provides a complete, production-ready website for documenting and analyzing student experiences with LLMs in CS182. It includes:
✅ 200 student submissions fully documented
✅ Automated LLM behavior analysis with insights
✅ Searchable, filterable interface for easy exploration
✅ Complete student attribution with external links
✅ Ready for deployment to eecs182.org
✅ Maintainable and updateable for future semesters
The website can be dropped directly into eecs182.org and will provide lasting value for students, instructors, and researchers interested in understanding how different LLMs perform on real educational tasks.
Built with deep learning techniques and modern web development practices.