118 changes: 118 additions & 0 deletions backend/README.md
@@ -46,3 +46,121 @@ Run simple Gemini connection test:
```bash
python tests/test_gemini_simple.py
```


---

## Study Generation Prompt System (v1)

The backend now uses a centralized prompt module for AI study generation.

Location:

```
backend/prompts/study_gen_v1.py
```

This module defines how Gemini transforms raw notes into (the expected JSON shape is sketched below):

- bullet-point summaries
- multiple-choice quiz questions
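
Both come back in a single JSON object. The shape below mirrors the schema that the inline prompt in `main.py` enforced before this change, and which the editing rules at the end of this section require v1 to keep:

```json
{
  "summary": ["point 1", "point 2", "point 3"],
  "quiz": [
    {
      "question": "Question text?",
      "options": ["A", "B", "C", "D"],
      "answer": "A"
    }
  ]
}
```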


### How it connects to the API

The `/api/v1/generate` endpoint builds a prompt using:

```python
prompt = build_study_generation_prompt(request.text)
```

Then sends it to Gemini:

```python
response = await gemini_service.call_gemini(prompt)
```
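
Putting the two snippets together, the endpoint's flow is roughly the following. This is a simplified sketch: the real handler in `main.py` adds validation and error handling, and the import path assumes the repository root is on `sys.path`.

```python
import json

from backend.prompts.study_gen_v1 import build_study_generation_prompt


async def generate_study_pack(notes: str, gemini_service) -> dict:
    # Build the full v1 prompt (summary + quiz) from the raw user notes.
    prompt = build_study_generation_prompt(notes)
    # Ask Gemini; the prompt instructs it to return bare JSON with "summary" and "quiz".
    response = await gemini_service.call_gemini(prompt)
    return json.loads(response)
```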

The prompt module handles (a simplified sketch of how these pieces fit together follows the list):

- system instructions
- output schema
- few-shot examples
- formatting rules
- quality validation
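
The names and strings below are illustrative only, not the module's actual contents; quality validation is covered separately further down. This is a minimal sketch of how such a builder is typically assembled:

```python
# Illustrative only: the real study_gen_v1.py may structure these pieces differently.
SYSTEM_INSTRUCTIONS = "You are a study assistant. Turn the notes into a summary and a quiz."
OUTPUT_SCHEMA = (
    '{"summary": ["..."], '
    '"quiz": [{"question": "...", "options": ["...", "...", "...", "..."], "answer": "..."}]}'
)
FEW_SHOT_EXAMPLES = "Notes: ...\nOutput: ..."
FORMATTING_RULES = "Return ONLY valid JSON, no markdown or extra text."


def build_study_generation_prompt(user_notes: str, include_examples: bool = True) -> str:
    """Assemble the single prompt string that is sent to Gemini."""
    parts = [SYSTEM_INSTRUCTIONS, f"Respond in this exact JSON format:\n{OUTPUT_SCHEMA}"]
    if include_examples:
        parts.append(FEW_SHOT_EXAMPLES)
    parts.append(FORMATTING_RULES)
    parts.append(f"Notes:\n{user_notes}")
    return "\n\n".join(parts)
```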


### Prompt builder functions

#### Full study pack

```python
build_study_generation_prompt(notes)
```

Returns summary + quiz.

#### Quiz only

```python
build_custom_quiz_prompt(notes, num_questions=3)
```

Generates quiz-only output.

#### Summary only

```python
build_summary_only_prompt(notes)
```

Generates summary-only output.
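
All three take raw notes and return a ready-to-send prompt string. A quick side-by-side (same import-path caveat as in the sketch above):

```python
from backend.prompts.study_gen_v1 import (
    build_study_generation_prompt,
    build_custom_quiz_prompt,
    build_summary_only_prompt,
)

notes = "Photosynthesis converts light energy into chemical energy stored as glucose."

full_prompt = build_study_generation_prompt(notes)              # summary + quiz
quiz_prompt = build_custom_quiz_prompt(notes, num_questions=5)  # quiz only, 5 questions
summary_prompt = build_summary_only_prompt(notes)               # summary only
```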


### Versioning

Current version:

```
study_gen_v1.py → VERSION 1.0.0
```

Future prompt improvements should create:

```
study_gen_v2.py
study_gen_v3.py
```

Never silently change v1 behavior — version prompts explicitly.
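
In practice that means pinning the version at the import site, so an upgrade is an explicit, reviewable change (v2 below is hypothetical and does not exist yet):

```python
# Today: pin to v1 explicitly.
from backend.prompts.study_gen_v1 import build_study_generation_prompt

# Later: switching versions is a deliberate one-line change, never a silent edit to v1.
# from backend.prompts.study_gen_v2 import build_study_generation_prompt
```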


### Quality checks

After Gemini responds, quiz output is validated using:

```python
validate_quiz_quality()
```

This detects:

- duplicate options
- invalid answers
- weak question structure

Warnings are logged for debugging.
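
A usage sketch, assuming `validate_quiz_quality` returns a list of human-readable warning strings (which is how `main.py` treats its result):

```python
from backend.prompts.study_gen_v1 import validate_quiz_quality

quiz = [
    {
        "question": "What does photosynthesis produce?",
        "options": ["Glucose", "Glucose", "Salt", "Iron"],  # duplicate option
        "answer": "Water",                                   # not among the options
    }
]

warnings = validate_quiz_quality(quiz)
if warnings:
    for warning in warnings:
        print(f"[quality] {warning}")  # main.py currently prints; see the logging note below
```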


### Editing prompts safely

If you change prompt behavior:

1. Keep JSON schema identical
2. Do not change API response format
3. Test with messy notes input (see the sketch below)
4. Verify frontend still parses correctly

Breaking the schema will break the frontend.
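
A minimal regression check along those lines, assuming only that the builder returns a non-empty string that embeds the notes verbatim and names the two top-level schema keys:

```python
# tests/test_study_gen_v1.py (sketch; adjust the import to match how the app is started)
from backend.prompts.study_gen_v1 import build_study_generation_prompt


def test_prompt_survives_messy_notes():
    messy = "   PHOTOsynthesis!!  \n\n\t- light -> glucose??  \x0c stray page-break char"
    prompt = build_study_generation_prompt(messy)

    assert isinstance(prompt, str) and prompt.strip()  # builder did not choke
    assert "PHOTOsynthesis" in prompt                  # notes are embedded
    assert "summary" in prompt and "quiz" in prompt    # schema keys still named
```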

---

59 changes: 39 additions & 20 deletions backend/main.py
@@ -6,6 +6,12 @@

from services import GeminiService

# Import the new prompt system
from backend.prompts.study_gen_v1 import (
build_study_generation_prompt,
validate_quiz_quality
)
**Review comment (Contributor) on lines +10 to +13: Broken import when run in `backend/`**

`from backend.prompts.study_gen_v1 import ...` will fail when starting the app from within `backend/` (as documented via `uvicorn main:app --reload` in `backend/README.md`), because `backend` won’t be a top-level package in that execution context. This makes the server crash on startup in the common local/dev invocation; use an import that works from `backend/` (e.g. `from prompts.study_gen_v1 ...`) or adjust the run command to `uvicorn backend.main:app` so the package import is valid.
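
A minimal fix along the lines the comment suggests, assuming the documented `uvicorn main:app --reload` invocation from inside `backend/` stays the supported way to run the server:

```python
# backend/main.py: import relative to backend/, which is the working directory
# when the server is started with `uvicorn main:app --reload`.
from prompts.study_gen_v1 import (
    build_study_generation_prompt,
    validate_quiz_quality,
)
```

The alternative is to keep the `backend.`-prefixed import and change the documented run command to `uvicorn backend.main:app --reload` from the repository root.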


app = FastAPI(title="Socrato")

@@ -86,26 +92,32 @@ async def generate_study_materials(request: GenerateRequest):
- quiz (QuizQuestion[]): Array of quiz questions
"""
# Call Gemini to generate study materials
prompt = f"""You are a study assistant. Based on the following notes, generate:
1. A summary as a list of bullet points (3-5 key points)
2. A quiz with 3 multiple choice questions

Notes:
{request.text}

Respond in this exact JSON format:
{{
"summary": ["point 1", "point 2", "point 3"],
"quiz": [
{{
"question": "Question text?",
"options": ["A", "B", "C", "D"],
"answer": "A"
}}
]
}}

Return ONLY valid JSON, no markdown or extra text."""
# prompt = f"""You are a study assistant. Based on the following notes, generate:
# 1. A summary as a list of bullet points (3-5 key points)
# 2. A quiz with 3 multiple choice questions

# Notes:
# {request.text}

# Respond in this exact JSON format:
# {{
# "summary": ["point 1", "point 2", "point 3"],
# "quiz": [
# {{
# "question": "Question text?",
# "options": ["A", "B", "C", "D"],
# "answer": "A"
# }}
# ]
# }}

# Return ONLY valid JSON, no markdown or extra text."""

# Build prompt using the centralized prompt system
prompt = build_study_generation_prompt(
user_notes=request.text,
include_examples=True # Include few-shot examples for better quality
)

response = await gemini_service.call_gemini(prompt)

@@ -161,6 +173,13 @@ async def generate_study_materials(request: GenerateRequest):
answer=q["answer"]
))

# Optional: Run quality checks on the quiz
quality_warnings = validate_quiz_quality(data.get("quiz", []))
if quality_warnings:
print(f"[generate] Quality warnings: {quality_warnings}")
# Can log these or return them to the frontend in the future
**Review comment (Contributor) on lines +176 to +180: Sensitive data logged to stdout**

On invalid/failed responses, this endpoint prints `Raw response: {response}` and also prints `quality_warnings` unconditionally. Gemini output can contain user-provided notes verbatim, so this will leak user content into server logs. Since this PR adds additional logging paths, it should be gated/removed or switched to structured logging with redaction (and avoid printing raw model output).
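
A minimal sketch of the gated, redacted logging the comment asks for (the logger name and helper function are assumptions, not part of this PR):

```python
import logging

logger = logging.getLogger("socrato.generate")


def log_quality_warnings(quality_warnings: list[str]) -> None:
    """Record that quality checks failed without echoing model output or user notes."""
    if quality_warnings:
        # Counts only at normal verbosity; the warning text itself may quote user content.
        logger.warning("generate: quiz failed %d quality check(s)", len(quality_warnings))
```

The same reasoning applies to the existing `Raw response: {response}` print on parse failures: log a fixed message (or the response length) rather than the body.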

return GenerateResponse(
summary=data.get("summary", []),
quiz=quiz_questions