This is the single-file study version. Every question is followed immediately by its answer.
Use this rule in every answer:
- Start with what is implemented in this repo today.
- Explain why you designed it that way.
- Then explain what you would harden in production.
Study note:
- Every answer now includes its own **Code reference:** line directly below it.
- I only added code references where there is a real implementation to inspect.
- Some answers are intentionally conceptual or future-state, so their code reference will say that there is no direct single file.
- Red-Flag Phrases To Avoid Saying
- 1. Highest-Probability Questions
- 2. Project Overview and Product Framing
- 3. System Design and Architecture
- 4. Multi-Agent Design and CrewAI Collaboration
- 5. Prompt and Task Design
- 6. Tooling and Data Acquisition
- 7. Missing Data, Tool Errors, and Fallback Behavior
- 8. FastAPI, Async Execution, and API Responsiveness
- 9. Durable Jobs, Worker Coordination, and Recovery
- 10. Database Design and Persistence
- 11. Azure Blob Storage Integration
- 12. Azure Blob Security Questions
- 13. Secrets, Security, and Compliance
- 14. Model Choice, LLM Behavior, and Cost Control
- 15. Deployment Tradeoffs: Azure Functions vs VM vs Containers
- 16. Reliability, Observability, and Operations
- 17. Performance and Scalability
- 18. Testing Strategy
- 19. Tradeoffs, Limitations, and Honest Self-Critique
- 20. Behavioral and Ownership Questions
- 21. Good Follow-Up Questions You Should Expect
- 22. Questions You Should Be Ready to Ask the Interviewer
- Final Preparation Rule
Avoid saying these in interviews because they make you sound vague, junior, careless, or like you are overclaiming:
- "I just used AI to build most of it."
- "I mostly copied the architecture from a tutorial."
- "I let the model figure it out."
- "It should be fine."
- "I did not really think about failure cases."
- "Security was not important because it is only a demo."
- "I hardcoded it for now and left it."
- "Postgres is basically the same as a queue."
- "I do not know why I picked that design."
- "I did not test it, but it worked on my machine."
- "It is fully production-ready."
- "The LLM handled all of that."
- "I just used CrewAI because it is popular."
- "I was not thinking about scalability yet."
- "I did not really need architecture for this."
Use these framing patterns instead:
- "In the current repo, I implemented..."
- "I chose that tradeoff because..."
- "The failure mode I was trying to handle was..."
- "For production, I would harden it by..."
Answer: After the worker finishes generating the Markdown report, it uploads the file to Azure Blob Storage using BlobServiceClient. The resulting blob URL is stored in PostgreSQL along with the job result. I chose Blob Storage because it is a better fit for durable report artifacts than storing everything only in the database.
Code reference: src/shared/storage.py, src/workers/analysis_worker.py
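Here is a minimal sketch of that upload path using the azure-storage-blob SDK. The container, path, and function names are illustrative, not the exact code in storage.py:

```python
# Minimal sketch of the report upload; names here are illustrative, not the repo's exact code.
from azure.storage.blob import BlobServiceClient

def upload_report(connection_string: str, container: str, local_path: str, blob_name: str) -> str:
    service = BlobServiceClient.from_connection_string(connection_string)
    blob_client = service.get_blob_client(container=container, blob=blob_name)
    with open(local_path, "rb") as data:
        blob_client.upload_blob(data, overwrite=True)
    # The blob URL is what gets stored alongside the job result in PostgreSQL.
    return blob_client.url
```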
Answer: In the code today, the security posture is basic but intentional: secrets are externalized into configuration rather than hardcoded, uploads go to a dedicated container, and failures are surfaced explicitly rather than hidden. In a production deployment, I would keep the container private, avoid direct anonymous access, and move from raw connection strings toward managed identity and Key Vault.
Code reference: src/shared/storage.py, src/shared/config.py
Q3. If those reports contain sensitive financial analysis, how would you protect them in production?
Answer: I would treat Blob as a private artifact store rather than a public file host. That means private containers, authenticated retrieval, short-lived signed access only when needed, audit logs, and preferably serving downloads through an authenticated API instead of exposing raw blob URLs directly.
Code reference: src/shared/storage.py, src/shared/config.py
Q4. How did you define the responsibilities of the Quantitative Analyst agent versus the Investment Strategist agent?
Answer: I split them by evidence type. The Quantitative Analyst focuses on structured financial metrics and market-relative performance, while the Investment Strategist focuses on recent news, sentiment, and final synthesis. That separation makes the workflow easier to control and easier to explain.
Code reference: src/agents/agents.py, src/agents/tasks.py
Answer: I wanted separation of concerns. The quant agent is grounded in hard numerical evidence, while the strategist consumes that output and adds qualitative narrative before issuing a final recommendation. A single agent could do both, but the reasoning path is usually less disciplined and harder to debug.
Code reference: src/agents/agents.py, src/agents/tasks.py
Q6. How does CrewAI ensure the strategist receives the quantitative output before making a recommendation?
Answer: I used Process.sequential and explicit task context in CrewAI. The strategist task only runs after the quant task completes, and it receives the quant output through the task context dependency.
Code reference: src/agents/tasks.py, src/agents/crew.py
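A hedged sketch of that wiring with the public CrewAI API; the real role text, tools, and prompts live in agents.py and tasks.py, so everything below is abbreviated:

```python
# Illustrative CrewAI wiring; role and task text is abbreviated, not the repo's exact prompts.
from crewai import Agent, Task, Crew, Process

quant_analyst = Agent(
    role="Quantitative Analyst",
    goal="Summarize hard financial metrics and compare against the market",
    backstory="Skeptical, metric-driven analyst.",
)
strategist = Agent(
    role="Investment Strategist",
    goal="Synthesize metrics and recent news into a recommendation",
    backstory="Experienced strategist who writes the final memo.",
)

quant_task = Task(
    description="Collect key metrics and compare the stock against SPY, flagging red flags.",
    expected_output="A structured metrics summary with red flags.",
    agent=quant_analyst,
)
strategy_task = Task(
    description="Write the final buy/sell/hold memo grounded in the quant summary.",
    expected_output="A Markdown investment memo.",
    agent=strategist,
    context=[quant_task],  # the strategist receives the quant output through this dependency
)

crew = Crew(
    agents=[quant_analyst, strategist],
    tasks=[quant_task, strategy_task],
    process=Process.sequential,  # the strategist task only runs after the quant task completes
)
result = crew.kickoff()
```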
Answer: The current design degrades rather than fails immediately. Missing values come through as N/A, and true tool failures return explicit error text, so the workflow can still continue with partial context. In production, I would make this more structured with completeness flags and confidence scoring.
Code reference: src/agents/tools/financial.py, src/agents/tasks.py
Q8. How do you make sure the strategist still gets usable context when quantitative data is incomplete?
Answer: I made the quant output a required dependency of the strategist task. Even when the data is incomplete, the strategist still receives a structured summary instead of starting from ungrounded narrative alone.
Code reference: src/agents/tasks.py, src/agents/crew.py
Q9. What are the tradeoffs between Azure Functions and a dedicated VM or worker process for running long-lived agents?
Answer: Azure Functions is attractive for bursty short-lived workloads because it reduces idle cost and infrastructure overhead. The downside is that long-running agent workflows are a poor fit because of cold starts, timeout concerns, and less operational control. A dedicated worker costs more to keep warm, but it is the more honest fit for multi-minute AI jobs.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Q10. How did you handle long-running analysis tasks in the FastAPI backend without blocking the API?
Answer: I moved the actual analysis out of the request lifecycle. The FastAPI service only creates and serves durable job state, while a separate worker process runs the long-running CrewAI workflow in the background.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Q11. What strategies did you use to keep the API responsive while the analysis runs in the background?
Answer: The main strategy is that POST /analyze returns 202 Accepted with a durable job_id instead of waiting for the final result. The frontend then polls the job status endpoint while the worker processes the analysis asynchronously.
Code reference: src/api/routes.py, src/frontend/app.py
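A condensed sketch of the submit-and-poll contract. The in-memory dict stands in for the PostgreSQL job table, and the route paths are illustrative:

```python
# Condensed illustration of the 202-plus-polling contract; persistence details omitted.
import uuid
from fastapi import FastAPI, HTTPException, status

app = FastAPI()
jobs: dict[str, dict] = {}  # stand-in for the durable PostgreSQL job table

@app.post("/analyze", status_code=status.HTTP_202_ACCEPTED)
def submit_analysis(ticker: str) -> dict:
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"ticker": ticker, "status": "queued"}
    # The client gets a durable job_id immediately instead of waiting for the result.
    return {"job_id": job_id, "status": "queued"}

@app.get("/jobs/{job_id}")
def get_job(job_id: str) -> dict:
    job = jobs.get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="job not found")
    return job
```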
Q12. How do the agents collaborate end to end, and what role does Azure Blob Storage play in the final workflow?
Answer: The quant agent runs first and produces the hard-number summary. The strategist consumes that output, adds recent narrative context, and writes the final investment memo. Azure Blob Storage then stores the memo as the durable artifact, while PostgreSQL stores the job state and metadata.
Code reference: src/agents/crew.py, src/workers/analysis_worker.py
Answer: This project is a production-style multi-agent stock analysis platform. A user submits a ticker, the API creates a durable job in PostgreSQL, a background worker executes a two-agent CrewAI workflow, and the final investment memo is stored in Azure Blob Storage while job state and report metadata are tracked in PostgreSQL.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: It turns a fragmented manual research task into a structured workflow. Instead of manually gathering metrics, checking recent news, and writing a memo, the system coordinates those steps into one repeatable process and produces a consistent analysis report.
Code reference: src/agents/tasks.py, src/frontend/app.py
Answer: Stock analysis naturally combines structured quantitative evidence and unstructured narrative evidence. That made it a strong use case for showing why specialized agents can be helpful instead of pushing everything through one generic prompt.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: The operational layer. It has durable jobs, asynchronous API behavior, a background worker, worker heartbeats, stale-job recovery, cloud artifact storage, and explicit failure handling. Those are the concerns that make AI systems behave like real services instead of demos.
Code reference: src/api/routes.py, src/shared/database.py
Answer: Streamlit is the frontend, FastAPI is the API layer, the worker process owns long-running execution, CrewAI handles orchestration, yFinance and Firecrawl are the tool layer, PostgreSQL stores job state and report history, and Azure Blob Storage stores report artifacts. Each boundary exists because those parts have different lifecycles and responsibilities.
Code reference: src/frontend/app.py, src/workers/analysis_worker.py
Q6. What happens from the moment a user enters a ticker to the moment the final report is available?
Answer: The user submits a ticker through Streamlit or the API, the API writes a queued job to PostgreSQL, the worker claims it, runs the quant task and strategist task, uploads the report to Blob Storage, finalizes job state in PostgreSQL, and the frontend polls until the job becomes completed or failed.
Code reference: src/frontend/app.py, src/workers/analysis_worker.py
Answer: The intended user is an analyst or internal user who wants a first-pass research memo rather than raw financial data only. From a portfolio standpoint, it is also aimed at architects and AI engineers evaluating system design maturity.
Code reference: src/frontend/app.py
Answer: It demonstrates service boundaries, asynchronous workflows, persistence design, worker coordination, cloud storage integration, configuration management, error handling, and production-oriented tradeoffs. The point is not only that I used an LLM, but that I shaped the system around the operational realities of long-running AI work.
Code reference: src/api/routes.py, src/shared/database.py
Answer: I would emphasize the boundary between short-lived API requests and long-running worker execution, the explicit job lifecycle in PostgreSQL, and the separation between artifact storage and operational metadata. Those are the most architecturally meaningful decisions in the repo.
Code reference: src/api/routes.py, src/shared/database.py
Answer: I would say it proves I can take an AI feature beyond a notebook and shape it into an end-to-end service. It shows applied AI engineering, backend architecture, cloud integration, and system-hardening instincts rather than just prompt usage.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: Each part has a different lifecycle and responsibility. That separation makes the system easier to reason about, easier to scale, and easier to evolve without rewriting everything at once.
Code reference: src/frontend/app.py, src/api/routes.py
Answer: The analysis is too slow and dependency-heavy for a single blocking request to be a good contract. Returning a durable job ID keeps the API responsive and gives the client a clean way to track progress.
Code reference: src/api/routes.py, src/api/models.py
Answer: It gives durable storage, transactional updates, and easy inspection of queued, running, completed, or failed work. For this scale of project, it was the simplest correct queueing backbone.
Code reference: src/shared/database.py
Q4. Why is Azure Blob Storage used for report artifacts instead of storing everything in the database?
Answer: Report files are better suited to object storage than relational storage. PostgreSQL is better as the source of truth for state and metadata, while Blob Storage is better for durable generated artifacts.
Code reference: src/shared/storage.py, src/shared/database.py
Answer: It keeps the database focused on queryable operational data and lets object storage handle file-like outputs more naturally. That separation also mirrors real platform patterns for generated artifacts.
Code reference: src/shared/storage.py, src/shared/database.py
Answer: The frontend submits a job, gets back a job_id, then polls the job status endpoint until the job transitions to completed or failed. That is a simple and reliable contract for long-running background work.
Code reference: src/frontend/app.py, src/api/routes.py
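On the client side, the contract reduces to a simple polling loop. This is a hypothetical snippet showing the shape, not the Streamlit code itself:

```python
# Hypothetical polling loop; the Streamlit frontend follows the same contract.
import time
import requests

def wait_for_report(base_url: str, job_id: str, interval: float = 3.0) -> dict:
    while True:
        job = requests.get(f"{base_url}/jobs/{job_id}", timeout=10).json()
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
```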
Q7. Why did you choose a worker process instead of running CrewAI directly inside the FastAPI request handler?
Answer: CrewAI plus external API calls is long-running blocking work. That belongs in a background execution model, not inside the web request lifecycle.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: The main failure boundaries are external APIs, worker crashes, database writes, blob uploads, and stale job leases. I modeled those explicitly because they affect correctness and operability.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: It assumes at least one worker is active and heartbeating. I added a guard so the API rejects new jobs when no active worker exists instead of pretending the system can process them.
Code reference: src/api/routes.py, src/shared/database.py
Answer: I would evaluate the queue backbone first. PostgreSQL is fine at this scale, but at higher concurrency I would likely move toward Azure Service Bus or another broker with stronger retry and throughput semantics.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Q11. What part of the current architecture is production-oriented, and what part is still simplified?
Answer: The production-oriented parts are the durable job model, cloud persistence split, worker recovery, and explicit failure handling. The simplified parts are authentication, enterprise security, observability depth, and formal evaluation.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I would add auth, managed identity, Key Vault, stronger observability, a broker-backed queue, and more explicit evaluation and confidence controls around the recommendation layer.
Code reference: src/shared/config.py, src/api/routes.py
Answer: The work naturally splits between quantitative analysis and qualitative synthesis. Two agents let me keep prompts, tools, and responsibilities cleaner.
Code reference: src/agents/agents.py
Answer: I wanted the workflow to start from hard evidence. Narrative should inform the recommendation only after the numeric baseline is established.
Code reference: src/agents/agents.py, src/agents/tasks.py
Answer: I wrote them to reinforce specialization. The quant agent is skeptical, metric-driven, and focused on hard numbers, while the strategist is responsible for interpreting narrative and producing the final recommendation.
Code reference: src/agents/agents.py
Answer: The quant task is limited to metrics, benchmark comparison, and red flags. The strategist task is limited to recent narrative research and final synthesis. The prompts and tool assignments both enforce that boundary.
Code reference: src/agents/tasks.py
Answer: The strategist task receives the quant task output as explicit context. That means it starts from the quant summary rather than a blank prompt.
Code reference: src/agents/tasks.py
Answer: Because the strategist depends on the quant result. Sequential execution was the simplest correct model for a two-stage dependent workflow.
Code reference: src/agents/crew.py, src/agents/tasks.py
Answer: The benefit is deterministic dependency flow and easier reasoning about handoffs. The downside is higher end-to-end latency because the stages cannot overlap.
Code reference: src/agents/crew.py
Answer: I force the strategist to receive the quant output as context and I instruct the strategist prompt to synthesize numbers with narrative. That makes it much harder for the final answer to drift away from the financial evidence.
Code reference: src/agents/tasks.py
Answer: The main risks are duplicated reasoning, context drift, and overconfidence when weak information is passed downstream. That is why I kept the workflow simple and the handoff explicit.
Code reference: src/agents/tasks.py, src/agents/crew.py
Answer: The strategist still proceeds, but its grounding is weaker. In production, I would add structured data quality indicators and confidence-aware recommendation behavior.
Code reference: src/agents/tasks.py, src/agents/tools/financial.py
Q11. Why did you give the quant agent yFinance tools and the strategist Firecrawl instead of letting both use everything?
Answer: I wanted tool usage to align with responsibility. Letting both agents use all tools would reduce specialization and make debugging much harder.
Code reference: src/agents/agents.py, src/agents/tools/scraper.py
Answer: I select only a compact set of metrics from yFinance and limit Firecrawl results to a few items. That keeps the evidence useful without flooding the context window with noise.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: They operate on different evidence types and different task prompts. The quant agent does the numeric screening, and the strategist adds narrative synthesis rather than redoing the same financial pass.
Code reference: src/agents/tasks.py, src/agents/agents.py
Answer: A distinct responsibility such as risk review, compliance review, source verification, or confidence scoring would justify a third agent.
Code reference: src/agents/agents.py, src/agents/tasks.py
Answer: It would challenge the recommendation, evaluate whether the evidence is strong enough, and enforce rules like uncertainty disclosure or missing-data warnings before a report is finalized.
Code reference: src/agents/agents.py, src/agents/tasks.py
Answer: I designed the quant prompt to request specific metrics, a benchmark comparison against SPY, and explicit red-flag detection. That keeps the first stage structured and grounded.
Code reference: src/agents/tasks.py
Answer: I designed it to consume the quant output, gather a small number of recent news items, and synthesize those signals into a final buy, sell, or hold recommendation with reasoning.
Code reference: src/agents/tasks.py
Answer: Dumping raw payloads creates context noise and makes the next stage harder. I wanted distilled evidence rather than a huge uncurated yFinance object.
Code reference: src/agents/tasks.py
Answer: That is the core purpose of the system. Financial analysis should combine quantitative and qualitative evidence, but only in a structured way.
Code reference: src/agents/tasks.py
Answer: I make each agent responsible for a narrow kind of evidence, constrain the tools they can use, and explicitly ground the strategist in the quant output. Prompt scope is a big part of hallucination control.
Code reference: src/agents/tasks.py, src/agents/agents.py
Answer: I prescribe what evidence must be considered, but I still leave room for natural reasoning in the final narrative. Over-constraining everything can make the report brittle.
Code reference: src/agents/tasks.py
Answer: Markdown is readable in the UI, easy to store, easy to download, and works well as a durable report artifact for both humans and systems.
Code reference: src/agents/tasks.py
Answer: I would tighten expected-output instructions, add length limits, and possibly require more explicit section boundaries or bullet caps.
Code reference: src/agents/tasks.py
Answer: I would strengthen the prompt language about the quant summary being the primary grounding layer and require the recommendation to reference concrete metrics.
Code reference: src/agents/tasks.py
Q10. How do you think about prompt design differently when building a system rather than a one-off chat workflow?
Answer: In a system, prompt design is about controlling handoffs, tool usage, failure behavior, and downstream consistency, not just about getting one nice answer once.
Code reference: src/agents/tasks.py, src/agents/crew.py
Answer: It is fast to integrate, gives good public-market coverage, and is sufficient for a portfolio project that needs real market data without a paid enterprise feed.
Code reference: src/agents/tools/financial.py
Answer: It is not an institutional-grade data contract. The limitations are consistency, guarantees, and long-term reliability compared with stronger commercial providers.
Code reference: src/agents/tools/financial.py
Answer: I wanted the strategist to consume cleaned textual content rather than only snippet-level search results. Firecrawl is useful for getting readable evidence into the model.
Code reference: src/agents/tools/scraper.py
Answer: More articles often add more noise than value. Limiting the result set helps control latency and context size while still giving the strategist enough narrative evidence.
Code reference: src/agents/tools/scraper.py
Answer: JSON text is a stable structured format for the agent to consume. It is more reliable than implicitly returning an in-memory Python object and hoping the framework handles it cleanly.
Code reference: src/agents/tools/financial.py
Answer: I chose a compact set that gives a useful snapshot of valuation, size, profitability, volatility, and price context: price, market cap, P/E, EPS, beta, and 52-week range.
Code reference: src/agents/tools/financial.py
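A hedged sketch of that metric selection with yfinance. The `info` keys are the library's own field names; the wrapper function is illustrative rather than the repo's exact tool:

```python
# Illustrative metric extraction; the repo's tool wraps something like this for CrewAI.
import json
import yfinance as yf

def get_key_metrics(ticker: str) -> str:
    info = yf.Ticker(ticker).info
    metrics = {
        "ticker": ticker,
        "price": info.get("currentPrice", "N/A"),
        "market_cap": info.get("marketCap", "N/A"),
        "pe_ratio": info.get("trailingPE", "N/A"),
        "eps": info.get("trailingEps", "N/A"),
        "beta": info.get("beta", "N/A"),
        "fifty_two_week_low": info.get("fiftyTwoWeekLow", "N/A"),
        "fifty_two_week_high": info.get("fiftyTwoWeekHigh", "N/A"),
    }
    # Missing fields degrade to "N/A", and JSON text keeps the agent-facing contract stable.
    return json.dumps(metrics)
```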
Answer: SPY gives a simple market-relative benchmark. It helps the quant agent answer whether the stock outperformed or underperformed the broader market over the same period.
Code reference: src/agents/tasks.py, src/agents/tools/financial.py
Answer: It provides directional context without requiring a full analytics engine. It is a lightweight but useful measure of relative performance.
Code reference: src/agents/tools/financial.py
Answer: They can be slow, incomplete, unavailable, or inconsistent. That is why error handling, retries, and fallback behavior matter so much in this kind of system.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: I would add retries, timeouts, structured error payloads, caching, provider abstraction, and stronger validation of returned data before it reaches the agents.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: The tool returns N/A for missing fields and still returns the rest of the payload. That allows the workflow to degrade gracefully instead of failing immediately.
Code reference: src/agents/tools/financial.py
Answer: The tool returns an explicit error string and the workflow continues with degraded context. That keeps the pipeline alive, although I would make this more structured in production.
Code reference: src/agents/tools/financial.py
Q3. Why did you choose to return tool-level error text rather than crash the entire crew immediately?
Answer: I wanted the downstream agent to know that the upstream tool had a problem instead of only failing silently. The tradeoff is that this is less structured than an explicit error schema.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: It still consumes whatever context the quant task produced. Today that behavior depends mostly on prompt quality; in production I would make confidence reduction more explicit.
Code reference: src/agents/tasks.py, src/agents/crew.py
Answer: I would return structured fields like available_metrics, missing_metrics, error_type, and confidence instead of only N/A values or natural-language error text.
Code reference: src/agents/tools/financial.py
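One way to make that concrete is a small response schema. This is a future-state sketch, not code that exists in the repo today:

```python
# Future-state sketch of a structured tool response; not implemented in the repo today.
from pydantic import BaseModel

class MetricsResult(BaseModel):
    ticker: str
    available_metrics: dict[str, float] = {}
    missing_metrics: list[str] = []
    error_type: str | None = None  # e.g. "provider_unavailable" for a hard failure, None for partial data
    confidence: float = 1.0        # lowered when critical metrics are missing
```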
Answer: Conceptually, partial data means the metric is unavailable, while a hard failure means the integration failed. I would make that distinction explicit in the tool schema if I were hardening the system.
Code reference: src/agents/tools/financial.py
Q7. How would you prevent the strategist from making an overconfident recommendation when data quality is weak?
Answer: I would require the final report to mention missing evidence and lower confidence when critical metrics are absent. Today that is mostly a prompt-level behavior rather than a hard rule.
Code reference: src/agents/tasks.py
Answer: I would base it on data completeness, number of successful tool calls, agreement between quant and narrative signals, and the availability of critical metrics.
Code reference: src/agents/tasks.py
Answer: I would add ticker format validation, expected-schema validation for tool responses, and guardrails around empty or malformed data before agents consume it.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: It depends on the service and stage. Market-data gaps may justify graceful degradation, transient infra failures should retry, and persistence failures should fail loudly because false success is worse than visible failure.
Code reference: src/workers/analysis_worker.py, src/shared/storage.py
Answer: It ties user experience directly to external latency and makes capacity management more fragile. It also creates timeout and concurrency problems for the API layer.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I made the API responsible only for job submission and status retrieval. The long-running analysis happens in the worker process.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: Because the request is accepted for processing but not yet complete. That is the correct contract for a durable long-running job.
Code reference: src/api/routes.py, src/api/models.py
Answer: It returns the job state, timestamps, worker ownership, final report content if complete, blob URL if available, and error information on failure. That gives the client a full picture of the lifecycle.
Code reference: src/api/routes.py, src/api/models.py
Answer: Polling is acceptable because the UI is simple and the frequency is low. If I needed richer real-time behavior or much larger scale, I would consider WebSockets, SSE, or event-driven notifications.
Code reference: src/frontend/app.py, src/api/routes.py
Answer: Because the workload is too long-running and operationally significant. I wanted a durable worker execution model rather than an in-process helper mechanism.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: Threadpool offloading can help with sync calls, but it is not durable job execution. A separate worker process adds more moving parts but is much cleaner for multi-minute AI workflows.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I keep the API layer lightweight and use threadpool wrappers for synchronous DB access. The more important design choice is that the heavy analysis work is not running inside the API process.
Code reference: src/api/routes.py
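A hedged sketch of that pattern using FastAPI's threadpool helper; the query function here is a stand-in for the real database call:

```python
# Illustrative use of run_in_threadpool so synchronous DB access does not block the event loop.
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

def load_job_sync(job_id: str) -> dict:
    # Stand-in for a blocking SQLAlchemy query against the job table.
    return {"id": job_id, "status": "queued"}

@app.get("/jobs/{job_id}")
async def get_job(job_id: str) -> dict:
    # The blocking call runs on a worker thread, keeping the API responsive.
    return await run_in_threadpool(load_job_sync, job_id)
```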
Answer: The job still exists because it is stored durably in PostgreSQL. The lifecycle no longer depends on the client connection.
Code reference: src/api/routes.py, src/shared/database.py
Answer: I would add stage-level statuses such as collecting_metrics, researching_news, writing_report, and persisting_artifacts.
Code reference: src/api/models.py, src/shared/database.py
Answer: I would add auth at the API boundary, per-user authorization on job access, request quotas, and rate limiting. I would be explicit in interviews that this repo does not implement those controls yet.
Code reference: src/api/routes.py
Answer: I would add tenant identity, access scoping on job reads, tenant-aware storage layout, quotas, and stricter isolation of metadata and artifacts.
Code reference: src/api/routes.py, src/shared/database.py
Answer: The system supports queued, running, completed, and failed. That is enough to make the lifecycle clear without overcomplicating the state machine.
Code reference: src/shared/database.py, src/api/models.py
Answer: It uses a row-locking claim query built on SELECT ... FOR UPDATE SKIP LOCKED: it selects the next queued job, marks it running, and attaches a worker_id. That prevents duplicate claims under normal operation.
Code reference: src/shared/database.py
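A hedged sketch of the claim query. The table and column names follow the job fields described in this guide, and the real database.py may structure the statement differently:

```python
# Illustrative claim query using SELECT ... FOR UPDATE SKIP LOCKED.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/stocks")  # placeholder DSN

CLAIM_SQL = text("""
    UPDATE analysis_jobs
       SET status = 'running', worker_id = :worker_id, started_at = NOW()
     WHERE id = (
           SELECT id FROM analysis_jobs
            WHERE status = 'queued'
            ORDER BY created_at
            LIMIT 1
              FOR UPDATE SKIP LOCKED
     )
 RETURNING id
""")

def claim_next_job(worker_id: str):
    with engine.begin() as conn:  # one transaction: lock the row, claim it, commit
        row = conn.execute(CLAIM_SQL, {"worker_id": worker_id}).fetchone()
    return row[0] if row else None
```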
Answer: For this scale, it was the simplest correct mechanism. It kept job state, history, and queue semantics in one place without adding another infrastructure dependency.
Code reference: src/shared/database.py
Answer: Worker heartbeats prove that at least one execution process is alive recently enough for the API to accept new work.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: Job heartbeats prove that a specific running job is still making progress. That supports lease-based recovery if the worker disappears.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: The system scans for running jobs whose heartbeat is too old and moves them back to queued. Another worker can then claim them later.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
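A hedged sketch of that recovery scan; the heartbeat column name and the five-minute threshold are assumptions:

```python
# Illustrative stale-job recovery; the heartbeat column and threshold are assumptions.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/stocks")  # placeholder DSN

REQUEUE_SQL = text("""
    UPDATE analysis_jobs
       SET status = 'queued', worker_id = NULL
     WHERE status = 'running'
       AND last_heartbeat_at < NOW() - INTERVAL '5 minutes'
 RETURNING id
""")

def recover_stale_jobs() -> list:
    with engine.begin() as conn:
        # Re-queued jobs become claimable by any other active worker.
        return [row[0] for row in conn.execute(REQUEUE_SQL)]
```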
Answer: It solves the orphaned-job problem. Without it, a worker crash could leave a job stuck in running forever.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: The job eventually becomes stale and is re-queued. That gives the system a recovery path instead of leaving the job permanently stuck.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: There is a risk of artifact duplication or partial completion. I reduced the database inconsistency risk by making the report-log write and job-completion update atomic, but Blob remains an external side effect, so stronger idempotency would still be needed for a hardened version.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: It reduces cases where the report log and job status disagree. The database should not say one thing while the report record says another.
Code reference: src/shared/database.py
Answer: I combined report-log persistence and job completion into one database transaction. That gives much stronger consistency on the database side.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
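A hedged sketch of that finalization step; the reports_log columns shown here are assumptions based on the fields described in this guide:

```python
# Illustrative atomic finalization: report-log insert and job completion in one transaction.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/stocks")  # placeholder DSN

def finalize_job(job_id: str, ticker: str, report_md: str, blob_url: str) -> None:
    with engine.begin() as conn:  # both statements commit together or roll back together
        conn.execute(
            text("INSERT INTO reports_log (job_id, ticker, report_url) VALUES (:j, :t, :u)"),
            {"j": job_id, "t": ticker, "u": blob_url},
        )
        conn.execute(
            text("""UPDATE analysis_jobs
                       SET status = 'completed', report_content = :c,
                           report_url = :u, completed_at = NOW()
                     WHERE id = :j"""),
            {"c": report_md, "u": blob_url, "j": job_id},
        )
```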
Answer: Blob uploads can still duplicate if a job is retried after partial success. That is where idempotent artifact naming or deduplication logic would help.
Code reference: src/workers/analysis_worker.py, src/shared/storage.py
Answer: I would use stable artifact identifiers per job, check whether output already exists, and record enough metadata to detect partial prior completion.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: The API returns 503 instead of accepting work it cannot process. That was an intentional correctness improvement.
Code reference: src/api/routes.py, src/shared/database.py
Answer: It distinguishes between full service availability, degraded operation caused by missing workers, and true unhealthy states like database access failure.
Code reference: src/api/main.py, src/shared/database.py
Answer: I would monitor worker heartbeat recency, queue depth, oldest queued job age, running job count, stale-job recovery count, completion rate, failure rate, and average job duration.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: The limits are polling overhead, contention at higher concurrency, and weaker native retry and dead-letter semantics than a real broker. It is fine here, but not the final answer at larger scale.
Code reference: src/shared/database.py
Answer: I would move when throughput, retry control, delayed retries, multi-service consumption, or operational scale justify a dedicated broker.
Code reference: src/shared/database.py
Answer: The main tables are analysis_jobs, reports_log, and worker_heartbeats. They store job lifecycle, report history, and worker liveness respectively.
Code reference: src/shared/database.py
Answer: The job row supports operational polling and status, while the report log acts as a historical record of completed outputs. They serve different concerns.
Code reference: src/shared/database.py
Answer: The important fields are id, ticker, status, worker_id, report_content, report_url, error_message, and the lifecycle timestamps. Those fields make each job inspectable end to end.
Code reference: src/shared/database.py
Answer: They let me reason about queue delay, processing time, and operational performance. They are basic observability data.
Code reference: src/shared/database.py
Answer: You can inspect where a job stalled, whether it was claimed, which worker took it, and whether it failed before or after artifact persistence.
Code reference: src/shared/database.py
Answer: It helps attribute running work to a specific worker and makes ownership and failure scenarios easier to diagnose.
Code reference: src/shared/database.py
Answer: I would prioritize indexes on status, created_at, possibly updated_at, and any fields used by queue scans or dashboards.
Code reference: src/shared/database.py
Answer: It gives transactional guarantees for multi-step database updates within a single session and commit. It does not make external side effects such as Blob uploads transactional.
Code reference: src/shared/database.py
Answer: I would likely store less duplicated content in the job table and rely more on Blob Storage plus metadata pointers to reduce database bloat.
Code reference: src/shared/database.py, src/shared/storage.py
Answer: Possibly, if report volume or size grew enough to make that inefficient. For this project it is still useful because the API can return the report directly without an extra fetch step.
Code reference: src/shared/database.py, src/shared/storage.py
Answer: It is a natural fit for durable report files and aligns well with the Azure-oriented architecture in the project.
Code reference: src/shared/storage.py
Answer: It uses the Azure Blob Python SDK with a connection string loaded from configuration. The storage service initializes the client and ensures the reports container exists.
Code reference: src/shared/storage.py, src/shared/config.py
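A minimal sketch of that initialization, assuming the environment variable and container name shown here; the real names live in config.py and storage.py:

```python
# Illustrative storage-service init: load the connection string and ensure the container exists.
import os
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

connection_string = os.environ["AZURE_STORAGE_CONNECTION_STRING"]  # assumed variable name
service = BlobServiceClient.from_connection_string(connection_string)
try:
    service.create_container("reports")  # assumed container name
except ResourceExistsError:
    pass  # container already exists; nothing to do
```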
Answer: The crew writes investment_report_<TICKER>.md locally, the worker uploads that file to Blob Storage, and then the blob URL is written back to the job result.
Code reference: src/workers/analysis_worker.py, src/shared/storage.py
Answer: The worker owns the long-running pipeline and knows whether report generation actually succeeded. The API should remain lightweight.
Code reference: src/workers/analysis_worker.py
Answer: It gives a clean logical home for generated report artifacts. It is a simple but useful storage boundary.
Code reference: src/shared/storage.py
Answer: Storing it as an artifact preserves it beyond process lifetime and allows later retrieval independent of the immediate API response.
Code reference: src/workers/analysis_worker.py, src/shared/storage.py
Answer: The storage layer raises a FileNotFoundError, which causes the job to fail rather than pretending the upload succeeded.
Code reference: src/shared/storage.py
Answer: Blob upload errors are raised explicitly as exceptions. That was a deliberate fix because silent or misleading behavior would make the system untrustworthy.
Code reference: src/shared/storage.py
Answer: I would add job IDs, timestamps, tenant or environment prefixes, and possibly version suffixes. That would improve uniqueness, organization, and idempotency.
Code reference: src/shared/storage.py
Answer: I would version by job ID or timestamp rather than overwriting by ticker name alone. That preserves history and reduces accidental collisions.
Code reference: src/shared/storage.py, src/shared/database.py
Answer: The implemented measures are basic: secrets are not hardcoded, the storage layer is isolated behind configuration and a service class, and upload failures are handled explicitly. The repo does not yet implement advanced Azure-native security controls by itself.
Code reference: src/shared/storage.py, src/shared/config.py
Answer: Connection strings are powerful shared secrets. If they leak through logs, CI output, or a compromised machine, an attacker may gain broad storage access.
Code reference: src/shared/config.py
Q3. If the reports contain sensitive financial analysis, what Azure controls would you add in production?
Answer: I would add private containers, managed identity, RBAC, Key Vault, private endpoints, audit logging, and controlled retrieval paths instead of direct public access.
Code reference: src/shared/storage.py, src/shared/config.py
Answer: Private. AI-generated reports are usually internal artifacts, not public web assets.
Code reference: src/shared/storage.py
Answer: I would use SAS only when a client truly needs short-lived direct object access. Otherwise I would prefer service-side retrieval through the API so access control stays centralized.
Code reference: src/shared/storage.py
Answer: When the service runs inside Azure and can authenticate without long-lived secrets. That is the cleaner production pattern.
Code reference: src/shared/config.py
Answer: I would authorize the caller in the API and either proxy the download or mint a short-lived signed URL only when necessary.
Code reference: src/shared/storage.py, src/api/routes.py
Answer: I would store secrets in Key Vault, minimize credential scope, and design the service to reload or redeploy safely when rotated.
Code reference: src/shared/config.py
Answer: Encryption at rest is the baseline. I would consider customer-managed keys if compliance or enterprise key-control requirements justified the added complexity.
Code reference: src/shared/storage.py
Answer: They remove public network exposure and keep traffic inside the virtual network boundary.
Code reference: src/shared/config.py
Answer: I would separate access by identity and role and possibly by storage path or account depending on sensitivity and tenancy needs.
Code reference: src/shared/storage.py
Answer: I would want logs showing who accessed which artifact, when, and from where. That is important for auditability and incident review.
Code reference: src/shared/storage.py
Answer: They help move old artifacts to cheaper tiers, enforce retention windows, and manage storage costs as report volume grows.
Code reference: src/shared/storage.py
Answer: I would tighten access paths, reduce direct blob exposure, add stronger encryption and key management controls, and formalize retention and audit processes.
Code reference: src/shared/storage.py, src/shared/config.py
Answer: Secrets are managed through environment variables and loaded via Pydantic settings. That is fine for local development but not the full production answer.
Code reference: src/shared/config.py
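A minimal sketch of that settings pattern with pydantic-settings; the field names and default model are assumptions, and an older Pydantic v1 setup would import BaseSettings from pydantic instead:

```python
# Illustrative settings class; field names are assumptions, not the repo's exact config.py.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    azure_storage_connection_string: str
    openai_api_key: str
    firecrawl_api_key: str
    openai_model: str = "gpt-4o-mini"  # placeholder default

settings = Settings()  # values resolve from environment variables or the .env file
```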
Answer: It is simple and developer-friendly for local work, but it depends on host-level secrecy and manual distribution. Enterprise production needs stronger secret storage and access control.
Code reference: src/shared/config.py
Answer: I would move OpenAI, Firecrawl, Blob Storage, and Postgres secrets there, or replace some of them with managed identity where possible.
Code reference: src/shared/config.py
Answer: I would use environment-specific secret stores, separate cloud resources, and strict non-reuse of credentials across environments.
Code reference: src/shared/config.py
Answer: I would focus on secret leakage, unauthorized access to generated reports, API abuse, data exfiltration from cloud storage, and misuse of the AI workflow itself.
Code reference: src/shared/config.py, src/api/routes.py
Answer: The stock ticker itself is not sensitive, but generated report content, cloud credentials, and operational metadata can be. If the reports are internal research, they should be treated as sensitive artifacts.
Code reference: src/shared/database.py, src/shared/storage.py
Answer: I would avoid logging raw secrets, minimize full report dumps, and keep error messages explicit enough for debugging without exposing sensitive values.
Code reference: src/workers/analysis_worker.py, src/api/routes.py
Answer: I would use managed identity where possible, controlled secret management, network restrictions, and tighter egress controls in the Azure deployment.
Code reference: src/shared/config.py, src/shared/storage.py
Answer: I would add private endpoints, VNet integration, NSGs, and routing of sensitive service traffic over private rather than public paths.
Code reference: src/shared/config.py
Answer: I would add stronger access controls, auditability, retention rules, model governance, and likely a human review step before any recommendation is treated as decision-support output.
Code reference: src/shared/config.py
Answer: Model selection affects cost, latency, and output quality, so it belongs in runtime configuration rather than being hardcoded inside the agent factory.
Code reference: src/shared/config.py, src/agents/agents.py
Answer: A stronger model usually improves synthesis quality but increases latency and cost. A lighter model lowers cost and latency but may weaken reasoning quality across mixed evidence.
Code reference: src/shared/config.py, src/agents/agents.py
Answer: Queue handling, persistence, heartbeats, and tool invocation plumbing are deterministic. Summarization, synthesis, and the final narrative recommendation are model-driven.
Code reference: src/agents/agents.py, src/agents/tools/financial.py
Answer: It is real because the final recommendation is still LLM-generated. I reduce it by grounding agents in tools and structured handoffs, but I would not claim the system eliminates hallucination.
Code reference: src/agents/tasks.py, src/agents/tools/financial.py
Answer: I would add source citations, structured evidence fields, confidence scoring, and possibly schema validation around the final recommendation.
Code reference: src/agents/tasks.py, src/agents/agents.py
Answer: Once downstream consumers needed reliable machine-readable results or stronger guardrails around recommendation format.
Code reference: src/agents/tasks.py
Answer: I would check whether the final report refers back to actual metrics and retrieved news rather than drifting into generic market language.
Code reference: src/agents/tasks.py, src/agents/tools/scraper.py
Answer: Today I use the same configured model for simplicity and consistent behavior. In production I might split models if the cost-quality tradeoff justified it.
Code reference: src/agents/agents.py, src/shared/config.py
Answer: I would consider caching stable market data lookups, repeated benchmark data like SPY within a short window, and some research results when freshness requirements allow it.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: I would compare latency, token cost, factual grounding, consistency of verdicts, and human review of report usefulness across a representative ticker set.
Code reference: src/shared/config.py, src/agents/agents.py
Answer: Because long-running AI workflows suffer from cold starts, timeout pressure, and the lack of an always-on execution environment under your control. That makes Functions a weaker fit for multi-minute jobs.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: Functions reduces idle cost and infrastructure overhead, while a dedicated worker gives better control, easier debugging, and a cleaner fit for long-running execution.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: Functions is usually cheaper at low and bursty volume, while always-on compute costs more at idle but avoids the unpredictability and cold-start friction.
Code reference: src/workers/analysis_worker.py
Answer: They add startup delay on top of already variable external latency, which hurts the user experience and makes runtime less predictable.
Code reference: src/workers/analysis_worker.py
Answer: Heavily. If the platform cannot reliably hold the workload long enough, it is the wrong execution model for the job.
Code reference: src/workers/analysis_worker.py
Answer: A VM gives more control, but it also increases patching, monitoring, and infrastructure maintenance responsibility.
Code reference: src/workers/analysis_worker.py
Answer: When I want containerized API and worker services, autoscaling, and less raw infrastructure management than a VM while still supporting long-running workers better than Functions.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I would lean more toward autoscaled containers or broker-driven worker pools instead of a single always-on worker.
Code reference: src/workers/analysis_worker.py
Answer: I would keep warm workers and avoid platforms with cold-start behavior that adds variability.
Code reference: src/workers/analysis_worker.py
Answer: I would likely recommend containerized API and worker services plus a proper message broker. That balances control and operability better than either Functions-only or a manually managed VM.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: External API failure, worker crashes, missing workers, stale jobs, Blob upload failure, database write failure, and weak or incomplete upstream data.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: It tells me whether the API can talk to the database and whether at least one worker is heartbeating. It does not tell me anything about output quality or whether external dependencies are degraded.
Code reference: src/api/main.py
Answer: Queue depth, oldest queued job age, running job count, completion rate, failure rate, average duration, stale recovery count, and worker heartbeat recency.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: No active workers, high failure rate, oldest queued job age over threshold, repeated stale-job recoveries, and persistence failures.
Code reference: src/api/main.py, src/workers/analysis_worker.py
Answer: I would follow the job_id through API logs, worker logs, job table state, and Blob persistence outcomes. The durable job table is especially useful because it captures status and timestamps.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I did not want the repo coupled to that tracing path and wanted to simplify configuration. The current system relies on simpler logs and durable job state instead.
Code reference: src/agents/crew.py, src/shared/config.py
Answer: I would add structured logs, per-stage status updates, metrics, correlation IDs, and possibly OpenTelemetry later if I needed deeper distributed tracing.
Code reference: src/api/main.py, src/workers/analysis_worker.py
Answer: Job submission, job claim, heartbeat failures, tool failures, Blob upload failures, database finalization failures, and explicit completion or failure logs with job_id.
Code reference: src/workers/analysis_worker.py, src/api/routes.py
Answer: I would measure them from job timestamps in PostgreSQL plus worker-side metrics and logs.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: A worker can still heartbeat while repeatedly failing jobs or making no progress. That is why I would monitor throughput and failure patterns in addition to liveness.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: I would inspect stale-job timing, heartbeat recency, claim logs, and whether a job was re-queued while the original worker was slow instead of actually dead.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: Stronger event logs, user attribution, access logs for report retrieval, and more explicit stage transitions in the job lifecycle.
Code reference: src/shared/database.py, src/api/routes.py
Answer: OpenAI inference, Firecrawl retrieval, and market-data fetches. The queue layer is not usually the first latency source at this scale.
Code reference: src/agents/crew.py, src/agents/tools/scraper.py
Answer: Firecrawl and model inference are the most likely bottlenecks because both are external and can vary significantly in latency.
Code reference: src/agents/tools/scraper.py, src/agents/crew.py
Answer: The jobs queue in PostgreSQL and are processed by available workers. Throughput then depends on how many workers are running and how long each analysis takes.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: It would still work, but eventually it would show more polling overhead and contention than a dedicated broker-backed system.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: External API latency, sequential workflow time, and queue architecture. Those will hurt earlier than raw CPU in many cases.
Code reference: src/shared/database.py, src/agents/crew.py
Answer: I would run multiple worker instances that all claim from the same durable queue.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: I would keep the quant-to-strategist dependency sequential, but I might parallelize independent data acquisition steps inside the workflow if they did not compromise clarity.
Code reference: src/agents/crew.py, src/agents/tasks.py
Answer: I would optimize prompts, reduce unnecessary context, cache stable tool results, and evaluate whether both agents need the same model size.
Code reference: src/agents/tasks.py, src/agents/tools/scraper.py
Answer: Stable market data lookups, repeated SPY benchmark data within a short window, and some research results when freshness requirements allow it.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: I would move toward a broker-backed queue, containerized worker autoscaling, stronger caching, and more formal observability and rate control.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: I would mock yFinance responses and verify that the tool returns the expected metrics, handles missing fields correctly, and surfaces errors properly.
Code reference: src/agents/tools/financial.py
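A hedged sketch of such a test. The import path, function name, and payload keys are assumptions about the tool's shape, so they would need to match the actual financial.py:

```python
# Illustrative unit test; the module path, function name, and keys are assumptions.
import json
from unittest.mock import MagicMock, patch

from src.agents.tools import financial  # assumed import path

def test_missing_metrics_degrade_to_na():
    fake_ticker = MagicMock()
    fake_ticker.info = {"currentPrice": 123.45}  # every other field missing
    with patch.object(financial.yf, "Ticker", return_value=fake_ticker):
        payload = json.loads(financial.get_key_metrics("AAPL"))  # assumed function name
    assert payload["price"] == 123.45
    assert payload["pe_ratio"] == "N/A"
```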
Answer: I would mock the crew execution, storage service, and database calls so I could validate state transitions without real network dependency.
Code reference: src/workers/analysis_worker.py
Answer: Job submission, worker claim behavior, successful completion, failure handling, and stale-job requeue behavior.
Code reference: src/api/routes.py, src/shared/database.py
Answer: I would create a running job with an old heartbeat timestamp, run the recovery logic, and assert that it moves back to queued.
Code reference: src/shared/database.py
Answer: I would create jobs in each state and assert that the status endpoint serializes the expected fields correctly for each one.
Code reference: src/api/routes.py, src/api/models.py
Answer: I would stub the storage client to raise an error and then verify that the worker marks the job as failed.
Code reference: src/shared/storage.py, src/workers/analysis_worker.py
Answer: I would force a failure during finalization and assert that the database rolls back partial state correctly.
Code reference: src/shared/database.py
Answer: Unit and most integration tests should mock external AI and data services. I would keep a smaller set of controlled end-to-end tests against real services in a non-production environment.
Code reference: src/agents/tools/financial.py, src/workers/analysis_worker.py
Answer: Submit a ticker, wait for the worker to process it, verify the final job state, and assert that both the report artifact and metadata exist.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: The main gap is that the repo does not yet include a real automated test suite. I would be honest about that and explain how I would add coverage.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: The weakest part is that it still relies on a Postgres-backed queue and relatively light observability. It is strong for a portfolio project, but not the final architecture at higher scale.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: Authentication, authorization, enterprise observability, formal evaluation, and some Azure security hardening are intentionally simplified.
Code reference: src/api/routes.py, src/shared/config.py
Answer: Auth, managed identity, Key Vault, private network paths, stronger access control on artifacts, and better audit logging.
Code reference: src/shared/config.py, src/shared/storage.py
Answer: The queue backbone. That is the main scaling pressure point.
Code reference: src/shared/database.py
Answer: I would improve citation behavior, confidence disclosure, and more explicit linkage between evidence and final verdict.
Code reference: src/agents/tasks.py
Answer: Stronger validation, retries, provider abstraction, and structured completeness and confidence indicators.
Code reference: src/agents/tools/financial.py, src/agents/tools/scraper.py
Answer: I would think harder about idempotent artifact handling and whether the job table should continue storing full report content long-term.
Code reference: src/shared/database.py, src/shared/storage.py
Answer: Richer progress stages, better failure messaging, and possibly side-by-side evidence views for metrics and retrieved news.
Code reference: src/frontend/app.py, src/api/routes.py
Answer: Mostly around evaluation, security hardening, observability, and queue maturity. The service boundaries themselves are in a good place.
Code reference: src/api/routes.py, src/shared/database.py
Answer: I would focus on formal evaluation, auth and cloud-secret hardening, and stronger queue semantics or idempotency.
Code reference: src/shared/config.py, src/shared/database.py
Answer: Moving from a blocking AI demo to a reliable background workflow. That required thinking about job boundaries, worker ownership, persistence, and failure handling together.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: The biggest one was the original blocking API design. Refactoring it into durable jobs plus a worker process made the architecture much more defensible.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Q3. Describe a time when you changed the architecture after realizing the first version was not good enough.
Answer: The first version ran the whole analysis inside the request lifecycle. Once it became clear that long-running AI work did not belong there, I changed it to a job-and-worker model with explicit state and recovery.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I hardened correctness and operability first: job execution, persistence truthfulness, worker recovery, and configuration behavior. I deprioritized improvements that did not change functional reliability.
Code reference: src/shared/database.py, src/shared/storage.py
Answer: I built the simplest working version first, then hardened the highest-risk failure modes in priority order. That let me move quickly without pretending the first version was production-ready.
Code reference: src/api/routes.py, src/shared/database.py
Answer: Treating AI work as distributed work with explicit state instead of as a long function call behind an endpoint. That is the biggest system-design signal in the project.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I would say I am most proud that the repo now tells a coherent systems story. It is not just "I used CrewAI"; it is "I designed how long-running AI work should behave in a service."
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: I would say it was a deliberate tradeoff for this scale: fewer dependencies, simpler state model, faster implementation, and good-enough semantics for the current stage.
Code reference: src/shared/database.py
Answer: Broker-backed queuing, stronger auth and cloud security, and more formal evaluation and confidence controls.
Code reference: src/shared/database.py, src/shared/config.py
Answer: It taught me that once you care about reliability, recovery, persistence, and operational correctness, agentic systems become real systems engineering work rather than prompt experiments.
Code reference: src/agents/crew.py, src/workers/analysis_worker.py
Answer: Yes. For example, if a worker dies mid-job, heartbeats stop, the job lease expires, and the system re-queues the work instead of leaving it stuck in running forever. That is the kind of concrete failure mode I was designing for.
Code reference: src/workers/analysis_worker.py, src/shared/database.py
Answer: I accepted more moving parts in exchange for better correctness, recoverability, and clearer service boundaries.
Code reference: src/api/routes.py, src/workers/analysis_worker.py
Answer: External API latency and the Postgres-backed queue would be the first stress points. Those would show pressure before many other parts of the system.
Code reference: src/shared/database.py, src/workers/analysis_worker.py
Answer: Queue depth, oldest queued job age, worker heartbeat recency, failure rate, p95 job duration, and external dependency errors.
Code reference: src/api/main.py, src/shared/database.py
Answer: Implemented now: durable jobs, worker heartbeats, stale-job recovery, cloud persistence separation, and multi-agent orchestration. Future-state hardening: auth, Key Vault, managed identity, broker-backed queue, stronger evaluation, and deeper observability.
Code reference: src/api/routes.py, src/shared/config.py
Answer: I would say it uses specialized AI roles to gather financial facts, read recent market context, and then produce a research memo automatically while storing the result safely for review.
Code reference: src/frontend/app.py, src/api/routes.py
Answer: It avoids blocking expensive web workers, stores artifacts in the right storage layer, and lets compute scale at the worker boundary instead of overloading the API layer. That is a better cost-performance shape than a naive synchronous design.
Code reference: src/api/routes.py, src/shared/storage.py
Answer: Add citations, confidence scoring, clearer evidence sections, and possibly a human review step for low-confidence outputs.
Code reference: src/agents/tasks.py, src/agents/tools/financial.py
Answer: I would add auth, rate limits, quotas, abuse monitoring, and input validation before exposing it broadly.
Code reference: src/api/routes.py, src/shared/config.py
Answer: Security and identity, observability and testing, and stronger queue or idempotency semantics.
Code reference: src/shared/config.py, src/shared/database.py
Answer: Ask this to understand whether they need someone who can prototype quickly or someone who can harden systems already in production. It tells you what maturity level they operate at.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this to learn whether they already use queues, workers, or orchestration frameworks for AI workloads. It reveals how mature their AI operations are.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this so you know where the team is hurting most. That helps you position your strengths against their real needs.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this because it shows you understand that AI workloads do not all belong in request-response flows. Their answer reveals architectural maturity.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this to understand whether they treat evaluation as a first-class system concern or still optimize prompts ad hoc.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this because generated artifacts are often overlooked security surfaces in AI systems. It signals that you think beyond the model call.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this to understand whether the role leans more research-heavy or systems-heavy. That helps you calibrate your examples.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
Answer: Ask this because it tells you what outcomes matter and lets you tailor your examples to their expectations.
Code reference: No direct single code file. This item is mainly interview strategy rather than implemented repo logic.
For this project, your strongest answer shape is:
- State the problem.
- State the design choice.
- State the tradeoff.
- State the failure mode you handled.
- State the next production improvement.
Example: "I needed to keep the API responsive while the analysis could take minutes, so I moved execution into a worker and returned a durable job ID from the API. The tradeoff is more infrastructure complexity, but it gave me correctness, recoverability, and cleaner service boundaries. The next thing I would add in production is a broker-backed queue and stronger security controls."