Challenges and Solutions

The hardest problems in the build, what caused them, and what I took away. These are the ones that cost real time and changed how I work.

1. Making an AI tutor that won't cheat

Challenge: An LLM's default behavior is to be helpful, which means answering the question. For a tutor, that's exactly wrong: a tutor that gives answers is a cheating tool.

Solution: A Socratic system prompt with explicit hard rules (never give the final answer, redirect extraction attempts, end with a question), four distinct behavioral modes (Hint/Scaffold/Check/Explain), and a backstop detector for bypass phrases like "just tell me." The system prompt is composed per-session with the assignment context, rubric, and the student's draft so the redirection is specific, not generic.
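
A minimal sketch of how the per-session prompt composition and the bypass backstop could fit together. The names here (SessionContext, compose_system_prompt, is_bypass_attempt) and the phrase patterns are illustrative assumptions, not the project's actual code, and a real phrase list would need tuning against real student messages.

```python
import re
from dataclasses import dataclass

# Hypothetical bypass patterns (assumption); only "just tell me" and
# "just this once" come from the writeup itself.
BYPASS_PATTERNS = [
    r"\bjust (tell|give) me\b",
    r"\bwhat('s| is) the (final )?answer\b",
    r"\bjust this once\b",
]

HARD_RULES = (
    "Never give the final answer.\n"
    "If the student tries to extract the answer, redirect to their next step.\n"
    "End every reply with a question."
)

MODES = {"hint", "scaffold", "check", "explain"}


@dataclass
class SessionContext:
    assignment: str
    rubric: str
    draft: str


def is_bypass_attempt(message: str) -> bool:
    """Backstop check run on the student's message before the model sees it."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in BYPASS_PATTERNS)


def compose_system_prompt(ctx: SessionContext, mode: str) -> str:
    """Build the system prompt for one session from rules, mode, and context."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return "\n\n".join([
        "You are a Socratic tutor. Hard rules:\n" + HARD_RULES,
        f"Current mode: {mode}",
        f"Assignment:\n{ctx.assignment}",
        f"Rubric:\n{ctx.rubric}",
        f"Student draft so far:\n{ctx.draft}",
    ])
```

Keeping the detector separate from the prompt means a flagged message can be answered with a redirect even if the model would otherwise have given in.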

What I learned: Prompt engineering is real engineering when the prompt is the product's core behavior. The difference between a cheating machine and a tutor is entirely in the system prompt and the modes around it. It took many iterations to get the tutor to hold the line under pressure ("I give up," "just this once") without becoming frustrating to a student who's genuinely stuck.

2. The KaTeX production-only bug

Challenge: Math rendered perfectly in local dev but showed raw LaTeX source on the VPS. Same code, same questions.

Solution: The react-katex InlineMath component threw on edge-case LaTeX and fell back to raw text, but only the production build's error handling let that fallback win; dev mode recovered silently. Replaced it with a direct katex.renderToString() call configured { throwOnError: false, strict: false }, which behaves identically in both builds.

What I learned: Dev and production builds handle errors differently, and a bug can hide entirely in that gap. New standing rule: when something works in local dev but breaks on the VPS, reproduce against a local production build (npm run build + preview) before touching the deployment. Full writeup in exam-engine.md.

3. Multi-tenant isolation without a tenancy bug

Challenge: One deployment, many academies. A scoping bug isn't a glitch. It's one academy seeing another's student data.

Solution: Two layers. Tenant scoping derives the tenant from the authenticated user (never a client-supplied parameter), and role filtering within the tenant ensures a student sees only their own work and a parent sees only their child's. The role gate is a FastAPI dependency that runs before any handler, so no endpoint can skip it.
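
The two layers can be sketched framework-free. In a FastAPI app, a function like this would typically be reached through a `Depends` dependency that first resolves the authenticated user. The User and Submission shapes, the role names, and visible_submissions are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class User:
    id: int
    tenant_id: int              # taken from the authenticated identity
    role: str                   # assumed roles: "instructor", "student", "parent"
    child_id: Optional[int] = None


@dataclass(frozen=True)
class Submission:
    id: int
    tenant_id: int
    student_id: int


def visible_submissions(user: User, submissions: List[Submission]) -> List[Submission]:
    """Apply tenant scoping first, then role filtering inside the tenant."""
    # Layer 1: the tenant comes from the user object, never from a request parameter.
    in_tenant = [s for s in submissions if s.tenant_id == user.tenant_id]

    # Layer 2: narrow by role within that tenant.
    if user.role == "instructor":
        return in_tenant
    if user.role == "student":
        return [s for s in in_tenant if s.student_id == user.id]
    if user.role == "parent":
        return [s for s in in_tenant if s.student_id == user.child_id]
    # Unknown roles are rejected rather than given a default view.
    raise PermissionError(f"role not allowed: {user.role}")
```

Note that the function takes no tenant argument at all, so a caller has no way to ask for another academy's rows.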

What I learned: When isolation is enforced in application code rather than by a database boundary, the discipline has to be structural: derive tenant from identity, gate roles in a dependency, never trust a request parameter for authorization. Full writeup in multi-tenancy.md.

4. Trusting AI output that has to be valid JSON

Challenge: Quiz generation and pre-grading both depend on the LLM returning well-formed JSON in an exact shape. LLMs don't reliably do that: they wrap output in code fences, add commentary, or drift from the schema.

Solution: Treat AI output as untrusted input. Strip stray code fences defensively, parse with explicit error handling, and validate every field: valid question type, non-empty prompt, exactly one correct option for multiple-choice, scores clamped to range. Malformed output raises a clear error rather than reaching the instructor as garbage.
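
A sketch of what that validation layer might look like using only the standard library. The field names ("type", "prompt", "options", "correct"), the allowed question types, and the MalformedAIOutput exception are assumptions; the checks themselves follow the list above.

```python
import json
import math

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal one
ALLOWED_TYPES = {"multiple_choice", "short_answer"}  # assumed type names


class MalformedAIOutput(ValueError):
    """Raised when model output cannot be trusted as a question or score."""


def strip_code_fences(raw: str) -> str:
    """Remove one wrapping code fence, with or without a language tag."""
    text = raw.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        inner = text[len(FENCE):-len(FENCE)]
        first_line, sep, rest = inner.partition("\n")
        # Drop an opening line that is empty or just a tag such as "json".
        if sep and (first_line.strip() == "" or first_line.strip().isalnum()):
            inner = rest
        return inner.strip()
    return text


def parse_question(raw: str) -> dict:
    try:
        data = json.loads(strip_code_fences(raw))
    except json.JSONDecodeError as exc:
        raise MalformedAIOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise MalformedAIOutput("expected a JSON object")
    qtype = data.get("type")
    if not isinstance(qtype, str) or qtype not in ALLOWED_TYPES:
        raise MalformedAIOutput(f"invalid question type: {qtype!r}")
    prompt = data.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise MalformedAIOutput("prompt must be a non-empty string")
    if qtype == "multiple_choice":
        options = data.get("options")
        if not isinstance(options, list) or not all(isinstance(o, dict) for o in options):
            raise MalformedAIOutput("options must be a list of objects")
        if sum(1 for o in options if o.get("correct") is True) != 1:
            raise MalformedAIOutput("multiple-choice needs exactly one correct option")
    return data


def clamp_score(value, max_score: float) -> float:
    """Coerce a suggested score to a number and clamp it to [0, max_score]."""
    try:
        score = float(value)
    except (TypeError, ValueError) as exc:
        raise MalformedAIOutput(f"score is not a number: {value!r}") from exc
    if not math.isfinite(score):
        raise MalformedAIOutput("score must be finite")
    return min(max(score, 0.0), max_score)
```

Output with commentary around the JSON is deliberately not repaired here; it fails at the parse step and surfaces as a MalformedAIOutput error.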

What I learned: "The model will return JSON" is a hope, not a contract. The validation layer around an LLM call is as important as the prompt. Build it assuming the model will eventually return something malformed, because it will.

5. AI latency that can't block the user

Challenge: Grading short-answer questions with the AI is slow. Doing it synchronously inside the submission request would leave students staring at a spinner.

Solution: Decoupled the flow. Submission persists and confirms immediately; pre-grading runs separately and produces suggestions the instructor reviews later. The student never waits on the AI, and the instructor gets a head start instead of a blank grading screen.
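
One way to picture the decoupling, as a sketch only: the writeup does not say which mechanism runs pre-grading, so the in-memory queue, the SubmissionService class, and the injected grade function below are all stand-ins for whatever background worker and storage the real system uses.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Submission:
    id: int
    answer: str
    suggested_score: Optional[float] = None  # filled in later, if at all


class SubmissionService:
    def __init__(self) -> None:
        self.store: Dict[int, Submission] = {}          # stand-in for the database
        self.pending: "queue.Queue[int]" = queue.Queue()  # stand-in for a job queue

    def submit(self, submission_id: int, answer: str) -> str:
        """Request path: persist, enqueue, confirm. No AI call happens here."""
        self.store[submission_id] = Submission(submission_id, answer)
        self.pending.put(submission_id)
        return "received"

    def run_pregrading(self, grade: Callable[[str], float]) -> int:
        """Worker path: drain the queue and attach suggestions for the instructor."""
        processed = 0
        while True:
            try:
                submission_id = self.pending.get_nowait()
            except queue.Empty:
                return processed
            try:
                submission = self.store[submission_id]
                submission.suggested_score = grade(submission.answer)
            except Exception:
                # A failed AI call leaves the submission intact, just unsuggested.
                pass
            processed += 1
```

The property that matters is that submit never calls grade, so a slow or failing model cannot delay or lose a student's submission.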

What I learned: Put slow, non-critical AI work off the request's critical path. The student's submission shouldn't depend on an LLM round-trip succeeding, and the instructor's workflow is better served by suggestions-on-arrival than by real-time grading anyway.

6. A provider abstraction worth the upfront cost

Challenge: Wiring 35 service modules directly to Groq's SDK would weld the whole app to one vendor's pricing and uptime.

Solution: An AIProvider abstract base class with one method, send_message. Groq and Anthropic implementations sit behind it, selected by an env var. Vision-model selection and token accounting are handled inside the abstraction so calling code stays provider-agnostic.
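
The shape of that abstraction might look like the sketch below. Only the AIProvider name, the single send_message method, and env-var selection come from the description above; the method signature, the AI_PROVIDER variable name, the "groq" default, and the stubbed bodies are assumptions, and the real implementations would call the vendor SDKs where the stubs raise.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Mapping, Optional, Type


class AIProvider(ABC):
    @abstractmethod
    def send_message(self, system: str, messages: List[Dict[str, str]]) -> str:
        """Send a conversation and return the model's reply text."""


class GroqProvider(AIProvider):
    def send_message(self, system: str, messages: List[Dict[str, str]]) -> str:
        # Placeholder: the Groq SDK call would go here.
        raise NotImplementedError("Groq call not wired up in this sketch")


class AnthropicProvider(AIProvider):
    def send_message(self, system: str, messages: List[Dict[str, str]]) -> str:
        # Placeholder: the Anthropic SDK call would go here.
        raise NotImplementedError("Anthropic call not wired up in this sketch")


PROVIDERS: Dict[str, Type[AIProvider]] = {
    "groq": GroqProvider,
    "anthropic": AnthropicProvider,
}


def get_provider(env: Optional[Mapping[str, str]] = None) -> AIProvider:
    """Pick the provider from an env var (hypothetical name: AI_PROVIDER)."""
    source = env if env is not None else os.environ
    name = source.get("AI_PROVIDER", "groq").lower()
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown AI provider: {name}") from None
```

Service modules would depend only on AIProvider and get_provider, so switching vendors touches the environment rather than the callers.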

What I learned: Abstractions earn their keep when the thing they hide is volatile. LLM pricing, availability, and capabilities change fast. One interface and a one-line env swap is cheap insurance against a vendor problem becoming an app-wide refactor.

SkillBridge AI: Test Prep Academy Platform