Sentinel — Fraud Detection Platform
A full-stack fraud operations platform that scores transactions in 8.5ms, explains every decision with SHAP attributions, and surfaces $1.23M in modeled net savings through cost-aware threshold tuning. Built end-to-end by a single engineer: calibrated machine learning pipeline, multi-tenant FastAPI backend, and a real-time React workspace for fraud analysts.
This is the kind of system I want to build for a living — systems engineering that ships machine learning to users who depend on it.
At a glance
| Metric | Value |
|---|---|
| Test PR-AUC | 0.992 on a hidden test set never used for model selection |
| Recall at production threshold | 99.5% of fraudulent transactions caught |
| Single-prediction latency | 8.5ms including SHAP attribution |
| Modeled net savings | $1.23M at the cost-optimized threshold (see the sketch below) |
| Training dataset | 6.36M PaySim transactions |
| REST API endpoints | 50+ across 14 router modules |
| Database tables | 13, with multi-tenant isolation |
| Tests passing | 40 across backend and ML |
| Frontend pages | 14, fully responsive on mobile |
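The modeled net savings number comes from applying a cost model to calibrated scores across candidate thresholds; the full accounting is in threshold-tuning.md. As a rough, self-contained sketch of the mechanic only, with illustrative costs and placeholder names rather than the repo's actual values:

```python
import numpy as np

def choose_threshold(y_true, y_prob, review_cost=5.0, fraud_loss=500.0):
    """Pick the decision threshold that minimizes expected cost:
    an analyst-review cost for every false positive plus an average
    loss for every missed fraud. Both costs here are illustrative."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    best_t, best_cost = 0.5, float("inf")
    for t in np.linspace(0.01, 0.99, 99):
        flagged = y_prob >= t
        false_positives = np.sum(flagged & (y_true == 0))
        missed_fraud = np.sum(~flagged & (y_true == 1))
        cost = false_positives * review_cost + missed_fraud * fraud_loss
        if cost < best_cost:
            best_t, best_cost = t, cost
    # Net savings is measured against a do-nothing baseline in which
    # every fraudulent transaction slips through.
    baseline_cost = np.sum(y_true == 1) * fraud_loss
    return best_t, baseline_cost - best_cost
```

The actual $1.23M figure is computed from the real cost model described in threshold-tuning.md; the sketch above only shows the general shape of the sweep.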
Why I built this
Fraud detection sits at exactly that intersection. It's one of the hardest applied ML problems in production because three forces are in constant tension:
- Recall vs. precision. Catching more fraud means more false positives, which floods analysts and erodes trust.
- Latency vs. interpretability. Real-time decisions demand fast inference, but every flagged transaction needs an explanation a human can defend.
- Offline performance vs. production reality. Models that look perfect in notebooks fail the moment distribution drift hits.
I built Sentinel to demonstrate that I can hold all three in tension and ship a product that respects each one. Calibrated probabilities so threshold tuning is meaningful. SHAP attributions on every prediction so analysts can defend decisions. Drift monitoring so the system can warn itself when reality stops matching training. A real interface for the humans who use it, not just a notebook output.
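Concretely, the calibration-plus-attribution core looks roughly like the sketch below. This is a hedged illustration with synthetic data and placeholder hyperparameters, not the repo's training pipeline; the real pipeline, split discipline, and drift checks are covered in ml-deep-dive.md and challenges.md.

```python
import lightgbm as lgb
import shap
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification

# Synthetic stand-in for the engineered transaction features (illustrative
# only; the real feature pipeline and PaySim splits live in the repo).
X_train, y_train = make_classification(
    n_samples=5000, n_features=12, weights=[0.97], random_state=0
)

# Gradient-boosted trees produce the raw scores; SHAP attributions come
# from these trees. Hyperparameters are placeholders.
base = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
base.fit(X_train, y_train)
explainer = shap.TreeExplainer(base)

# Isotonic calibration so the scores behave like probabilities, which is
# what makes downstream cost-based threshold tuning meaningful.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

def score_with_explanation(features):
    """Return a calibrated fraud probability plus per-feature SHAP
    attributions for a single transaction (features: shape (1, n))."""
    prob = float(calibrated.predict_proba(features)[0, 1])
    # Simplification: the explainer wraps the standalone booster rather
    # than the calibrated ensemble; close enough to show the pattern.
    attributions = explainer.shap_values(features)
    return prob, attributions

prob, attributions = score_with_explanation(X_train[:1])
```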
Tech stack
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI 0.115 |
| Database | PostgreSQL 16 with JSONB |
| ORM | SQLAlchemy 2.0 |
| Migrations | Alembic |
| Validation | Pydantic v2 |
| Auth | PyJWT, passlib (bcrypt) |
| ML Model | LightGBM (isotonic calibration) |
| Explainability | SHAP TreeExplainer |
| Experiment tracking | MLflow |
| Data versioning | DVC |
| Frontend | React 19, TypeScript 5, Vite |
| Styling | Tailwind v4 with semantic tokens |
| Routing | React Router v7 |
| Data fetching | TanStack Query 5 |
| State | Zustand 5 |
| Charts | Recharts |
| Maps | react-simple-maps |
| Reverse proxy | Nginx |
| Containers | Docker |
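To show how the backend and ML layers above hang together, here is a minimal, hypothetical scoring endpoint: the model loaded once and served in-process, a Pydantic request schema, and a thresholded decision in the response. The route, fields, artifact path, and threshold are all placeholders, not the repo's score_endpoint.py.

```python
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Transaction(BaseModel):
    # Illustrative fields; the real schema covers the full engineered feature set.
    amount: float
    orig_balance_delta: float
    dest_balance_delta: float

class ScoreResponse(BaseModel):
    fraud_probability: float
    flagged: bool

# The calibrated model is loaded once at startup and held in-process
# (no separate model server), which is a big part of keeping
# single-prediction latency in single-digit milliseconds.
model = joblib.load("artifacts/calibrated_model.joblib")  # placeholder path
THRESHOLD = 0.42  # placeholder, not the tuned production value

@app.post("/score", response_model=ScoreResponse)
def score(txn: Transaction) -> ScoreResponse:
    features = np.array([[txn.amount, txn.orig_balance_delta, txn.dest_balance_delta]])
    prob = float(model.predict_proba(features)[0, 1])
    # The real endpoint also attaches SHAP attributions and persists the
    # result for the analyst workspace; both are omitted in this sketch.
    return ScoreResponse(fraud_probability=prob, flagged=prob >= THRESHOLD)
```

The actual routing, auth dependencies, and multi-tenant scoping are in the selected sources under code/ and in api-reference.md.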
Where to go from here
- screenshots.md — full product tour across analyst workflow, MLOps, and admin surfaces
- architecture.md — system diagram, in-process model serving, multi-tenancy by construction
- ml-deep-dive.md — LightGBM, SHAP, calibration, hidden test set discipline
- threshold-tuning.md — how the $1.23M figure is computed from a real cost model
- security.md — defense-in-depth CSV upload pipeline hardened against six attack classes
- database-design.md — 13-table schema, multi-tenant isolation, JSONB for evolving payloads
- challenges.md — the hard decisions: stratified vs. temporal split, aggregate feature ablation, calibration, drift discipline
- api-reference.md — endpoint reference across 14 routers
- code/ — selected source: score_endpoint.py, auth_dependencies.py, upload_hardening.py, train_pipeline.py
- links/github-repo.url — full source on GitHub