Sentinel — Fraud Detection Platform
Production-grade fraud operations platform with calibrated LightGBM scoring at 8.5ms, SHAP explainability on every prediction, and $1.23M in modeled net savings from cost-aware threshold tuning.
Architecture
Sentinel is built around a clean separation between the offline ML pipeline, an in-process scoring service, and a real-time React workspace. Postgres is the single source of truth for tenants, transactions, predictions, cases, and audit events. The model is loaded once at API startup and served in-process for sub-10ms scoring latency.
System diagram
┌─────────────────────────────────────────────────────────┐
│                      Client Layer                       │
│                Desktop / Tablet / Mobile                │
│        (React 19 + TypeScript + Tailwind + Vite)        │
└────────────────┬────────────────────┬───────────────────┘
                 │                    │
                 ▼                    ▼
┌────────────────────────┐  ┌────────────────────────┐
│   Analyst Workspace    │  │     Admin Surface      │
│  /dashboard            │  │  /models               │
│  /queue                │  │  /tuner                │
│  /transactions/:id     │  │  /drift                │
│  /entities/:id         │  │  /settings             │
│  /investigate          │  │  /audit                │
│  /cases                │  │                        │
│  /upload               │  │                        │
└───────────┬────────────┘  └───────────┬────────────┘
            │                           │
            └─────────────┬─────────────┘
                          ▼
┌─────────────────────────────────────────────────────────┐
│                  Authentication Layer                   │
│       JWT bearer tokens (PyJWT + passlib bcrypt)        │
│     Role-based dependencies: analyst, senior, admin     │
│    Multi-tenant isolation via tenant_id on every row    │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                  FastAPI Router Layer                   │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐      │
│  │ scoring │ │  queue  │ │  cases  │ │investigate│      │
│  └─────────┘ └─────────┘ └─────────┘ └───────────┘      │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐      │
│  │dashboard│ │ entities│ │ upload  │ │  replay   │      │
│  └─────────┘ └─────────┘ └─────────┘ └───────────┘      │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐      │
│  │  drift  │ │  tuner  │ │ models  │ │watchlists │      │
│  └─────────┘ └─────────┘ └─────────┘ └───────────┘      │
└──────────┬──────────────────────────┬───────────────────┘
           │                          │
           ▼                          ▼
┌──────────────────────┐  ┌──────────────────────────┐
│    PostgreSQL 16     │  │   ML Service (in-proc)   │
│                      │  │                          │
│  13 relational       │  │  LightGBM (calibrated)   │
│  tables with         │  │  SHAP TreeExplainer      │
│  multi-tenant        │  │  8.5ms per prediction    │
│  soft delete and     │  │                          │
│  JSONB for SHAP      │  │  MLflow tracking         │
│  explanations        │  │  DVC for data version    │
└──────────────────────┘  └──────────────────────────┘
Why this architecture
In-process model serving avoids a network hop on the hot path. The model loads once at FastAPI startup via a lifespan context manager and serves every /score request with the same explainer instance. The latency budget for a scored transaction — including the SHAP attribution — is 8.5ms p50. Running the model as a separate microservice would have added a round trip to every prediction without any operational benefit at this scale.
Multi-tenant by construction. Every domain table carries tenant_id. Cross-tenant access is blocked at the query level, not just in application code: the tenant scope is enforced through SQLAlchemy dependency injection, so every query that touches tenant-scoped data runs in a session that already filters by the authenticated user's tenant.
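The shape of that enforcement, sketched with an in-memory SQLite database and illustrative table and column names:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Transaction(Base):
    __tablename__ = "transactions"
    id = Column(Integer, primary_key=True)
    tenant_id = Column(Integer, nullable=False, index=True)  # on every row
    amount = Column(String)


def tenant_scoped(session: Session, tenant_id: int):
    # Every tenant-scoped read starts from this pre-filtered query, the
    # way a FastAPI dependency would hand a scoped session to a handler.
    return session.query(Transaction).filter(Transaction.tenant_id == tenant_id)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Transaction(tenant_id=1, amount="120.00"),
        Transaction(tenant_id=2, amount="9.99"),
    ])
    session.commit()
    # Tenant 1 can never see tenant 2's rows, regardless of handler code.
    visible = [t.tenant_id for t in tenant_scoped(session, tenant_id=1)]
```

Handlers that only ever receive a pre-filtered query cannot accidentally issue an unscoped read.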
JSONB for evolving payloads. SHAP explanations, model metrics, and upload risk distributions all live in JSONB columns. The schema evolves without migration churn when the model adds a feature or the metrics format changes.
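A hedged sketch of the pattern, using SQLAlchemy's generic JSON type so it runs against SQLite (on PostgreSQL the project would use `sqlalchemy.dialects.postgresql.JSONB`; the model and column names are illustrative):

```python
from sqlalchemy import JSON, Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Prediction(Base):
    __tablename__ = "predictions"
    id = Column(Integer, primary_key=True)
    # Document-shaped payload: adding a model feature adds a key here,
    # with no ALTER TABLE and no migration.
    shap_explanation = Column(JSON)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Prediction(
        id=1,
        shap_explanation={"amount": 0.42, "hour_of_day": -0.07},
    ))
    session.commit()
    stored = session.get(Prediction, 1).shap_explanation
```

The trade-off is deliberate: the payloads that change shape often live in JSON, while the relational columns that drive joins and filters stay typed.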
DVC for data, Git for code. The 471 MB PaySim CSV is versioned via DVC alongside the model artifact. Anyone cloning the repo can dvc pull and reproduce the exact training data. Git tracks the pipeline that processes the data; DVC tracks the data itself.
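The round trip looks roughly like this (the repository URL is a placeholder and the data path is an assumption, not the repo's actual layout):

```shell
# Clone the code, then materialize the DVC-tracked data and model artifact.
git clone <repo-url> sentinel && cd sentinel
dvc pull                      # fetches the 471 MB PaySim CSV from remote storage

# After changing the data, re-version it:
dvc add data/paysim.csv       # rewrites the small .dvc pointer file
git add data/paysim.csv.dvc   # Git tracks the pointer, not the data
git commit -m "Update training data"
dvc push                      # uploads the new data to the DVC remote
```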
JWT bearer tokens, not sessions. Stateless authentication makes the API horizontally scalable without sticky sessions. Tokens carry the tenant ID, user ID, and role — every protected endpoint validates the token via a FastAPI dependency, so the auth check happens before any handler code runs.
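The token shape and the role gate can be sketched as below; the claim names, secret, and helper functions are illustrative stand-ins for the real FastAPI dependencies, not Sentinel's actual code:

```python
import time

import jwt  # PyJWT

SECRET = "change-me"  # illustrative only; real deployments load this from config


def issue_token(user_id: int, tenant_id: int, role: str) -> str:
    claims = {
        "sub": str(user_id),
        "tenant_id": tenant_id,
        "role": role,
        "exp": int(time.time()) + 3600,  # 1-hour expiry, checked on decode
    }
    return jwt.encode(claims, SECRET, algorithm="HS256")


def require_role(token: str, allowed: set[str]) -> dict:
    # Stands in for the FastAPI dependency: decode (which also validates
    # the signature and expiry), then gate on role before any handler runs.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    if claims["role"] not in allowed:
        raise PermissionError("insufficient role")
    return claims


token = issue_token(user_id=7, tenant_id=3, role="analyst")
claims = require_role(token, allowed={"analyst", "senior", "admin"})
```

Since the tenant ID rides inside the signed token, the tenant-scoping dependency never needs a database lookup to know which tenant a request belongs to.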