Completed October 2025 – January 2026

Sentinel — Fraud Detection Platform

Production-grade fraud operations platform with calibrated LightGBM scoring at 8.5ms, SHAP explainability on every prediction, and $1.23M in modeled net savings from cost-aware threshold tuning.

Python 3.12 · FastAPI · LightGBM · SHAP · PostgreSQL 16 · React 19 · TypeScript · Tailwind v4

# Database Design

The schema is built around multi-tenant isolation, soft deletes, and JSONB for flexible payloads. Every row that could leak across tenants carries a tenant_id. The design prioritizes data integrity and auditability over premature optimization.


## Schema

```sql
-- Tenancy & identity
tenants            (id, slug UNIQUE, name, created_at)
users              (id, tenant_id FK, email UNIQUE, password_hash, role, created_at)
                   -- role: 'analyst' | 'senior_analyst' | 'admin'

-- Model lifecycle
model_versions     (id, tenant_id FK, name, version, stage, metrics JSONB,
                    threshold, artifact_path, created_at)
                   -- stage: 'production' | 'staging' | 'archived'

-- Scoring & feedback
transactions       (id, tenant_id FK, external_id, step, type, amount,
                    name_orig, old_balance_org, new_balance_org,
                    name_dest, old_balance_dest, new_balance_dest,
                    is_fraud, created_at)
predictions        (id, tenant_id FK, transaction_id FK, model_version_id FK,
                    score, risk_band, explanation JSONB, latency_ms,
                    threshold_at_score, created_at)
analyst_decisions  (id, tenant_id FK, transaction_id FK, user_id FK,
                    decision, rationale, created_at)
                   -- decision: 'confirmed_fraud' | 'false_positive' | 'escalated'

-- Investigation workflow
cases              (id, tenant_id FK, title, description, status, priority,
                    assigned_to FK, sla_due_at, created_by FK,
                    closed_at, outcome, deleted_at, created_at, updated_at)
case_transactions  (case_id FK, transaction_id FK, added_at)
case_entities      (case_id FK, account_id, role, added_at)
case_notes         (id, case_id FK, user_id FK, content, created_at)

-- Watchlists & geographic enrichment
watchlist_entries  (id, tenant_id FK, account_id, list_type, reason, created_at)
                   -- list_type: 'blocked' | 'trusted'
account_geo        (account_id PK, country, country_name, region, city, lat, lon)

-- MLOps & security
drift_snapshots    (id, tenant_id FK, feature_name, psi, baseline_mean,
                    recent_mean, captured_at)
upload_audits      (id, tenant_id FK, user_id FK, filename, file_size,
                    status, rows_total, rows_scored, risk_counts JSONB,
                    error JSONB, created_at)
```

## Key design decisions

tenant_id on every domain row enforces isolation at the query level rather than trusting each handler to remember a filter. A bug in a handler can't cause cross-tenant data leakage because the SQLAlchemy session is already scoped to the tenant. This is multi-tenancy by construction: you can't accidentally write a query that returns another tenant's data.
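The shape of that scoping can be sketched as follows. This is an illustrative stand-in, not the project's SQLAlchemy code: sqlite3 replaces Postgres, and the `TenantScopedDB` helper is hypothetical, but it shows the idea of appending the tenant predicate in exactly one place.

```python
import sqlite3

# Hypothetical sketch: every read goes through a tenant-scoped accessor,
# so a handler cannot forget the tenant filter.
class TenantScopedDB:
    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        self.conn = conn
        self.tenant_id = tenant_id

    def query(self, sql: str, params: tuple = ()) -> list:
        # The tenant_id predicate is appended here, once, for every call.
        return self.conn.execute(
            f"{sql} AND tenant_id = ?", params + (self.tenant_id,)
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, tenant_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 2, 250.0), (3, 1, 75.0)],
)

db = TenantScopedDB(conn, tenant_id=1)
rows = db.query("SELECT id, amount FROM transactions WHERE 1=1")
print(rows)  # only tenant 1's rows come back
```

In the real stack the same effect comes from scoping the SQLAlchemy session; the point is that the filter lives in one shared code path, not in every query.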

JSONB columns for evolving payloads. explanation, metrics, and risk_counts all live in JSONB. SHAP returns a different number of features per model version; metrics include whatever the training run produced; risk distributions change when the threshold moves. JSONB lets the schema absorb all of that without migrations every time the shape changes.
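A minimal sketch of that flexibility, with sqlite3 TEXT + `json` standing in for Postgres JSONB and the payload shapes invented for illustration:

```python
import json
import sqlite3

# Sketch: a JSON column absorbs differently-shaped payloads without a
# migration. Two hypothetical model versions return different numbers
# of SHAP features, plus an extra key in the newer version.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, explanation TEXT)")

v1 = {"top_features": [{"name": "amount", "shap": 0.42}]}
v2 = {"top_features": [{"name": "amount", "shap": 0.31},
                       {"name": "old_balance_org", "shap": -0.12}],
      "base_value": 0.05}

for payload in (v1, v2):
    conn.execute("INSERT INTO predictions (explanation) VALUES (?)",
                 (json.dumps(payload),))

rows = [json.loads(r[0]) for r in conn.execute("SELECT explanation FROM predictions")]
print([len(r["top_features"]) for r in rows])  # [1, 2] — same column, two shapes
```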

String columns over Postgres ENUM types for status fields. Adding a new case status, decision type, or model stage doesn't require a schema change — just an application-layer update. Migrations stay cleaner; deploys stay simpler.
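The application-layer side of that trade-off might look like this; the status values shown are hypothetical, since the schema doesn't enumerate case statuses:

```python
# Sketch: allowed statuses validated in the application layer, so adding
# a new one is a one-line change rather than an ALTER TYPE migration.
CASE_STATUSES = {"open", "in_review", "closed"}  # illustrative values

def set_case_status(status: str) -> str:
    if status not in CASE_STATUSES:
        raise ValueError(f"unknown case status: {status!r}")
    return status

set_case_status("open")          # accepted
CASE_STATUSES.add("reopened")    # new status: no schema change needed
set_case_status("reopened")      # accepted immediately
```

The cost is that the database no longer rejects a bad status on its own; the validation set has to be the single source of truth.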

Soft delete via deleted_at preserves audit history when records leave the UI. A deleted case is hidden from the analyst worklist but the audit log can still reconstruct what happened.
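The two query paths can be sketched like this (sqlite3 standing in for Postgres, case titles invented):

```python
import sqlite3

# Sketch: soft delete stamps deleted_at instead of removing the row.
# The worklist query filters it out; the audit path sees everything.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO cases (title) VALUES ('mule ring A'), ('chargeback B')")

# "Delete" case 1 without losing its history.
conn.execute("UPDATE cases SET deleted_at = datetime('now') WHERE id = 1")

worklist = conn.execute("SELECT title FROM cases WHERE deleted_at IS NULL").fetchall()
audit = conn.execute("SELECT title FROM cases").fetchall()
print(worklist)   # deleted case hidden from analysts
print(len(audit)) # but still reconstructable from the audit path
```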

Composite primary keys on join tables (case_transactions, case_entities) prevent duplicate links at the database level. Adding the same transaction to a case twice is impossible — the constraint rejects it.
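A runnable sketch of the constraint doing that work (sqlite3 in place of Postgres; the error type differs by driver, but the rejection is the same):

```python
import sqlite3

# Sketch: the composite primary key on the join table rejects a
# duplicate case<->transaction link at the database level.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE case_transactions (
        case_id        INTEGER,
        transaction_id INTEGER,
        added_at       TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (case_id, transaction_id)
    )
""")
conn.execute("INSERT INTO case_transactions (case_id, transaction_id) VALUES (1, 42)")
try:
    # Linking the same transaction to the same case again is impossible.
    conn.execute("INSERT INTO case_transactions (case_id, transaction_id) VALUES (1, 42)")
except sqlite3.IntegrityError as e:
    print("duplicate link rejected:", e)
```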

predictions.threshold_at_score stores the threshold that was active when the prediction was scored. This lets you reproduce historical decisions even after the threshold has been tuned. Without this column, threshold changes would silently invalidate the historical record.
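A small sketch of the replay problem, with illustrative scores and thresholds:

```python
# Sketch: replaying a historical decision with the threshold that was
# active at scoring time, not the current one. Values are invented.
predictions = [
    {"score": 0.62, "threshold_at_score": 0.50},  # was flagged at the time
    {"score": 0.62, "threshold_at_score": 0.70},  # was not flagged at the time
]
current_threshold = 0.70  # tuned upward since the first prediction

for p in predictions:
    decided_then = p["score"] >= p["threshold_at_score"]  # faithful replay
    decided_now = p["score"] >= current_threshold          # rewritten history
    print(decided_then, decided_now)
# Replaying with only the current threshold would report both rows as
# "not flagged", contradicting what the first analyst actually saw.
```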