CommunityShield
ML-powered crime pattern explorer for Chicago. 8.5M rows, 4 XGBoost models with SHAP explanations, beat-level heatmap, and an honest methodology page about what the data can and cannot tell you.
Architecture
CommunityShield is built around three core ideas: pre-aggregated rollup tables for query speed, in-process model serving for SHAP latency, and a 3-tier map rendering fallback so the frontend works on every device, even when WebGL is unavailable.
System diagram
┌─────────────────────────────────────────────────────────┐
│                      Client Layer                       │
│                Desktop / Tablet / Mobile                │
│        (React 19 + TypeScript + Vite + Tailwind)        │
└───────────┬────────────────────────────────┬────────────┘
            │                                │
            ▼                                ▼
┌────────────────────────┐       ┌────────────────────────┐
│  Heatmap & Explorer    │       │  Methodology Surface   │
│  /                     │       │  /methodology          │
│  /beats/:id            │       │  /ethics               │
│  /predict/:beat        │       │                        │
└───────────┬────────────┘       └───────────┬────────────┘
            │                                │
            └────────────────┬───────────────┘
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      FastAPI Layer                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ heatmap  │  │ beats    │  │ predict  │  │ shap     │   │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │
│  ┌──────────┐  ┌──────────┐                             │
│  │ trends   │  │ methods  │                             │
│  └──────────┘  └──────────┘                             │
└──────────┬────────────────────────────────┬─────────────┘
           │                                │
           ▼                                ▼
┌──────────────────────┐       ┌──────────────────────────┐
│  PostgreSQL 16       │       │  ML Service (in-proc)    │
│  + PostGIS           │       │                          │
│                      │       │  4 XGBoost models        │
│  8.5M raw records    │       │  SHAP TreeExplainer      │
│  7.8M rollup rows    │       │                          │
│  274 beat polygons   │       │  Loaded once at startup  │
│  GIST spatial idx    │       │                          │
└──────────────────────┘       └──────────────────────────┘
Why pre-aggregated rollups
The raw crime table has 8.5M rows. Computing a citywide heatmap from raw data on every request would mean scanning the full table, filtering by time window, grouping by beat, and counting per category. Even with indexes, that query is too slow for an interactive map.
The rollup tables solve this by precomputing the aggregations at write time. Every beat has hourly, daily, weekly, and monthly rollup rows for each crime category. The heatmap query becomes a single indexed lookup against the appropriate rollup table — sub-100ms for the full city view.
The tradeoff is storage: 8.5M raw rows generate 7.8M rollup rows across all temporal buckets. Disk is cheap; user-perceived latency isn't.
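The write-time aggregation described above can be sketched in a few lines. This is a minimal illustration, not the project's schema: it uses an in-memory SQLite database as a stand-in for PostgreSQL, shows only a daily bucket, and the table names, column names, and beat IDs (`incidents`, `rollup_daily`, `record_incident`, `heatmap_counts`, `"0111"`) are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema; names are illustrative, not taken from the project.
SCHEMA = """
CREATE TABLE incidents (
    id INTEGER PRIMARY KEY,
    beat_id TEXT NOT NULL,
    category TEXT NOT NULL,
    occurred_at TEXT NOT NULL
);
CREATE TABLE rollup_daily (
    day TEXT NOT NULL,
    beat_id TEXT NOT NULL,
    category TEXT NOT NULL,
    n INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (day, beat_id, category)
);
"""


def record_incident(conn, beat_id, category, occurred_at):
    """Insert one raw row and bump the matching daily rollup in one transaction."""
    day = occurred_at.date().isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO incidents (beat_id, category, occurred_at) VALUES (?, ?, ?)",
            (beat_id, category, occurred_at.isoformat()),
        )
        # Create the rollup row if it does not exist yet, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO rollup_daily (day, beat_id, category, n) "
            "VALUES (?, ?, ?, 0)",
            (day, beat_id, category),
        )
        conn.execute(
            "UPDATE rollup_daily SET n = n + 1 "
            "WHERE day = ? AND beat_id = ? AND category = ?",
            (day, beat_id, category),
        )


def heatmap_counts(conn, start_day, end_day):
    """Per-beat totals for a date window, read from the rollup table only."""
    rows = conn.execute(
        "SELECT beat_id, SUM(n) FROM rollup_daily "
        "WHERE day BETWEEN ? AND ? GROUP BY beat_id ORDER BY beat_id",
        (start_day, end_day),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_incident(conn, "0111", "THEFT", datetime(2024, 3, 1, 9, 30))
record_incident(conn, "0111", "THEFT", datetime(2024, 3, 1, 22, 5))
record_incident(conn, "0111", "BATTERY", datetime(2024, 3, 2, 1, 15))
record_incident(conn, "0112", "THEFT", datetime(2024, 3, 5, 14, 0))
print(heatmap_counts(conn, "2024-03-01", "2024-03-02"))
```

The read path never touches `incidents`: the window query hits the rollup's primary key, which leads with `day`, so its cost scales with the number of buckets in the window rather than with the raw row count.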
In-process model serving
The four XGBoost models and their SHAP TreeExplainers load into the FastAPI process at startup via a lifespan context manager. Every /predict request scores in-process — no network hop, no model load overhead per request.
SHAP attribution on a single prediction is the expensive operation, not the prediction itself. Building the explainer once at startup means each request pays only for the attribution, never for constructing the explainer or loading the model.
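The load-once pattern can be sketched with the standard library alone. In FastAPI, an async context manager of this shape is passed as the `lifespan` argument to the `FastAPI` constructor; here a small `App` class stands in for the application object so the example runs without dependencies. The model names, `load_model_and_explainer`, and the averaging "model" are all placeholders, since the source does not name the four models or show how they are loaded.

```python
import asyncio
from contextlib import asynccontextmanager

# Hypothetical names; the source only says there are four models.
MODEL_NAMES = ("model_a", "model_b", "model_c", "model_d")
LOAD_COUNT = {"n": 0}  # counts loader calls so the example can show load-once


def load_model_and_explainer(name):
    """Stand-in for reading a model from disk and building its explainer."""
    LOAD_COUNT["n"] += 1
    # Placeholder scoring function: the mean of the features.
    return {"name": name, "predict": lambda features: sum(features) / len(features)}


class App:
    """Minimal stand-in for the web app object; it only carries shared state."""

    def __init__(self):
        self.state = {}


@asynccontextmanager
async def lifespan(app):
    # Startup: load every model once and keep it on the app object.
    app.state["models"] = {n: load_model_and_explainer(n) for n in MODEL_NAMES}
    yield
    # Shutdown: drop the references.
    app.state.clear()


def handle_predict(app, model_name, features):
    """Request handler: scores with the already-loaded model, no loading here."""
    model = app.state["models"][model_name]
    return model["predict"](features)


async def main():
    app = App()
    async with lifespan(app):
        scores = [handle_predict(app, "model_a", [0.2, 0.4]) for _ in range(3)]
    return scores, LOAD_COUNT["n"]


scores, loads = asyncio.run(main())
print(len(scores), "requests served with", loads, "loader calls")
```

Three requests are served while the loader runs exactly once per model, which is the property the in-process design relies on.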
3-tier map rendering fallback
MapLibre GL needs WebGL. Some environments (locked-down corporate browsers, older devices, accessibility tooling) don't provide it. The frontend ships three rendering paths in priority order:
- MapLibre GL (WebGL) — full interactive vector map, default path
- Static SVG choropleth — every beat polygon rendered server-side as SVG, no JavaScript map library required
- Data table fallback — sortable ranked list of beats with the same data, fully accessible
The frontend detects WebGL capability on mount and selects the highest tier the browser supports. See code/map_fallback.tsx for the detection logic.
PostGIS for spatial queries
Beat polygons live in a geometry(Polygon, 4326) column with a GIST spatial index. Point-in-polygon lookups (assigning a new crime record to a beat by lat/lon) run in milliseconds even against the full 274-polygon set. The ETL pipeline uses these lookups to populate the beat_id foreign key on every raw incident at ingest time.
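To make the lookup concrete, here is a pure-Python sketch of what assigning a point to a beat involves. It is not how PostGIS implements it; in PostGIS the test would typically be written with a function such as `ST_Contains`. The sketch mirrors the same two-step idea: a cheap bounding-box filter (the role the spatial index plays) followed by an exact ray-casting test. The beat IDs and rectangular polygons are invented for illustration.

```python
def bbox(polygon):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (lon, lat) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle 'inside' at each edge crossed by a ray going east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_beat(lon, lat, beats):
    """Return the ID of the first beat containing the point, or None."""
    for beat_id, polygon in beats.items():
        min_x, min_y, max_x, max_y = bbox(polygon)
        # Cheap rejection first, analogous to an index's bounding-box check.
        if not (min_x <= lon <= max_x and min_y <= lat <= max_y):
            continue
        if point_in_polygon(lon, lat, polygon):
            return beat_id
    return None


# Hypothetical beats: two adjacent rectangles, vertices as (lon, lat).
BEATS = {
    "A": [(-87.70, 41.80), (-87.60, 41.80), (-87.60, 41.90), (-87.70, 41.90)],
    "B": [(-87.60, 41.80), (-87.50, 41.80), (-87.50, 41.90), (-87.60, 41.90)],
}

print(assign_beat(-87.65, 41.85, BEATS))
print(assign_beat(-87.55, 41.85, BEATS))
print(assign_beat(-87.00, 41.85, BEATS))
```

With only 274 polygons, most candidates are rejected by the bounding-box check, so the exact test runs on very few of them per incident.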