CommunityShield
A machine-learning-powered crime pattern explorer for Chicago. 8.5 million records of public safety data, four XGBoost models with SHAP explanations, an interactive beat-level heatmap, and an honest methodology page about what the data can and cannot tell you.
This project deliberately avoids the term "predictive policing." It is a tool for community awareness: for residents asking "Is my neighborhood getting safer?", for journalists looking at citywide patterns, and for researchers studying how crime is distributed across the city. The model surfaces structure that already exists in public data; it does not direct enforcement and is not designed to.
At a glance
| Item | Detail |
|---|---|
| Dataset | 8.5M Chicago crime records, 2001 – present |
| Geographic resolution | Beat-level (police beats), 274 beats citywide |
| ML models | 4 XGBoost classifiers + 1 ensemble for predicted-vs-actual hot spots |
| Hyperparameter tuning | Optuna, 100 trials per model |
| Explainability | SHAP TreeExplainer on every prediction |
| Pre-aggregated rollups | 7.8M rows across temporal buckets |
| Query latency | Sub-100ms for beat-level heatmap |
| Frontend | React 19 + MapLibre GL + 3-tier rendering fallback |
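The rollup and latency rows above describe a precompute-then-read pattern: incidents are counted once per (time bucket, beat), so a heatmap request reads a small aggregate instead of scanning the raw 8.5M rows. The following is a minimal in-memory sketch of that idea, not the project's actual code; the `Incident` type, the beat ID strings, and the monthly bucket size are assumptions for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Incident:
    # Hypothetical minimal record: the real dataset has many more columns.
    beat: str  # police beat ID as a string, e.g. "0111" (format assumed)
    occurred_at: datetime


def month_bucket(ts: datetime) -> str:
    """Truncate a timestamp to a 'YYYY-MM' bucket (monthly granularity is assumed)."""
    return f"{ts.year:04d}-{ts.month:02d}"


def build_rollup(incidents: list[Incident]) -> dict[str, Counter]:
    """One pass over raw incidents: bucket -> (beat -> incident count)."""
    rollup: dict[str, Counter] = defaultdict(Counter)
    for inc in incidents:
        rollup[month_bucket(inc.occurred_at)][inc.beat] += 1
    return rollup


def heatmap_for_month(rollup: dict[str, Counter], month: str) -> dict[str, int]:
    """Read path: look up one precomputed bucket; raw incidents are never touched."""
    # .get() does not insert missing keys, even on a defaultdict.
    return dict(rollup.get(month, {}))


incidents = [
    Incident("0111", datetime(2024, 1, 3)),
    Incident("0111", datetime(2024, 1, 20)),
    Incident("0112", datetime(2024, 1, 9)),
    Incident("0111", datetime(2024, 2, 1)),
]
rollup = build_rollup(incidents)
print(heatmap_for_month(rollup, "2024-01"))  # {'0111': 2, '0112': 1}
```

Since the stack stores data in PostgreSQL, a comparable rollup table could be built with a `GROUP BY` over the beat and `date_trunc('month', ...)` of the incident timestamp, with an index on the bucket column so a heatmap read touches only one bucket.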
Why I built this
Crime data is one of the most widely published open datasets in any American city. Chicago's data portal exposes every reported incident going back to 2001, over 8 million rows in total. Most public dashboards built on this data fall into one of two failure modes:
- Decoration. Pretty maps with no analytical depth. Pin clusters that tell you nothing you couldn't see by living in the city.
- Predictive policing. Models trained on enforcement data that learn to predict enforcement, then get pitched as crime prediction. The feedback loop is well-documented and the harm is real.
I wanted to build the third option: a tool that treats the data with statistical honesty, surfaces real patterns at a meaningful spatial resolution, explains every model output, and is explicit about its limitations on a dedicated ethics page.
Tech stack
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI |
| Database | PostgreSQL 16 with PostGIS extension |
| ML | XGBoost 2.1, scikit-learn, Optuna |
| Explainability | SHAP TreeExplainer |
| Frontend | React 19, TypeScript, Vite |
| Maps | MapLibre GL JS |
| Styling | Tailwind CSS |
| Containers | Docker, docker-compose |
| CI | GitHub Actions |
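SHAP TreeExplainer computes Shapley-value attributions efficiently for tree ensembles such as XGBoost. The brute-force sketch below is not that algorithm; it only illustrates the quantity being explained and its additivity property (the contributions sum to the prediction minus a baseline prediction) on a hypothetical two-feature toy model. Replacing absent features with a single fixed baseline is a simplifying assumption: TreeExplainer instead accounts for missing features using the trees' training coverage or a background dataset. The feature names and the model are invented for illustration.

```python
from collections.abc import Callable, Mapping
from itertools import combinations
from math import factorial

# A "model" here is any function from a feature dict to a score.
Model = Callable[[Mapping[str, float]], float]


def shapley_values(
    model: Model, x: Mapping[str, float], baseline: Mapping[str, float]
) -> dict[str, float]:
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value (a simplifying
    assumption). This needs 2^n model calls per feature, so it only suits toys.
    """
    features = list(x)
    n = len(features)

    def value(coalition: set[str]) -> float:
        # Score the point where only the coalition's features keep x's values.
        point = {f: x[f] if f in coalition else baseline[f] for f in features}
        return model(point)

    phi: dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy score with an interaction term; not one of the project's models.
def toy_model(p: Mapping[str, float]) -> float:
    return (
        0.5 * p["recent_count"]
        + 2.0 * p["is_weekend"]
        + 0.25 * p["recent_count"] * p["is_weekend"]
    )


x = {"recent_count": 4.0, "is_weekend": 1.0}
baseline = {"recent_count": 1.0, "is_weekend": 0.0}
phi = shapley_values(toy_model, x, baseline)
print(phi)  # {'recent_count': 1.875, 'is_weekend': 2.625}
# Additivity: contributions sum to prediction minus baseline prediction (5.0 - 0.5).
assert abs(sum(phi.values()) - (toy_model(x) - toy_model(baseline))) < 1e-9
```

The interaction term is split between both features, which is the kind of attribution a per-prediction explanation has to make explicit.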
Where to go from here
- screenshots.md: product tour covering the heatmap, beat detail, prediction view, methodology dashboard, and mobile
- architecture.md: system diagram, why pre-aggregated rollups, 3-tier map fallback strategy
- database-design.md: schema, the 7.8M-row rollups, PostGIS spatial indexing
- ml-deep-dive.md: 4 models, 6 experiments, Optuna tuning, the honest data ceiling finding
- ethics.md: why this is not predictive policing, what the data does and does not represent
- challenges.md: 10 named engineering challenges and how I solved each one
- api-reference.md: REST endpoints across the FastAPI service
- code/: selected source (heatmap_endpoint.py, shap_explainer.py, populate_rollups.py, map_fallback.tsx)
- links/github-repo.url: full source on GitHub