CommunityShield

A machine-learning-powered crime pattern explorer for Chicago. 8.5 million records of public safety data, four XGBoost models with SHAP explanations, an interactive beat-level heatmap, and an honest methodology page about what the data can and cannot tell you.

This project deliberately avoids the term "predictive policing." It is a tool for community awareness: for residents asking "Is my neighborhood getting safer?", for journalists looking at city-wide patterns, and for researchers studying urban crime distribution. The model surfaces structure that already exists in public data; it does not direct enforcement and is not designed to.

At a glance


Dataset	8.5M Chicago crime records, 2001 to present
Geographic resolution	Beat-level (police beats), 274 beats citywide
ML models	4 XGBoost classifiers + 1 ensemble for predicted-vs-actual hot spots
Hyperparameter tuning	Optuna, 100 trials per model
Explainability	SHAP TreeExplainer on every prediction
Pre-aggregated rollups	7.8M rows across temporal buckets
Query latency	Sub-100ms for beat-level heatmap
Frontend	React 19 + MapLibre GL + 3-tier rendering fallback
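
To illustrate the pre-aggregated rollups row above, here is a minimal sketch of the general idea: count incidents once per (beat, time bucket), then answer heatmap queries from the small rollup instead of scanning raw records. The record shape, the monthly bucket, the beat ids, and the function names `build_monthly_rollup` and `heatmap_counts` are all assumptions made for illustration; the project's actual schema and bucket choices may differ.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records as (beat id, timestamp) pairs.
# The real dataset has many more columns; these values are made up.
incidents = [
    ("0111", datetime(2023, 5, 1, 22, 15)),
    ("0111", datetime(2023, 5, 17, 3, 40)),
    ("0111", datetime(2023, 6, 2, 14, 5)),
    ("1834", datetime(2023, 5, 9, 18, 30)),
]


def build_monthly_rollup(records):
    """Count incidents per (beat, year, month) bucket in a single pass."""
    rollup = Counter()
    for beat, ts in records:
        rollup[(beat, ts.year, ts.month)] += 1
    return rollup


def heatmap_counts(rollup, year, month):
    """Return {beat: count} for one month, reading only the rollup."""
    return {
        beat: count
        for (beat, y, m), count in rollup.items()
        if y == year and m == month
    }


rollup = build_monthly_rollup(incidents)
may_counts = heatmap_counts(rollup, 2023, 5)
print(may_counts)  # {'0111': 2, '1834': 1}
```

The rollup has at most one row per beat per bucket, so a heatmap request touches a number of rows bounded by the beat count rather than by the number of incidents. In a database this would typically be a summary table refreshed by a batch job, which is one plausible way to reach low query latency, though the sketch does not claim to reproduce the project's implementation.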

Why I built this

Crime data is among the most widely published open datasets in American cities. Chicago's data portal exposes every reported incident going back to 2001: over 8 million rows. Most public dashboards built on this data fall into one of two failure modes:

Decoration. Pretty maps with no analytical depth. Pin clusters that tell you nothing you couldn't see by living in the city.
Predictive policing. Models trained on enforcement data that learn to predict enforcement, then get pitched as crime prediction. The feedback loop is well documented and the harm is real.

I wanted to build a third option: a tool that treats the data with statistical honesty, surfaces real patterns at a meaningful spatial resolution, explains every model output, and is explicit about its limitations on a dedicated ethics page.

Tech stack

Layer	Technology
Backend	Python 3.12, FastAPI
Database	PostgreSQL 16 with PostGIS extension
ML	XGBoost 2.1, scikit-learn, Optuna
Explainability	SHAP TreeExplainer
Frontend	React 19, TypeScript, Vite
Maps	MapLibre GL JS
Styling	Tailwind CSS
Containers	Docker, docker-compose
CI	GitHub Actions

Where to go from here

screenshots.md: product tour covering the heatmap, beat detail, prediction view, methodology dashboard, and mobile
architecture.md: system diagram, why pre-aggregated rollups, 3-tier map fallback strategy
database-design.md: schema, the 7.8M-row rollups, PostGIS spatial indexing
ml-deep-dive.md: 4 models, 6 experiments, Optuna tuning, the honest data ceiling finding
ethics.md: why this is not predictive policing, what the data does and does not represent
challenges.md: 10 named engineering challenges and how I solved each one
api-reference.md: REST endpoints across the FastAPI service
code/: selected source files (heatmap_endpoint.py, shap_explainer.py, populate_rollups.py, map_fallback.tsx)
links/github-repo.url: full source on GitHub