Completed: May – August 2025

CommunityShield

ML-powered crime pattern explorer for Chicago. 8.5M rows, 4 XGBoost models with SHAP explanations, beat-level heatmap, and an honest methodology page about what the data can and cannot tell you.

Python 3.12, FastAPI, PostgreSQL 16, PostGIS, XGBoost, SHAP, React 19, MapLibre GL
Languages: TypeScript 52.4%, Python 41.8%, CSS 3.2%, Other 2.6%

A machine-learning-powered crime pattern explorer for Chicago. 8.5 million records of public safety data, four XGBoost models with SHAP explanations, an interactive beat-level heatmap, and an honest methodology page about what the data can and cannot tell you.

This project deliberately avoids the term "predictive policing." It is a tool for community awareness: for residents asking "Is my neighborhood getting safer?", for journalists looking at city-wide patterns, and for researchers studying urban crime distribution. The models surface structure that already exists in public data; they do not direct enforcement and are not designed to.


At a glance

Dataset                  8.5M Chicago crime records, 2001 – present
Geographic resolution    Beat-level (police beats), 274 beats citywide
ML models                4 XGBoost classifiers + 1 ensemble for predicted-vs-actual hot spots
Hyperparameter tuning    Optuna, 100 trials per model
Explainability           SHAP TreeExplainer on every prediction
Pre-aggregated rollups   7.8M rows across temporal buckets
Query latency            Sub-100ms for beat-level heatmap
Frontend                 React 19 + MapLibre GL + 3-tier rendering fallback
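
The rollup and latency figures above fit a common pattern: count incidents per beat and time bucket once, ahead of time, so a heatmap request reads a small summary instead of scanning raw rows. The following is a minimal sketch of that idea in plain Python. It is not the project's populate_rollups.py, and in the real system the aggregation most likely lives in PostgreSQL. The row layout, the beat IDs, and the function names `build_monthly_rollup` and `heatmap_counts` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw incident rows: (beat_id, ISO timestamp, primary_type).
# Field names and values are illustrative, not the project's schema.
INCIDENTS = [
    ("0111", "2024-01-05T14:30:00", "THEFT"),
    ("0111", "2024-01-19T02:10:00", "BATTERY"),
    ("0111", "2024-02-02T21:45:00", "THEFT"),
    ("0423", "2024-01-07T09:00:00", "THEFT"),
]


def build_monthly_rollup(incidents):
    """Count incidents per (beat, year, month) in a single pass over the raw rows."""
    rollup = Counter()
    for beat_id, timestamp, _primary_type in incidents:
        ts = datetime.fromisoformat(timestamp)
        rollup[(beat_id, ts.year, ts.month)] += 1
    return rollup


def heatmap_counts(rollup, year, month):
    """Answer a heatmap request from the rollup alone, without touching raw rows."""
    return {
        beat: count
        for (beat, y, m), count in rollup.items()
        if y == year and m == month
    }


rollup = build_monthly_rollup(INCIDENTS)
january = heatmap_counts(rollup, 2024, 1)
print(january)
```

The trade-off is the usual one for pre-aggregation: the rollup must be rebuilt or incrementally updated when new incidents arrive, in exchange for read queries whose cost depends on the number of beats and buckets rather than on the number of raw records.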

Why I built this

Crime data is one of the most-published open datasets in any American city. Chicago's data portal exposes every reported incident going back to 2001 — over 8 million rows. Most public dashboards built on this data fall into one of two failure modes:

  1. Decoration. Pretty maps with no analytical depth. Pin clusters that tell you nothing you couldn't see by living in the city.
  2. Predictive policing. Models trained on enforcement data that learn to predict enforcement, then get pitched as crime prediction. The feedback loop is well-documented and the harm is real.

I wanted to build the third option: a tool that treats the data with statistical honesty, surfaces real patterns at a meaningful spatial resolution, explains every model output, and is explicit about its limitations on a dedicated ethics page.


Tech stack

Layer            Technology
Backend          Python 3.12, FastAPI
Database         PostgreSQL 16 with PostGIS extension
ML               XGBoost 2.1, scikit-learn, Optuna
Explainability   SHAP TreeExplainer
Frontend         React 19, TypeScript, Vite
Maps             MapLibre GL JS
Styling          Tailwind CSS
Containers       Docker, docker-compose
CI               GitHub Actions
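
The stack lists SHAP TreeExplainer for explainability. SHAP attributions are Shapley values: each feature's average marginal contribution to a prediction, taken over all orders in which features could be revealed. TreeExplainer computes these efficiently for tree ensembles such as XGBoost. The sketch below computes them by brute force on a toy function to show what the numbers mean. It is not the project's shap_explainer.py. The feature names, the `toy_score` function, and the use of a single baseline row (SHAP normally averages over background data) are assumptions made for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features not yet revealed keep their baseline value. The cost grows
    factorially with the number of features, so this only suits tiny examples.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]  # reveal this feature's real value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical scoring function with an interaction term (not a project model)."""
    return (
        2.0 * x["hour_is_night"]
        + 1.0 * x["is_weekend"]
        + 3.0 * x["hour_is_night"] * x["is_weekend"]
    )


baseline = {"hour_is_night": 0, "is_weekend": 0}
instance = {"hour_is_night": 1, "is_weekend": 1}
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

The attributions sum to the difference between the score for the instance and the score for the baseline, which is the additivity property that makes a per-prediction explanation readable: the interaction term's contribution is split evenly between the two features involved.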

Where to go from here

  • screenshots.md — product tour: heatmap, beat detail, prediction view, methodology dashboard, mobile
  • architecture.md — system diagram, why pre-aggregated rollups, 3-tier map fallback strategy
  • database-design.md — schema, the 7.8M-row rollups, PostGIS spatial indexing
  • ml-deep-dive.md — 4 models, 6 experiments, Optuna tuning, the honest data ceiling finding
  • ethics.md — why this is not predictive policing, what the data does and does not represent
  • challenges.md — 10 named engineering challenges and how I solved each one
  • api-reference.md — REST endpoints across the FastAPI service
  • code/ — selected source: heatmap_endpoint.py, shap_explainer.py, populate_rollups.py, map_fallback.tsx
  • links/github-repo.url — full source on GitHub