Completed 2026

Robi: production RAG assistant

Retrieval-augmented chatbot with hybrid search, guardrails, eval, and live monitoring.

FastAPI, Python, Postgres, pgvector, Redis, Groq, Prometheus, Grafana, Docker

Monitoring

Robi has monitoring because a public AI system needs more than a green deploy. The monitoring tracks how fast Robi responds, how often it refuses, whether errors are concentrated in one component, and how much generation costs.

Metrics collected

Prometheus scrapes the backend every 15 seconds. The dashboard tracks:

  • Request volume
  • Request outcome
  • p50, p95, and p99 latency
  • Per-stage timing
  • Refusal rate
  • Provider errors
  • Retrieval errors
  • Cumulative cost
  • Errors by component

Grafana as code

The Grafana datasource, dashboard, and alert rules are provisioned as code, so rebuilding the stack recreates the same monitoring surface without any manual dashboard setup.
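The provisioning files themselves are not shown here. One way to keep a dashboard in version control is to generate its JSON from a script and let Grafana's file-based dashboard provisioning load it at startup. The sketch below assumes hypothetical metric names, a hypothetical `uid` and file name (`robi.json`), and PromQL queries written for illustration. It builds only a small subset of Grafana's dashboard JSON model; a real exported dashboard also carries fields such as a schema version and datasource references.

```python
import json
from pathlib import Path

# Hypothetical metric names; they would need to match what the backend exports.
PANELS = [
    ("Requests by outcome", "sum by (outcome) (rate(robi_requests_total[5m]))"),
    (
        "p95 latency (seconds)",
        "histogram_quantile(0.95, sum by (le) (rate(robi_request_latency_seconds_bucket[5m])))",
    ),
    (
        "Refusal rate",
        'sum(rate(robi_requests_total{outcome="refused"}[5m]))'
        " / sum(rate(robi_requests_total[5m]))",
    ),
    ("Errors by component", "sum by (component) (rate(robi_errors_total[5m]))"),
]


def build_dashboard(uid: str = "robi-overview") -> dict:
    """Return a minimal dashboard model containing only the fields this sketch needs."""
    panels = []
    for index, (title, expr) in enumerate(PANELS):
        panels.append(
            {
                "id": index + 1,
                "type": "timeseries",
                "title": title,
                # Grafana's grid is 24 columns wide: two half-width panels per row.
                "gridPos": {"h": 8, "w": 12, "x": (index % 2) * 12, "y": (index // 2) * 8},
                "targets": [{"refId": "A", "expr": expr}],
            }
        )
    return {"uid": uid, "title": "Robi", "time": {"from": "now-6h", "to": "now"}, "panels": panels}


def write_dashboard(directory: Path) -> Path:
    """Write the dashboard JSON into a directory a file-based provider reads from."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "robi.json"
    path.write_text(json.dumps(build_dashboard(), indent=2, sort_keys=True) + "\n")
    return path
```

If a dashboard provider points at the output directory, every rebuild loads the same panels. The datasource and alert rules would be provisioned from their own files in the same spirit.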

What I watch

Latency: A RAG request passes through several stages, so total latency alone does not show where the time goes. Per-stage timing shows whether embedding, retrieval, reranking, generation, or provider response time is the bottleneck.
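The page does not show how stage timings are captured. Here is a minimal in-process sketch: a hypothetical `StageTimer` class wraps each stage, named as in the paragraph above, in a context manager and reports p50, p95 and p99 per stage. It uses exact nearest-rank percentiles because they are easy to check. A Prometheus setup would more typically export each stage to a histogram and let `histogram_quantile` estimate percentiles from bucket counts.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class StageTimer:
    """Collects wall-clock durations per pipeline stage."""

    def __init__(self) -> None:
        self.samples: defaultdict[str, list[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record failures too, so a stage that times out still shows up as slow.
            self.samples[name].append(time.perf_counter() - start)

    def percentile(self, name: str, q: float) -> float:
        """Nearest-rank percentile for one stage, with q in (0, 100]."""
        if not 0 < q <= 100:
            raise ValueError("q must be in (0, 100]")
        values = sorted(self.samples.get(name, []))
        if not values:
            raise ValueError(f"no samples recorded for stage {name!r}")
        rank = math.ceil(q * len(values) / 100)  # 1-based rank
        return values[rank - 1]

    def summary(self) -> dict[str, dict[str, float]]:
        return {
            name: {f"p{q}": self.percentile(name, q) for q in (50, 95, 99)}
            for name in self.samples
        }


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        for stage_name in ("embedding", "retrieval", "reranking", "generation"):
            with timer.stage(stage_name):
                time.sleep(0.001)  # stand-in for the real stage
    print(timer.summary())
```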

Refusal rate: A sudden jump can mean the retrieval threshold is too strict, the corpus changed, or the retriever is failing. A sudden drop can be worse because it may mean Robi is answering when it should refuse.
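Because both directions matter, a refusal-rate check should alert on deviation from a baseline rather than only on a high rate. The sketch below uses a hypothetical `RefusalRateMonitor` over a sliding window of recent outcomes. The baseline, tolerance, window size and minimum sample count are made-up parameters, not values from Robi. In a Prometheus setup the same idea would more likely live in an alert rule on the refused-to-total request ratio.

```python
from collections import deque


class RefusalRateMonitor:
    """Compares the recent refusal rate with a baseline in both directions."""

    def __init__(
        self, baseline: float, tolerance: float, window: int = 200, min_samples: int = 50
    ) -> None:
        if not 0 < min_samples <= window:
            raise ValueError("min_samples must be between 1 and window")
        self.baseline = baseline
        self.tolerance = tolerance
        self.min_samples = min_samples
        # True means the request was refused; old outcomes fall off the left end.
        self.outcomes: deque[bool] = deque(maxlen=window)

    def record(self, refused: bool) -> None:
        self.outcomes.append(refused)

    def rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def status(self) -> str:
        if len(self.outcomes) < self.min_samples:
            return "insufficient_data"
        rate = self.rate()
        if rate > self.baseline + self.tolerance:
            return "jump"  # threshold too strict, corpus changed, or retriever failing
        if rate < self.baseline - self.tolerance:
            return "drop"  # possibly answering questions it should refuse
        return "ok"
```

The `min_samples` guard keeps a handful of early requests from triggering either alert.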

Cost: Generation runs on an external provider, so every answer has a direct cost and cost becomes part of operations. Tracking cumulative cost makes unexpected traffic visible.
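The page does not say how cost is computed. A common approach is to multiply each request's input and output token counts by per-token prices. The sketch below assumes the provider response reports those counts. The `Pricing` and `CostTracker` names are hypothetical, and the prices in the test are placeholders rather than Groq's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per one million tokens. Placeholder values, not real provider rates."""

    input_per_million: float
    output_per_million: float


class CostTracker:
    """Accumulates generation cost from per-request token counts."""

    def __init__(self, pricing: Pricing) -> None:
        self.pricing = pricing
        self.total_usd = 0.0
        self.requests = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one request's cost to the running total and return that cost."""
        if input_tokens < 0 or output_tokens < 0:
            raise ValueError("token counts cannot be negative")
        cost = (
            input_tokens * self.pricing.input_per_million
            + output_tokens * self.pricing.output_per_million
        ) / 1_000_000
        self.total_usd += cost
        self.requests += 1
        return cost

    def average_usd(self) -> float:
        return self.total_usd / self.requests if self.requests else 0.0
```

Since `total_usd` only increases, it maps naturally onto a counter. The dashboard can chart it as cumulative spend, and its rate rises when unexpected traffic arrives.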

Component errors: Splitting errors by component makes failures actionable. A provider outage, Redis issue, retrieval failure, and app exception should not all look the same.
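One way to split errors by component is to classify each exception before counting it. This sketch assumes the app re-raises library errors (HTTP client, Redis client, database driver) as its own wrapper types. `ProviderFailure`, `RedisFailure`, `RetrievalFailure` and `ErrorTracker` are hypothetical names, and anything unrecognized is counted as an app exception.

```python
from collections import Counter
from contextlib import contextmanager
from typing import Iterator


# Hypothetical wrapper exceptions raised by each component's client code.
class ProviderFailure(Exception):
    pass


class RedisFailure(Exception):
    pass


class RetrievalFailure(Exception):
    pass


COMPONENTS: list[tuple[type[BaseException], str]] = [
    (ProviderFailure, "provider"),
    (RedisFailure, "redis"),
    (RetrievalFailure, "retrieval"),
]


def classify(exc: BaseException) -> str:
    for exc_type, component in COMPONENTS:
        if isinstance(exc, exc_type):
            return component
    return "app"  # anything unrecognized counts as an application exception


class ErrorTracker:
    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def record(self, exc: BaseException) -> str:
        component = classify(exc)
        self.counts[component] += 1
        return component

    @contextmanager
    def guard(self) -> Iterator[None]:
        """Count an exception by component, then re-raise it unchanged."""
        try:
            yield
        except Exception as exc:
            self.record(exc)
            raise
```

The counts correspond to a `component` label on a single error counter. A provider outage then shows up as one rising series instead of a generic error spike.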