Fair-Perf-ML — Real-Time Observability Roadmap

Overview

Fair-Perf-ML is a lightweight, composable observability toolkit for machine-learning systems.
It provides a unified API for computing performance, fairness, and drift metrics in both batch and streaming settings.

The next major phase of the project focuses on real-time drift detection and online monitoring — building the bridge between offline evaluation and live production observability.

✨ Vision

Enable engineers to detect model degradation, bias, or data drift before ground truth arrives.
Fair-Perf-ML will support real-time metric computation over streaming or micro-batch data, with plug-and-play integration for existing ML pipelines (SageMaker, Airflow, Kafka, Kinesis, etc.).

🧩 Core Goals

Unified metric API — same interface for performance, fairness, and drift metrics.
Real-time drift monitoring — lightweight statistical proxies for early detection.
Streaming-friendly design — incremental metrics, windowed evaluation, low memory.
Fairness-aware drift — group-level monitoring for ethical robustness.
Interoperability — easy export to OLAP, Prometheus, CloudWatch, or S3 JSON logs.

Roadmap

Phase 1 – Drift Metrics

Goal: introduce a first-class fair_perf_ml.drift module with standard metrics.

Features

[X] PopulationStabilityIndex (PSI) — categorical + numeric.
[X] KSTestDrift — two-sample Kolmogorov–Smirnov test.
[X] JensenShannonDrift — symmetric divergence, bounded in [0, 1].
[ ] WassersteinDrift — Earth-Mover’s distance for continuous features.
[X] Unified base class BaseDriftMetric with compute(baseline, current).
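To make the compute(baseline, current) contract concrete, here is a minimal, dependency-free sketch. It reuses the roadmap's names (BaseDriftMetric, PopulationStabilityIndex, JensenShannonDrift) for readability only. The helpers _bin_edges and _proportions, the n_bins parameter, equal-width binning over the baseline range, and the epsilon floor are all assumptions for illustration, not fair_perf_ml's actual implementation.

```python
import bisect
import math
from abc import ABC, abstractmethod
from typing import Sequence


class BaseDriftMetric(ABC):
    """Hypothetical sketch of the shared interface: compute(baseline, current) -> float."""

    @abstractmethod
    def compute(self, baseline: Sequence[float], current: Sequence[float]) -> float: ...


def _bin_edges(baseline: Sequence[float], n_bins: int) -> list[float]:
    # Assumption: equal-width bins over the baseline range (quantile bins are a common alternative).
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline
    return [lo + i * width for i in range(1, n_bins)]  # interior edges only


def _proportions(values: Sequence[float], edges: list[float], eps: float = 1e-6) -> list[float]:
    counts = [0] * (len(edges) + 1)
    for v in values:
        # values outside the baseline range fall into the first or last bin
        counts[bisect.bisect_right(edges, v)] += 1
    total = len(values)
    # eps floors empty bins so PSI never takes log(0) or divides by zero
    return [max(c / total, eps) for c in counts]


class PopulationStabilityIndex(BaseDriftMetric):
    def __init__(self, n_bins: int = 10):
        self.n_bins = n_bins

    def compute(self, baseline, current):
        edges = _bin_edges(baseline, self.n_bins)
        b = _proportions(baseline, edges)
        c = _proportions(current, edges)
        # PSI = sum_i (c_i - b_i) * ln(c_i / b_i)
        return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


class JensenShannonDrift(BaseDriftMetric):
    def __init__(self, n_bins: int = 10):
        self.n_bins = n_bins

    def compute(self, baseline, current):
        edges = _bin_edges(baseline, self.n_bins)
        p = _proportions(baseline, edges, eps=0.0)
        q = _proportions(current, edges, eps=0.0)
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

        def kl(a, b):
            # terms with a_i == 0 contribute 0 by convention; b_i > 0 whenever a_i > 0
            return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

        # base-2 logarithms keep the divergence within [0, 1]
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Binning on baseline-derived edges and reusing them for the current window is what makes the two histograms comparable; both metrics then differ only in how they score the bin-wise gap.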

Deliverables

[X] Comprehensive unit tests for numeric / categorical inputs.
[X] Serialization for baseline histograms (.json).
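One way to serialize a baseline histogram is to store the bin edges next to the proportions, so that later windows can be binned identically. The sketch below assumes a hypothetical BaselineHistogram record with feature, edges, and proportions fields; the real file schema may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BaselineHistogram:
    """Hypothetical on-disk schema for a frozen baseline: interior edges plus bin proportions."""
    feature: str
    edges: list[float]        # interior bin edges
    proportions: list[float]  # len(edges) + 1 values, summing to ~1

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "BaselineHistogram":
        hist = cls(**json.loads(text))
        # basic shape check so a corrupt or hand-edited file fails loudly
        if len(hist.proportions) != len(hist.edges) + 1:
            raise ValueError("proportions must have len(edges) + 1 entries")
        return hist
```

Python floats survive a json.dumps/json.loads round trip exactly, so a reloaded baseline produces the same drift scores as the in-memory one.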

Phase 2 – Real-Time Monitoring Engine

Goal: make drift and fairness metrics stream-aware and incremental.

Features

[X] DriftMonitor class for rolling window updates.
[ ] Online statistics via Welford’s algorithm / T-Digest summaries.
[ ] Config-driven thresholds and per-feature policies.
[ ] Aggregation & alert logic (e.g., drift detected in ≥ K % of features).
[ ] Async / background execution hooks.
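Two of the planned pieces can be sketched with the standard library alone: Welford's algorithm for O(1)-memory running mean and variance, and a "drift in at least K% of features" alert rule with per-feature thresholds. The should_alert function, its parameters, and the threshold semantics (score >= threshold counts as drifted) are illustrative assumptions, not the planned DriftMonitor API.

```python
class Welford:
    """Running mean/variance in O(1) memory (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self) -> float:
        # sample variance; undefined (nan) for fewer than two observations
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def should_alert(
    scores: dict[str, float],      # feature name -> drift score for this window
    thresholds: dict[str, float],  # hypothetical per-feature overrides
    default_threshold: float,
    k_percent: float,
) -> bool:
    """Alert when at least k_percent of features meet or exceed their threshold."""
    if not scores:
        return False
    drifted = sum(1 for name, s in scores.items() if s >= thresholds.get(name, default_threshold))
    return 100.0 * drifted / len(scores) >= k_percent
```

Welford summarizes an unbounded stream but cannot evict old points, so a rolling window would either keep the raw window (or a mergeable sketch such as a t-digest) or reset the accumulator at each window boundary.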

Deliverables

[ ] Example micro-batch monitor (5-min window) for Kafka/Kinesis.
[ ] CLI & YAML config interface.
[ ] Integration guide: “Using Fair-Perf-ML in a streaming service”.
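The core of a 5-minute micro-batch monitor is grouping events into tumbling windows and scoring each window against the baseline. The sketch below assumes events arrive as (unix_timestamp, value) pairs and leaves the Kafka/Kinesis consumer out entirely; score_fn stands in for any drift metric with a compute(baseline, current)-style signature, and all names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable

WINDOW_SECONDS = 300  # 5-minute tumbling windows


def window_start(ts: float, width: int = WINDOW_SECONDS) -> int:
    # align each event to the start of the tumbling window it falls in
    return int(ts // width) * width


def score_micro_batches(
    events: Iterable[tuple[float, float]],  # (unix_timestamp, feature_value)
    baseline: list[float],
    score_fn: Callable[[list[float], list[float]], float],
    width: int = WINDOW_SECONDS,
) -> dict[int, float]:
    batches: dict[int, list[float]] = defaultdict(list)
    for ts, value in events:
        batches[window_start(ts, width)].append(value)
    # one drift score per window, in chronological order of window start
    return {start: score_fn(baseline, vals) for start, vals in sorted(batches.items())}
```

This batch form scores a finite list of events; a live consumer would instead emit a window's score once the watermark passes its end, so late events do not silently change already-published results.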

Phase 3 – Fairness Drift & Group Analytics

Goal: extend drift detection to subgroup and fairness metrics.

Features

[ ] Group-level PSI / JSD (e.g., PSI(ŷ | gender)).
[ ] Fairness delta tracking (Δ FPR, Δ TPR, Δ PPV over time).
[ ] Group-specific drift alerts.
[ ] Fairness-drift visualization templates.
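Fairness delta tracking reduces to computing per-group confusion-matrix rates for each period and comparing the between-group gap over time. The sketch assumes binary labels and predictions plus a parallel list of group identifiers; rates_by_group and fairness_gap are hypothetical helpers, not the planned fair_perf_ml.fairness.monitor API.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups) -> dict:
    """Per-group TPR, FPR and PPV from binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # nan when a group has no eligible rows

    return {
        g: {
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),
        }
        for g, c in counts.items()
    }


def fairness_gap(rates: dict, metric: str, group_a, group_b) -> float:
    """Delta of one metric between two groups at a single point in time."""
    return rates[group_a][metric] - rates[group_b][metric]
```

Tracking "Δ TPR over time" then means comparing fairness_gap on the current window with the same gap on the baseline window, and alerting when that difference exceeds a tolerance.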

Deliverables

[ ] Example notebook: “Monitoring fairness drift in real time”.
[ ] API design doc for fair_perf_ml.fairness.monitor.

Phase 4 – Observability Integrations (Q3 2026)

Goal: make Fair-Perf-ML metrics observable and interoperable with production systems.

Features

[ ] Exporters for:
Prometheus Pushgateway
AWS CloudWatch
OpenTelemetry Metrics API
S3 / DynamoDB JSON logs
[ ] MetricPublisher abstraction layer.
[ ] Prebuilt Grafana dashboards for PSI, drift scores, fairness deltas.
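A MetricPublisher abstraction can be as small as a single publish method over a backend-neutral metric record. In the sketch below, MetricPoint, its fields, and JsonLinesPublisher (a stand-in for the S3 / JSON log sink) are hypothetical; real Prometheus, CloudWatch, or OpenTelemetry exporters would implement the same method by wrapping their respective client libraries.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Protocol, TextIO


@dataclass
class MetricPoint:
    name: str                                             # e.g. "psi"
    value: float
    labels: dict[str, str] = field(default_factory=dict)  # e.g. {"feature": "age"}
    timestamp: float = field(default_factory=time.time)


class MetricPublisher(Protocol):
    """Structural interface: any object with this publish method qualifies."""

    def publish(self, points: list[MetricPoint]) -> None: ...


class JsonLinesPublisher:
    """Writes one JSON object per point to a text stream (file, pipe, or buffer)."""

    def __init__(self, sink: TextIO):
        self.sink = sink

    def publish(self, points: list[MetricPoint]) -> None:
        for p in points:
            record = {"name": p.name, "value": p.value, "labels": p.labels, "ts": p.timestamp}
            self.sink.write(json.dumps(record) + "\n")
```

Using a Protocol keeps backends decoupled: metric code depends only on publish(), and each exporter can live in its own optional module.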

Deliverables

[ ] Example Docker sidecar for model endpoints.
[ ] fair_perf_ml.client module with async publishers.
[ ] Integration tests with SageMaker & local streaming setup.
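An async publisher mainly needs to keep exporting off the request path. The sketch below assumes a hypothetical AsyncBatchingPublisher: the hot path calls emit() without waiting, and a background task batches points and hands each batch to a blocking flush callable on a worker thread (asyncio.to_thread, Python 3.9+). It should be constructed inside a running event loop.

```python
import asyncio
from typing import Any, Callable


class AsyncBatchingPublisher:
    """Hypothetical async wrapper: enqueue on the hot path, flush batches in the background."""

    def __init__(self, flush: Callable[[list[Any]], None], batch_size: int = 100):
        self._flush = flush  # e.g. a blocking exporter's publish method
        self._batch_size = batch_size
        self._queue: asyncio.Queue = asyncio.Queue()

    def emit(self, point: Any) -> None:
        # never blocks on the network; the queue is unbounded in this sketch
        self._queue.put_nowait(point)

    def close(self) -> None:
        self._queue.put_nowait(None)  # sentinel: drain and stop

    async def run(self) -> None:
        batch: list[Any] = []
        while True:
            point = await self._queue.get()
            if point is None:
                break
            batch.append(point)
            if len(batch) >= self._batch_size:
                await asyncio.to_thread(self._flush, batch)
                batch = []
        if batch:  # flush whatever is left after close()
            await asyncio.to_thread(self._flush, batch)
```

A production version would also bound the queue and flush on a timer, so that low-traffic endpoints do not hold metrics indefinitely.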

Phase 5 – Continuous Evaluation & Ground-Truth Integration

Goal: combine real-time proxy metrics with delayed ground-truth metrics.

Features

[ ] Join inference logs with delayed labels for backfilled evaluation.
[ ] Compare leading (proxy) vs lagging (ground-truth) metrics.
[ ] Automatic model performance dashboards (F1, AUC, and recall drift).
[ ] Correlation tracking between proxy drift and true performance drop.
