Model Performance

Model performance analysis and runtime monitoring. Import from fair_perf_ml.model_perf.

Batch analysis

fair_perf_ml.model_perf.core.binary_classification_analysis

binary_classification_analysis(y_true: FloatingPointDataSlice, y_pred: FloatingPointDataSlice) -> dict

Analysis for a binary classification model.
args:
    y_true: numpy array or Python list
    y_pred: numpy array or Python list
returns:
    dict: analysis output

fair_perf_ml.model_perf.core.logistic_regression_analysis

logistic_regression_analysis(y_true: FloatingPointDataSlice, y_pred: FloatingPointDataSlice, decision_threshold: float | None = 0.5) -> dict

Analysis for a logistic regression model. Labels are applied to the predictions according to the given decision threshold.
args:
    y_true: numpy array or Python list
    y_pred: numpy array or Python list
    decision_threshold: threshold used to apply the label, defaults to 0.5
returns:
    dict: analysis output
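To illustrate what applying a decision threshold means, here is a minimal pure-Python sketch. It is not the library's implementation: apply_threshold is a hypothetical helper, and treating a probability exactly equal to the threshold as positive is an assumption.

```python
from typing import Sequence


def apply_threshold(probabilities: Sequence[float], threshold: float = 0.5) -> list[float]:
    """Hypothetical helper: map each probability to 1.0 or 0.0.

    Assumption: a probability at or above the threshold counts as positive.
    """
    return [1.0 if p >= threshold else 0.0 for p in probabilities]


y_pred = [0.9, 0.4, 0.45, 0.2]

labels_default = apply_threshold(y_pred)       # uses the 0.5 default
labels_lenient = apply_threshold(y_pred, 0.4)  # a lower cut-off flags more positives

print(labels_default)
print(labels_lenient)
```

The same predictions produce different labels under different thresholds, which is why the threshold is part of the analysis input.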

fair_perf_ml.model_perf.core.linear_regression_analysis

linear_regression_analysis(y_true: FloatingPointDataSlice, y_pred: FloatingPointDataSlice) -> dict

Analysis for a linear regression model.
args:
    y_true: numpy array or Python list
    y_pred: numpy array or Python list
returns:
    dict: analysis output

Runtime comparison

fair_perf_ml.model_perf.core.runtime_check_full

runtime_check_full(latest: ModelPerformanceReport | dict, baseline: ModelPerformanceReport | dict, threshold: float = 0.1) -> DriftReport

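Performs a full runtime performance monitoring job, comparing the latest analysis output against a baseline.
args:
    latest: ModelPerformanceReport | dict - latest analysis output, must match the shape of baseline
    baseline: ModelPerformanceReport | dict - baseline analysis output, must match the shape of latest
    threshold: float - the allowable drift, defaults to 0.1
returns:
    DriftReport - output analysis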
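The idea of a full check can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: full_drift_check, the metric names, the returned dict layout, and the use of absolute difference as the drift measure are all hypothetical.

```python
def full_drift_check(
    latest: dict[str, float],
    baseline: dict[str, float],
    threshold: float = 0.1,
) -> dict:
    """Hypothetical sketch: flag every metric whose drift exceeds the threshold."""
    # Both reports must describe the same set of metrics ("match shape").
    if latest.keys() != baseline.keys():
        raise ValueError("latest and baseline must contain the same metrics")

    failed = {}
    for name, base_value in baseline.items():
        # Assumption: drift is the absolute difference between the two values.
        drift = abs(latest[name] - base_value)
        if drift > threshold:
            failed[name] = drift
    return {"passed": not failed, "failed": failed}


baseline = {"accuracy": 0.90, "precision": 0.80}  # hypothetical metric names
latest = {"accuracy": 0.75, "precision": 0.78}

report = full_drift_check(latest, baseline, threshold=0.1)
print(report["passed"], sorted(report["failed"]))
```

Here only accuracy moves by more than 0.1, so the sketch reports a failure for that one metric.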

fair_perf_ml.model_perf.core.partial_runtime_check

partial_runtime_check(latest: ModelPerformanceReport | dict, baseline: ModelPerformanceReport | dict, metrics: list[ModelPerformanceDriftMetric], threshold: float = 0.1) -> DriftReport

Performs a runtime performance monitoring job on selected metrics only, so that the drift report considers just the metrics relevant to an application. Often not all loss metrics are relevant.
args:
    latest: ModelPerformanceReport | dict - latest analysis output, must match the shape of baseline
    baseline: ModelPerformanceReport | dict - baseline analysis output, must match the shape of latest
    metrics: list[ModelPerformanceDriftMetric] - metrics to evaluate
    threshold: float - the allowable drift, defaults to 0.1
returns:
    DriftReport - output analysis
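A rough sketch of scoping a drift check to selected metrics is shown below. It is not the library's implementation: partial_drift_check, the string metric names (standing in for ModelPerformanceDriftMetric members, which are not listed here), the returned dict, and absolute difference as the drift measure are assumptions.

```python
def partial_drift_check(
    latest: dict[str, float],
    baseline: dict[str, float],
    metrics: list[str],
    threshold: float = 0.1,
) -> dict:
    """Hypothetical sketch: evaluate drift only for the requested metrics."""
    missing = [m for m in metrics if m not in latest or m not in baseline]
    if missing:
        raise KeyError(f"metrics not present in both reports: {missing}")

    failed = {}
    for name in metrics:
        # Assumption: drift is the absolute difference between the two values.
        drift = abs(latest[name] - baseline[name])
        if drift > threshold:
            failed[name] = drift
    return {"passed": not failed, "failed": failed}


baseline = {"accuracy": 0.90, "log_loss": 0.30}  # hypothetical metric names
latest = {"accuracy": 0.88, "log_loss": 0.55}

accuracy_only = partial_drift_check(latest, baseline, ["accuracy"])
both_metrics = partial_drift_check(latest, baseline, ["accuracy", "log_loss"])
print(accuracy_only["passed"], both_metrics["passed"])
```

The same pair of reports passes when only accuracy is considered and fails once log_loss is included, which is the reason to scope the check.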

Streaming

fair_perf_ml.model_perf.streaming.BinaryClassificationStreaming

Bases: LabeledStreamingBase[T]

Stateful streaming monitor for binary classification models.

Maintains a baseline computed from an initial dataset. As new predictions arrive they are accumulated in the stream and can be evaluated against the baseline at any point via snapshots or drift reports.

The positive class label can be any type that supports equality comparison. Labels are applied in Python before being accumulated in the monitor.
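As an illustration of labelling by equality comparison, the following sketch maps arbitrary values to booleans against a positive label and tallies confusion counts. The helpers to_binary and confusion_counts are hypothetical and do not reflect how the monitor stores its state internally.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def to_binary(values: Iterable[T], positive_label: T) -> list[bool]:
    """Hypothetical helper: True where a value equals the positive label."""
    return [value == positive_label for value in values]


def confusion_counts(y_true: list[bool], y_pred: list[bool]) -> dict[str, int]:
    """Hypothetical helper: tally true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Any type with equality works as a label; strings are used here.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "ham", "ham"]

counts = confusion_counts(to_binary(y_true, "spam"), to_binary(y_pred, "spam"))
print(counts)
```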

__init__

__init__(label: T, y_true: Iterable[T], y_pred: Iterable[T]) -> None

Initialises the monitor with a positive class label and a baseline dataset.
args:
    label: T - the positive class label
    y_true: Iterable[T] - baseline ground truth values
    y_pred: Iterable[T] - baseline prediction values

update_stream

update_stream(y_true: T, y_pred: T) -> None

Accumulate a single ground truth and prediction example into the stream.
args:
    y_true: T - ground truth value
    y_pred: T - prediction value

update_stream_batch

update_stream_batch(y_true: Iterable[T], y_pred: Iterable[T]) -> None

Accumulate a batch of ground truth and prediction examples into the stream.
args:
    y_true: Iterable[T] - ground truth values
    y_pred: Iterable[T] - prediction values

flush

flush() -> None

Discard all accumulated runtime data. The baseline is preserved.

reset_baseline

reset_baseline(y_true: Iterable[T], y_pred: Iterable[T]) -> None

Replace the baseline with a new dataset. The positive label is unchanged. To change the label at the same time, use reset_baseline_and_label.
args:
    y_true: Iterable[T] - new baseline ground truth values
    y_pred: Iterable[T] - new baseline prediction values

reset_baseline_and_label

reset_baseline_and_label(label: T, y_true: Iterable[T], y_pred: Iterable[T]) -> None

Replace the baseline and the positive class label simultaneously. Because changing the label invalidates all previously accumulated state, a new baseline dataset is required. This is the only method that allows the label to be changed.
args:
    label: T - new positive class label
    y_true: Iterable[T] - new baseline ground truth values
    y_pred: Iterable[T] - new baseline prediction values

performance_snapshot

performance_snapshot() -> PerformanceSnapshot

Compute a point-in-time performance report over all accumulated runtime examples, independent of the baseline. Raises if no runtime data has been accumulated yet.
returns:
    PerformanceSnapshot - metric name to value mapping

drift_snapshot

drift_snapshot() -> DriftSnapshot

Compute a point-in-time drift snapshot comparing accumulated runtime examples to the baseline using a default threshold. Raises if no runtime data has been accumulated yet.
returns:
    DriftSnapshot - metric name to drift value mapping

drift_report

drift_report(drift_threshold: float) -> DriftReport

Evaluate whether accumulated runtime performance has drifted beyond the given threshold relative to the baseline. Raises if no runtime data has been accumulated yet.
args:
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
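The lifecycle described above (fixed baseline, accumulated stream, flush, drift report) can be sketched with a toy monitor. This is a simplified stand-in and not the library's class: AccuracyDriftMonitor, its single accuracy metric, the dict it returns, the RuntimeError it raises, and absolute difference as the drift measure are all assumptions.

```python
class AccuracyDriftMonitor:
    """Hypothetical toy monitor that tracks accuracy only."""

    def __init__(self, y_true, y_pred):
        pairs = list(zip(y_true, y_pred))
        if not pairs:
            raise ValueError("baseline must not be empty")
        # The baseline is computed once and kept until explicitly replaced.
        self._baseline = sum(t == p for t, p in pairs) / len(pairs)
        self._correct = 0
        self._total = 0

    def update_stream(self, y_true, y_pred) -> None:
        # Only running counts are stored, not the examples themselves.
        self._correct += int(y_true == y_pred)
        self._total += 1

    def flush(self) -> None:
        # Runtime state is discarded; the baseline is preserved.
        self._correct = 0
        self._total = 0

    def drift_report(self, drift_threshold: float) -> dict:
        if self._total == 0:
            raise RuntimeError("no runtime data has been accumulated yet")
        drift = abs(self._correct / self._total - self._baseline)
        return {"passed": drift <= drift_threshold, "drift": drift}


monitor = AccuracyDriftMonitor(y_true=[1, 0, 1, 1], y_pred=[1, 0, 1, 1])
for truth, pred in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    monitor.update_stream(truth, pred)

report = monitor.drift_report(drift_threshold=0.1)

monitor.flush()
try:
    monitor.drift_report(drift_threshold=0.1)
    raised_after_flush = False
except RuntimeError:
    raised_after_flush = True
```

In this sketch the baseline accuracy is 1.0 and the runtime accuracy is 0.5, so the report fails; after flush the monitor has no runtime data and refuses to report.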

drift_report_partial_metrics

drift_report_partial_metrics(metrics: list[ClassificationDriftMetric], drift_threshold: float) -> DriftReport

Same as drift_report but scoped to a specific subset of metrics.
args:
    metrics: list[ClassificationDriftMetric] - metrics to evaluate
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics

fair_perf_ml.model_perf.streaming.LogisticRegressionStreaming

Bases: ModelPerfStreamingBase[float]

Stateful streaming monitor for logistic regression models.

Maintains a baseline computed from an initial dataset. As new predictions arrive they are accumulated in the stream and can be evaluated against the baseline at any point via snapshots or drift reports. A decision threshold is applied to predicted probabilities to produce binary labels.

__init__

__init__(y_true: Iterable[float], y_pred: Iterable[float], threshold: float | None = 0.5) -> None

Initialises the monitor with a baseline dataset and a decision threshold.
args:
    y_true: Iterable[float] - baseline ground truth values
    y_pred: Iterable[float] - baseline predicted probabilities
    threshold: float | None - decision threshold applied to probabilities, defaults to 0.5

update_stream

update_stream(y_true: float, y_pred: float) -> None

Accumulate a single ground truth and predicted probability into the stream.
args:
    y_true: float
    y_pred: float - predicted probability

update_stream_batch

update_stream_batch(y_true: Iterable[float], y_pred: Iterable[float]) -> None

Accumulate a batch of ground truth values and predicted probabilities into the stream.
args:
    y_true: Iterable[float]
    y_pred: Iterable[float] - predicted probabilities

flush

flush() -> None

Discard all accumulated runtime data. The baseline is preserved.

reset_baseline

reset_baseline(y_true: Iterable[float], y_pred: Iterable[float]) -> None

Replace the baseline with a new dataset. The decision threshold is unchanged. To change the threshold at the same time, use reset_baseline_and_decision_threshold.
args:
    y_true: Iterable[float] - new baseline ground truth values
    y_pred: Iterable[float] - new baseline predicted probabilities

reset_baseline_and_decision_threshold

reset_baseline_and_decision_threshold(y_true: Iterable[float], y_pred: Iterable[float], threshold: float) -> None

Replace the baseline and the decision threshold simultaneously. Because changing the threshold invalidates previously computed baseline statistics, a new baseline dataset is required. This is the only method that allows the threshold to be changed.
args:
    y_true: Iterable[float] - new baseline ground truth values
    y_pred: Iterable[float] - new baseline predicted probabilities
    threshold: float - new decision threshold
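A small sketch shows why statistics computed at one threshold cannot be reused at another. It is illustrative only: accuracy_at is a hypothetical helper, accuracy stands in for whatever metrics the monitor tracks, and treating a probability at or above the threshold as positive is an assumption.

```python
from typing import Sequence


def accuracy_at(y_true: Sequence[float], y_prob: Sequence[float], threshold: float) -> float:
    """Hypothetical helper: accuracy after labelling probabilities at `threshold`."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumption: a probability at or above the threshold is labelled 1.0.
    labels = [1.0 if p >= threshold else 0.0 for p in y_prob]
    return sum(t == label for t, label in zip(y_true, labels)) / len(y_true)


baseline_true = [1.0, 1.0, 0.0, 0.0]
baseline_prob = [0.9, 0.7, 0.6, 0.1]

acc_at_050 = accuracy_at(baseline_true, baseline_prob, 0.5)
acc_at_065 = accuracy_at(baseline_true, baseline_prob, 0.65)
print(acc_at_050, acc_at_065)
```

The identical baseline data scores differently at the two thresholds, so comparing runtime data labelled at a new threshold against a baseline computed at the old one would report drift that comes from the threshold rather than from the model.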

performance_snapshot

performance_snapshot() -> PerformanceSnapshot

Compute a point-in-time performance report over all accumulated runtime examples, independent of the baseline. Raises if no runtime data has been accumulated yet.
returns:
    PerformanceSnapshot - metric name to value mapping

drift_snapshot

drift_snapshot() -> DriftSnapshot

Compute a point-in-time drift snapshot comparing accumulated runtime examples to the baseline using a default threshold. Raises if no runtime data has been accumulated yet.
returns:
    DriftSnapshot - metric name to drift value mapping

drift_report

drift_report(drift_threshold: float) -> DriftReport

Evaluate whether accumulated runtime performance has drifted beyond the given threshold relative to the baseline. Raises if no runtime data has been accumulated yet.
args:
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics

drift_report_partial_metrics

drift_report_partial_metrics(metrics: list[ClassificationDriftMetric], drift_threshold: float) -> DriftReport

Same as drift_report but scoped to a specific subset of metrics.
args:
    metrics: list[ClassificationDriftMetric] - metrics to evaluate
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics

fair_perf_ml.model_perf.streaming.LinearRegressionStreaming

Bases: ModelPerfStreamingBase[float]

Stateful streaming monitor for linear regression models.

Maintains a baseline computed from an initial dataset. As new predictions arrive they are accumulated in the stream and can be evaluated against the baseline at any point via snapshots or drift reports.

__init__

__init__(y_true: Iterable[float], y_pred: Iterable[float]) -> None

Initialises the monitor with a baseline dataset.
args:
    y_true: Iterable[float] - baseline ground truth values
    y_pred: Iterable[float] - baseline prediction values

update_stream

update_stream(y_true: float, y_pred: float) -> None

Accumulate a single ground truth and prediction example into the stream.
args:
    y_true: float
    y_pred: float

update_stream_batch

update_stream_batch(y_true: Iterable[float], y_pred: Iterable[float]) -> None

Accumulate a batch of ground truth and prediction examples into the stream.
args:
    y_true: Iterable[float]
    y_pred: Iterable[float]
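One way to picture accumulation for a regression stream is a running sum that is updated per example or per batch. The sketch below is an assumption-laden illustration, not the library's internals: RunningSquaredError is hypothetical, and mean squared error is chosen only as a familiar regression metric.

```python
from typing import Iterable


class RunningSquaredError:
    """Hypothetical accumulator for mean squared error over a stream."""

    def __init__(self) -> None:
        self.count = 0
        self.sum_squared_error = 0.0

    def update(self, y_true: float, y_pred: float) -> None:
        # Fold one example into the running totals.
        self.sum_squared_error += (y_true - y_pred) ** 2
        self.count += 1

    def update_batch(self, y_true: Iterable[float], y_pred: Iterable[float]) -> None:
        # A batch is the single-example update applied pairwise.
        for truth, pred in zip(y_true, y_pred):
            self.update(truth, pred)

    def mean_squared_error(self) -> float:
        if self.count == 0:
            raise RuntimeError("no runtime data has been accumulated yet")
        return self.sum_squared_error / self.count


stream = RunningSquaredError()
stream.update(3.0, 2.5)                         # squared error 0.25
stream.update_batch([1.0, 2.0], [1.0, 4.0])     # squared errors 0.0 and 4.0
mse = stream.mean_squared_error()
print(stream.count, mse)
```

Because only the count and the running sum are kept, single and batch updates can be mixed freely and the metric is available at any point.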

reset_baseline

reset_baseline(y_true: Iterable[float], y_pred: Iterable[float]) -> None

Replace the baseline with a new dataset.
args:
    y_true: Iterable[float] - new baseline ground truth values
    y_pred: Iterable[float] - new baseline prediction values

flush

flush() -> None

Discard all accumulated runtime data. The baseline is preserved.

performance_snapshot

performance_snapshot() -> PerformanceSnapshot

Compute a point-in-time performance report over all accumulated runtime examples, independent of the baseline. Raises if no runtime data has been accumulated yet.
returns:
    PerformanceSnapshot - metric name to value mapping

drift_snapshot

drift_snapshot() -> DriftSnapshot

Compute a point-in-time drift snapshot comparing accumulated runtime examples to the baseline using a default threshold. Raises if no runtime data has been accumulated yet.
returns:
    DriftSnapshot - metric name to drift value mapping

drift_report

drift_report(drift_threshold: float) -> DriftReport

Evaluate whether accumulated runtime performance has drifted beyond the given threshold relative to the baseline. Raises if no runtime data has been accumulated yet.
args:
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics

drift_report_partial_metrics

drift_report_partial_metrics(metrics: list[LinearRegressionDriftMetric], drift_threshold: float) -> DriftReport

Same as drift_report but scoped to a specific subset of metrics.
args:
    metrics: list[LinearRegressionDriftMetric] - metrics to evaluate
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics

Exceptions

fair_perf_ml.model_perf.core.DifferentModelTypes

Bases: Exception

Raised when the user passes in the wrong model type.

fair_perf_ml.model_perf.core.InvalidMetricsBody

Bases: Exception

Raised when the user passes an invalid metrics payload.