Model Performance
Model performance analysis and runtime monitoring. Import from fair_perf_ml.model_perf.
Batch analysis
fair_perf_ml.model_perf.core.binary_classification_analysis
binary_classification_analysis(y_true: FloatingPointDataSlice, y_pred: FloatingPointDataSlice) -> dict
Analysis for a binary classification model type.
args:
    y_true: numpy array/python list
    y_pred: numpy array/python list
returns:
    dict: analysis output
fair_perf_ml.model_perf.core.logistic_regression_analysis
logistic_regression_analysis(y_true: FloatingPointDataSlice, y_pred: FloatingPointDataSlice, decision_threshold: float | None = 0.5) -> dict
Analysis for a logistic regression model type. Will apply labels in accordance with the given threshold.
args:
    y_true: numpy array/python list
    y_pred: numpy array/python list
    decision_threshold: threshold used to apply label, defaults to 0.5
returns:
    dict: analysis output
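To illustrate what "apply labels in accordance with the given threshold" means, here is a minimal sketch in plain Python. It does not call fair_perf_ml; the helper names apply_decision_threshold and accuracy are hypothetical, and treating a score equal to the threshold as positive is an assumption, since this reference does not say how the boundary is handled.

```python
from typing import Sequence


def apply_decision_threshold(
    y_pred: Sequence[float], decision_threshold: float = 0.5
) -> list[int]:
    """Map predicted probabilities to 0/1 labels.

    Assumption: a score at or above the threshold counts as positive.
    """
    return [1 if p >= decision_threshold else 0 for p in y_pred]


def accuracy(y_true: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of examples where the thresholded label matches the truth."""
    if len(y_true) != len(labels):
        raise ValueError("y_true and labels must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    matches = sum(1 for t, l in zip(y_true, labels) if int(t) == l)
    return matches / len(y_true)


y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.4, 0.45, 0.2]

# The same probabilities yield different labels under different thresholds.
labels_default = apply_decision_threshold(y_pred)        # threshold 0.5
labels_lenient = apply_decision_threshold(y_pred, 0.42)  # threshold 0.42

print(labels_default, accuracy(y_true, labels_default))
print(labels_lenient, accuracy(y_true, labels_lenient))
```

The third example (0.45) flips from negative to positive when the threshold drops, which is why every label-based metric in the analysis output depends on decision_threshold.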
fair_perf_ml.model_perf.core.linear_regression_analysis
Analysis for a linear regression model type.
args:
    y_true: numpy array/python list
    y_pred: numpy array/python list
returns:
    dict: analysis output
Runtime comparison
fair_perf_ml.model_perf.core.runtime_check_full
runtime_check_full(latest: ModelPerformanceReport | dict, baseline: ModelPerformanceReport | dict, threshold: float = 0.1) -> DriftReport
Method to perform a full runtime performance monitoring job.
args:
    latest: ModelPerformanceReport | dict - latest analysis output, must match the shape of baseline
    baseline: ModelPerformanceReport | dict - baseline analysis output, must match the shape of latest
    threshold: float - the allowable drift, defaults to 0.1
returns:
    DriftReport - output analysis
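The comparison described above can be sketched in plain Python without the library. Everything here is an assumption made for illustration: the function name sketch_full_check, the metric names, the use of relative change as the drift measure, and the plain dict standing in for DriftReport. This reference does not state the drift formula the library actually uses.

```python
def sketch_full_check(
    latest: dict[str, float],
    baseline: dict[str, float],
    threshold: float = 0.1,
) -> dict:
    """Flag every metric whose relative change from baseline exceeds threshold."""
    # "must match shape" is read here as: both reports hold the same metric names.
    if latest.keys() != baseline.keys():
        raise ValueError("latest and baseline must contain the same metrics")

    failed: dict[str, float] = {}
    for name, base_value in baseline.items():
        # Assumed drift formula: relative change, or absolute change
        # when the baseline value is zero.
        scale = abs(base_value) if base_value != 0 else 1.0
        drift = abs(latest[name] - base_value) / scale
        if drift > threshold:
            failed[name] = drift
    return {"passed": not failed, "failed_metrics": failed}


baseline = {"accuracy": 0.90, "precision": 0.80}
latest = {"accuracy": 0.88, "precision": 0.60}

report = sketch_full_check(latest, baseline, threshold=0.1)
print(report)
```

In this toy data accuracy moves by about 2% and stays within the threshold, while precision moves by about 25% and is reported as failed.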
fair_perf_ml.model_perf.core.partial_runtime_check
partial_runtime_check(latest: ModelPerformanceReport | dict, baseline: ModelPerformanceReport | dict, metrics: list[ModelPerformanceDriftMetric], threshold: float = 0.1) -> DriftReport
Method to perform a runtime performance monitoring job on only selected metrics. This lets the drift report consider only the metrics that are relevant to an application; often not all loss metrics are relevant.
args:
    latest: ModelPerformanceReport | dict - latest analysis output, must match the shape of baseline
    baseline: ModelPerformanceReport | dict - baseline analysis output, must match the shape of latest
    metrics: list[ModelPerformanceDriftMetric] - metrics to evaluate
    threshold: float - the allowable drift, defaults to 0.1
returns:
    DriftReport - output analysis
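A rough sketch of why scoping the check to selected metrics matters, written in plain Python rather than against the library. SketchMetric is a hypothetical stand-in for ModelPerformanceDriftMetric (its real members are not listed in this reference), and the relative-change drift formula and the dict result are likewise assumptions.

```python
from enum import Enum


class SketchMetric(str, Enum):
    """Hypothetical metric names; the library's enum members may differ."""

    ACCURACY = "accuracy"
    PRECISION = "precision"
    RECALL = "recall"


def sketch_partial_check(
    latest: dict[str, float],
    baseline: dict[str, float],
    metrics: list[SketchMetric],
    threshold: float = 0.1,
) -> dict:
    """Check drift only for the requested metrics."""
    failed: dict[str, float] = {}
    for metric in metrics:
        name = metric.value
        if name not in latest or name not in baseline:
            raise KeyError(f"metric {name!r} missing from a report")
        base_value = baseline[name]
        # Assumed drift formula: relative change from the baseline value.
        scale = abs(base_value) if base_value != 0 else 1.0
        drift = abs(latest[name] - base_value) / scale
        if drift > threshold:
            failed[name] = drift
    return {"passed": not failed, "failed_metrics": failed}


baseline = {"accuracy": 0.90, "precision": 0.80, "recall": 0.70}
latest = {"accuracy": 0.88, "precision": 0.60, "recall": 0.69}

# Precision has drifted, but an application that ignores it still passes.
scoped = sketch_partial_check(
    latest, baseline, [SketchMetric.ACCURACY, SketchMetric.RECALL]
)
everything = sketch_partial_check(latest, baseline, list(SketchMetric))
print(scoped["passed"], everything["passed"])
```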
Streaming
fair_perf_ml.model_perf.streaming.BinaryClassificationStreaming
Bases: LabeledStreamingBase[T]
Stateful streaming monitor for binary classification models.
Maintains a baseline computed from an initial dataset. As new predictions arrive they are accumulated in the stream and can be evaluated against the baseline at any point via snapshots or drift reports.
The positive class label can be any type that supports equality comparison. Labels are applied in Python before being accumulated in the monitor.
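As a minimal sketch of what "labels are applied in Python before being accumulated" could look like, the example below compares each value to the positive label with == and feeds the resulting booleans into confusion-matrix counts. ConfusionCounts and accumulate are hypothetical names for illustration, and tracking confusion counts is an assumption about the accumulated state, not a description of the library's internals.

```python
from dataclasses import dataclass
from typing import Sequence, TypeVar

T = TypeVar("T")


@dataclass
class ConfusionCounts:
    """Running confusion-matrix counts for a binary problem."""

    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def add(self, truth_is_positive: bool, pred_is_positive: bool) -> None:
        if truth_is_positive and pred_is_positive:
            self.tp += 1
        elif pred_is_positive:
            self.fp += 1
        elif truth_is_positive:
            self.fn += 1
        else:
            self.tn += 1


def accumulate(
    counts: ConfusionCounts, label: T, y_true: Sequence[T], y_pred: Sequence[T]
) -> None:
    """Apply the positive label via equality, then update the counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    for truth, pred in zip(y_true, y_pred):
        counts.add(truth == label, pred == label)


# Any type with equality works as a label; strings are used here.
counts = ConfusionCounts()
accumulate(
    counts,
    "spam",
    y_true=["spam", "ham", "spam", "ham"],
    y_pred=["spam", "spam", "ham", "ham"],
)
print(counts)
```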
__init__
Initialises the monitor with a positive class label and a baseline dataset.
args:
    label: T - the positive class label
    y_true: Sequence[T] - baseline ground truth values
    y_pred: Sequence[T] - baseline prediction values
update_stream
Accumulate a single ground truth and prediction example into the stream.
args:
    y_true: T - ground truth value
    y_pred: T - prediction value
update_stream_batch
Accumulate a batch of ground truth and prediction examples into the stream.
args:
    y_true: Sequence[T] - ground truth values
    y_pred: Sequence[T] - prediction values
reset_baseline
Replace the baseline with a new dataset. The positive label is unchanged. To change the label at the same time, use reset_baseline_and_label.
args:
    y_true: Sequence[T] - new baseline ground truth values
    y_pred: Sequence[T] - new baseline prediction values
reset_baseline_and_label
Replace the baseline and the positive class label simultaneously. Because changing the label invalidates all previously accumulated state, a new baseline dataset is required. This is the only method that allows the label to be changed.
args:
    label: T - new positive class label
    y_true: Sequence[T] - new baseline ground truth values
    y_pred: Sequence[T] - new baseline prediction values
performance_snapshot
Compute a point-in-time performance report over all accumulated runtime examples, independent of the baseline. Raises if no runtime data has been accumulated yet.
returns:
    PerformanceSnapshot - metric name to value mapping
drift_snapshot
Compute a point-in-time drift snapshot comparing accumulated runtime examples to the baseline using a default threshold. Raises if no runtime data has been accumulated yet.
returns:
    DriftSnapshot - metric name to drift value mapping
drift_report
Evaluate whether accumulated runtime performance has drifted beyond the given threshold relative to the baseline. Raises if no runtime data has been accumulated yet.
args:
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
drift_report_partial_metrics
drift_report_partial_metrics(metrics: list[ClassificationDriftMetric], drift_threshold: float) -> DriftReport
Same as drift_report but scoped to a specific subset of metrics.
args:
    metrics: list[ClassificationDriftMetric] - metrics to evaluate
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
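The lifecycle of the monitor (fixed baseline, accumulating stream, snapshots that raise on an empty stream) can be sketched with a toy class. SketchBinaryMonitor is hypothetical and is not the library's implementation: it tracks accuracy only, uses absolute difference as the drift measure, raises RuntimeError where the library's exception type is unspecified, and returns plain dicts in place of PerformanceSnapshot and DriftReport.

```python
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


class SketchBinaryMonitor(Generic[T]):
    """Toy streaming monitor that tracks accuracy against a fixed baseline."""

    def __init__(self, label: T, y_true: Sequence[T], y_pred: Sequence[T]) -> None:
        if len(y_true) != len(y_pred) or len(y_true) == 0:
            raise ValueError("baseline needs equally sized, non-empty sequences")
        self._label = label
        hits = sum(self._is_hit(t, p) for t, p in zip(y_true, y_pred))
        self._baseline_accuracy = hits / len(y_true)
        self._correct = 0  # runtime examples classified correctly
        self._seen = 0     # runtime examples accumulated so far

    def _is_hit(self, truth: T, pred: T) -> bool:
        # Labels are applied via equality with the positive label.
        return (truth == self._label) == (pred == self._label)

    def update_stream(self, y_true: T, y_pred: T) -> None:
        self._correct += int(self._is_hit(y_true, y_pred))
        self._seen += 1

    def update_stream_batch(self, y_true: Sequence[T], y_pred: Sequence[T]) -> None:
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        for truth, pred in zip(y_true, y_pred):
            self.update_stream(truth, pred)

    def performance_snapshot(self) -> dict[str, float]:
        if self._seen == 0:
            raise RuntimeError("no runtime data has been accumulated yet")
        return {"accuracy": self._correct / self._seen}

    def drift_report(self, drift_threshold: float) -> dict:
        latest = self.performance_snapshot()["accuracy"]
        # Assumed drift measure: absolute difference from the baseline.
        drift = abs(latest - self._baseline_accuracy)
        return {"passed": drift <= drift_threshold, "drift": {"accuracy": drift}}


monitor = SketchBinaryMonitor(
    "yes", y_true=["yes", "no", "yes", "no"], y_pred=["yes", "no", "yes", "no"]
)
monitor.update_stream_batch(["yes", "no", "no", "yes"], ["yes", "yes", "no", "yes"])

snapshot = monitor.performance_snapshot()
report = monitor.drift_report(drift_threshold=0.1)
print(snapshot, report)
```

Keeping only running counts rather than the raw examples means a snapshot can be taken at any point without reprocessing the stream.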
fair_perf_ml.model_perf.streaming.LogisticRegressionStreaming
Bases: ModelPerfStreamingBase[float]
Stateful streaming monitor for logistic regression models.
Maintains a baseline computed from an initial dataset. As new predictions arrive they are accumulated in the stream and can be evaluated against the baseline at any point via snapshots or drift reports. A decision threshold is applied to predicted probabilities to produce binary labels.
__init__
Initialises the monitor with a baseline dataset and a decision threshold.
args:
    y_true: Sequence[float] - baseline ground truth values
    y_pred: Sequence[float] - baseline predicted probabilities
    threshold: float | None - decision threshold applied to probabilities, defaults to 0.5
update_stream
Accumulate a single ground truth and predicted probability into the stream.
args:
    y_true: float - ground truth value
    y_pred: float - predicted probability
update_stream_batch
Accumulate a batch of ground truth values and predicted probabilities into the stream.
args:
    y_true: Sequence[float] - ground truth values
    y_pred: Sequence[float] - predicted probabilities
reset_baseline
Replace the baseline with a new dataset. The decision threshold is unchanged. To change the threshold at the same time, use reset_baseline_and_decision_threshold.
args:
    y_true: Sequence[float] - new baseline ground truth values
    y_pred: Sequence[float] - new baseline predicted probabilities
reset_baseline_and_decision_threshold
reset_baseline_and_decision_threshold(y_true: Iterable[float], y_pred: Iterable[float], threshold: float) -> None
Replace the baseline and the decision threshold simultaneously. Because changing the threshold invalidates previously computed baseline statistics, a new baseline dataset is required. This is the only method that allows the threshold to be changed.
args:
    y_true: Iterable[float] - new baseline ground truth values
    y_pred: Iterable[float] - new baseline predicted probabilities
    threshold: float - new decision threshold
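To see why a threshold change invalidates baseline statistics, consider this small plain-Python sketch. The helper accuracy_at is hypothetical, and treating a probability at or above the threshold as positive is an assumption. The same baseline data produces different label-based metrics once the threshold moves, so statistics computed under the old threshold no longer describe the model's labelled behaviour.

```python
from typing import Sequence


def accuracy_at(
    y_true: Sequence[float], y_pred: Sequence[float], threshold: float
) -> float:
    """Accuracy after labelling probabilities with the given threshold."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need equally sized, non-empty sequences")
    # Assumption: probabilities at or above the threshold are positive.
    labels = [1 if p >= threshold else 0 for p in y_pred]
    hits = sum(1 for t, l in zip(y_true, labels) if int(t) == l)
    return hits / len(y_true)


baseline_true = [0.0, 0.0, 1.0, 1.0, 1.0]
baseline_pred = [0.2, 0.45, 0.55, 0.7, 0.9]

baseline_at_half = accuracy_at(baseline_true, baseline_pred, threshold=0.5)
baseline_at_08 = accuracy_at(baseline_true, baseline_pred, threshold=0.8)
print(baseline_at_half, baseline_at_08)
```

Comparing runtime metrics labelled at 0.8 against a baseline labelled at 0.5 would report drift that comes from the threshold, not from the model or the data.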
performance_snapshot
Compute a point-in-time performance report over all accumulated runtime examples, independent of the baseline. Raises if no runtime data has been accumulated yet.
returns:
    PerformanceSnapshot - metric name to value mapping
drift_snapshot
Compute a point-in-time drift snapshot comparing accumulated runtime examples to the baseline using a default threshold. Raises if no runtime data has been accumulated yet.
returns:
    DriftSnapshot - metric name to drift value mapping
drift_report
Evaluate whether accumulated runtime performance has drifted beyond the given threshold relative to the baseline. Raises if no runtime data has been accumulated yet.
args:
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
drift_report_partial_metrics
drift_report_partial_metrics(metrics: list[ClassificationDriftMetric], drift_threshold: float) -> DriftReport
Same as drift_report but scoped to a specific subset of metrics.
args:
    metrics: list[ClassificationDriftMetric] - metrics to evaluate
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
fair_perf_ml.model_perf.streaming.LinearRegressionStreaming
Bases: ModelPerfStreamingBase[float]
Stateful streaming monitor for linear regression models.
Maintains a baseline computed from an initial dataset. As new predictions arrive they are accumulated in the stream and can be evaluated against the baseline at any point via snapshots or drift reports.
__init__
Initialises the monitor with a baseline dataset.
args:
    y_true: Sequence[float] - baseline ground truth values
    y_pred: Sequence[float] - baseline prediction values
update_stream
Accumulate a single ground truth and prediction example into the stream.
args:
    y_true: float - ground truth value
    y_pred: float - prediction value
update_stream_batch
Accumulate a batch of ground truth and prediction examples into the stream.
args:
    y_true: Sequence[float] - ground truth values
    y_pred: Sequence[float] - prediction values
reset_baseline
Replace the baseline with a new dataset.
args:
    y_true: Sequence[float] - new baseline ground truth values
    y_pred: Sequence[float] - new baseline prediction values
performance_snapshot
Compute a point-in-time performance report over all accumulated runtime examples, independent of the baseline. Raises if no runtime data has been accumulated yet.
returns:
    PerformanceSnapshot - metric name to value mapping
drift_snapshot
Compute a point-in-time drift snapshot comparing accumulated runtime examples to the baseline using a default threshold. Raises if no runtime data has been accumulated yet.
returns:
    DriftSnapshot - metric name to drift value mapping
drift_report
Evaluate whether accumulated runtime performance has drifted beyond the given threshold relative to the baseline. Raises if no runtime data has been accumulated yet.
args:
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
drift_report_partial_metrics
drift_report_partial_metrics(metrics: list[LinearRegressionDriftMetric], drift_threshold: float) -> DriftReport
Same as drift_report but scoped to a specific subset of metrics.
args:
    metrics: list[LinearRegressionDriftMetric] - metrics to evaluate
    drift_threshold: float - maximum allowable drift per metric
returns:
    DriftReport - pass/fail result with details of any exceeded metrics
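For regression, stream accumulation can be sketched with running sums, so a snapshot never needs the raw examples. RunningRegressionErrors is a hypothetical class, not the library's implementation. The metric names "mae", "mse" and "rmse" and the RuntimeError raised on an empty stream are assumptions, since this reference lists neither the LinearRegressionDriftMetric members nor the exception type.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunningRegressionErrors:
    """Running sums that are enough to report MAE, MSE and RMSE."""

    n: int = 0
    sum_abs: float = 0.0
    sum_sq: float = 0.0

    def update(self, y_true: float, y_pred: float) -> None:
        error = y_true - y_pred
        self.n += 1
        self.sum_abs += abs(error)
        self.sum_sq += error * error

    def update_batch(self, y_true: Sequence[float], y_pred: Sequence[float]) -> None:
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        for truth, pred in zip(y_true, y_pred):
            self.update(truth, pred)

    def snapshot(self) -> dict[str, float]:
        if self.n == 0:
            raise RuntimeError("no runtime data has been accumulated yet")
        mse = self.sum_sq / self.n
        return {"mae": self.sum_abs / self.n, "mse": mse, "rmse": math.sqrt(mse)}


stream = RunningRegressionErrors()
stream.update(3.0, 2.0)                               # single example
stream.update_batch([5.0, 2.0, 7.0], [5.0, 4.0, 8.0])  # batch of three

metrics = stream.snapshot()
print(metrics)
```

Memory use stays constant regardless of how many examples arrive, at the cost of not being able to recompute metrics that need the full distribution of errors.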
Exceptions
fair_perf_ml.model_perf.core.DifferentModelTypes
Bases: Exception
Exception to handle when the user passes in the wrong model type.
fair_perf_ml.model_perf.core.InvalidMetricsBody
Bases: Exception
Exception to handle when the user passes an invalid metrics payload.