Data Drift

Drift detection for continuous and categorical distributions, in batch and streaming modes. Import from fair_perf_ml.drift.

Supported metrics: Jensen–Shannon Divergence, Population Stability Index (PSI), Wasserstein Distance, Kullback–Leibler Divergence.
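For orientation, the four metrics can be sketched on two already-binned probability vectors. This is a minimal illustration under stated assumptions, not the library's implementation: the function names, the natural-log base, the `eps` smoothing, and the unit spacing between bins in the Wasserstein sketch are all assumptions made here.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps (an assumption) guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Symmetric divergence: average KL of p and q against their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def psi(p, q, eps=1e-12):
    """Population Stability Index: sum of (q - p) * ln(q / p) over bins."""
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

def wasserstein_1d(p, q):
    """1-Wasserstein distance between histograms on unit-spaced bins:
    the sum of absolute differences between the two running CDFs."""
    cum_p = cum_q = dist = 0.0
    for pi, qi in zip(p, q):
        cum_p += pi
        cum_q += qi
        dist += abs(cum_p - cum_q)
    return dist

baseline = [0.25, 0.25, 0.25, 0.25]
candidate = [0.10, 0.20, 0.30, 0.40]
for name, fn in [("JensenShannon", jensen_shannon), ("PopulationStabilityIndex", psi),
                 ("WassersteinDistance", wasserstein_1d), ("KullbackLeibler", kl_divergence)]:
    print(name, fn(baseline, candidate))
```

All four return 0 for identical inputs and grow as the candidate moves away from the baseline. Only Kullback–Leibler is asymmetric in its arguments.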

Enums

fair_perf_ml.drift.base.DataDriftType

Bases: str, Enum

Currently supported methods of deriving the divergence between two distributions.

fair_perf_ml.drift.base.QuantileType

Bases: str, Enum

Supported method for deriving the number of bins to use when approximating a continuous distribution.

Batch functions

fair_perf_ml.drift.base.compute_drift_continuous_distribution

compute_drift_continuous_distribution(baseline_distribution: FloatingPointDataSlice, candidate_distribution: FloatingPointDataSlice, drift_metrics: list[DataDriftMetric], quantile_type: QuantileConfig | None = None) -> list[float]

Ad hoc computation of drift between two distributions of continuous data.

Parameters:

Name	Type	Description	Default
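`baseline_distribution`	`FloatingPointDataSlice`	The baseline (reference) data. A numpy array or any iterable of values castable to float.	required
`candidate_distribution`	`FloatingPointDataSlice`	The candidate data to compare against the baseline. A numpy array or any iterable of values castable to float.	required
`drift_metrics`	`list[DataDriftMetric]`	The divergence measures to compute. Each entry is a `DataDriftType` enum value or a metric name string.	required
`quantile_type`	`QuantileConfig \| None`	Rule used to derive the number of histogram bins. `None` uses the default, FreedmanDiaconis.	`None`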

Returns: list[float], one drift score per entry in drift_metrics, in the same order.

fair_perf_ml.drift.base.compute_drift_categorical_distribution

compute_drift_categorical_distribution(baseline_distribution: list[StringBound], candidate_distribution: list[StringBound], drift_metrics: list[DataDriftMetric]) -> list[float]

Ad hoc computation of drift between two distributions of categorical data.

Parameters:

Name	Type	Description	Default
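`baseline_distribution`	`list[StringBound]`	The baseline (reference) labels. Elements must implement `__str__`.	required
`candidate_distribution`	`list[StringBound]`	The candidate labels to compare against the baseline. Elements must implement `__str__`.	required
`drift_metrics`	`list[DataDriftMetric]`	The divergence measures to compute. Each entry is a `DataDriftType` enum value or a metric name string.	required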

Returns: list[float], one drift score per entry in drift_metrics, in the same order.

Batch classes

fair_perf_ml.drift.base.ContinuousDataDrift

Bases: DataDriftDiscreteBase[float, list[float]]

Detects distributional drift in continuous (floating-point) features between a fixed baseline dataset and a runtime dataset.

Internally, the baseline is summarized as a histogram. The number of bins is derived automatically from the baseline data using the selected quantile rule. Drift is then measured by comparing the runtime data's distribution against the baseline histogram using the chosen divergence metric.

This type is suited for batch analysis: you collect a runtime dataset and compare it against the baseline in one call. For long-running accumulation where data arrives incrementally, use the streaming variants instead.
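The baseline-histogram comparison described above can be sketched in plain Python. This is an illustration only: the helper `histogram_fractions`, the equal-width bins spanning the baseline's range, and the clamping of out-of-range runtime values into the edge bins are assumptions made here, and the library's internal binning may differ.

```python
def histogram_fractions(values, lo, hi, num_bins):
    """Fraction of values in each equal-width bin over [lo, hi].

    Values outside [lo, hi] are clamped into the first or last bin
    (an assumption made for this sketch).
    """
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        idx = min(max(idx, 0), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]

baseline = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
runtime = [5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 10.0]

# Bin edges come from the baseline only; runtime data is binned on the same edges.
lo, hi = min(baseline), max(baseline)
base_hist = histogram_fractions(baseline, lo, hi, num_bins=4)
run_hist = histogram_fractions(runtime, lo, hi, num_bins=4)

print(base_hist)  # [0.25, 0.25, 0.25, 0.25]
print(run_hist)   # [0.0, 0.0, 0.125, 0.875]
```

Because both histograms share the baseline's bin edges, they can be compared bin by bin with any of the divergence metrics.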

Considerations

Bin count is determined by the baseline data and the quantile rule. If the data does not support the target bin count, fewer bins will be used.
Resetting the baseline recomputes the histogram from scratch using the same quantile rule.

num_bins `property`

num_bins: int

The number of histogram bins derived from the baseline dataset.

__init__

__init__(baseline_data: FloatingPointDataSlice, quantile_type: str | None = None) -> None

Initialize with a baseline dataset.

Parameters:

Name	Type	Description	Default
`baseline_data`	`FloatingPointDataSlice`	The reference distribution. Accepts a numpy array or any iterable of values castable to float.	required
`quantile_type`	`str \| None`	Controls how many histogram bins are derived from the baseline. Options: `"FreedmanDiaconis"` (default, IQR-based, robust to outliers), `"Scott"` (std-based, assumes roughly normal data), `"Sturges"` (log2-based, best for small datasets). Pass `None` to use the default. Also accepts a `QuantileType` enum value.	`None`
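The three bin-count rules named above have standard textbook forms, sketched below. The function names are hypothetical, and the library may round, clamp, or compute quartiles differently, so treat the resulting counts as indicative rather than as what `num_bins` will report.

```python
import math
import statistics

def _bins_from_width(data, width):
    """Turn a bin width into a bin count, falling back to 1 for degenerate data."""
    span = max(data) - min(data)
    if width <= 0 or span <= 0:
        return 1
    return max(1, math.ceil(span / width))

def freedman_diaconis_bins(data):
    """Bin width = 2 * IQR / n^(1/3); IQR-based, so outliers have little effect."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return _bins_from_width(data, 2 * (q3 - q1) / len(data) ** (1 / 3))

def scott_bins(data):
    """Bin width = 3.49 * std / n^(1/3); assumes roughly normal data."""
    return _bins_from_width(data, 3.49 * statistics.stdev(data) / len(data) ** (1 / 3))

def sturges_bins(data):
    """Bin count = ceil(log2(n)) + 1; depends only on the sample size."""
    return math.ceil(math.log2(len(data))) + 1

data = [float(x) for x in range(100)]
print(freedman_diaconis_bins(data), scott_bins(data), sturges_bins(data))
```

Sturges ignores the spread of the data entirely, which is why it suits small samples but tends to under-bin large ones.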

reset_baseline

reset_baseline(new_baseline: FloatingPointDataSlice) -> None

Replace the baseline with a new dataset, recomputing the histogram.

Parameters:

Name	Type	Description	Default
`new_baseline`	`FloatingPointDataSlice`	The new reference distribution. Accepts a numpy array or any iterable of values castable to float.	required

compute_drift

compute_drift(runtime_data: FloatingPointDataSlice, drift_metric: DataDriftMetric) -> float

Compute a single drift score between runtime_data and the baseline.

Parameters:

Name	Type	Description	Default
`runtime_data`	`FloatingPointDataSlice`	The data collected at runtime. Accepts a numpy array or any iterable of values castable to float.	required
`drift_metric`	`DataDriftMetric`	The divergence measure to use. Accepts a `DataDriftType` enum value or one of the strings `"JensenShannon"`, `"PopulationStabilityIndex"`, `"WassersteinDistance"`, `"KullbackLeibler"`.	required

Returns:

Type	Description
`float`	The drift score. Higher values indicate greater divergence from the baseline distribution.

compute_drift_multiple_criteria

compute_drift_multiple_criteria(runtime_data: FloatingPointDataSlice, drift_metrics: list[DataDriftMetric]) -> list[float]

Compute multiple drift scores against runtime_data in a single pass.

Parameters:

Name	Type	Description	Default
`runtime_data`	`FloatingPointDataSlice`	The data collected at runtime. Accepts a numpy array or any iterable of values castable to float.	required
`drift_metrics`	`list[DataDriftMetric]`	A list of divergence measures to compute. Each entry accepts a `DataDriftType` enum value or a metric name string.	required

Returns:

Type	Description
`list[float]`	A list of drift scores in the same order as `drift_metrics`.
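The ordering guarantee can be pictured as a lookup from metric name to scoring function, applied to histograms that were binned once. The sketch below is an assumption-laden stand-in, not the library's code: `score_many`, the `METRICS` table, the `eps` smoothing, and the baseline-first argument order for Kullback–Leibler are all hypothetical.

```python
import math

def _kl(p, q, eps=1e-12):
    """KL(p || q) in nats, with eps smoothing (an assumption)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def _psi(p, q, eps=1e-12):
    """Population Stability Index over matching bins."""
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

# Hypothetical lookup keyed by the metric name strings documented above.
METRICS = {"KullbackLeibler": _kl, "PopulationStabilityIndex": _psi}

def score_many(baseline_hist, runtime_hist, metric_names):
    """Score pre-binned histograms with each named metric, preserving input order."""
    return [METRICS[name](baseline_hist, runtime_hist) for name in metric_names]

baseline_hist = [0.25, 0.25, 0.25, 0.25]
runtime_hist = [0.10, 0.20, 0.30, 0.40]
scores = score_many(baseline_hist, runtime_hist,
                    ["PopulationStabilityIndex", "KullbackLeibler"])
# scores[0] is the PSI score and scores[1] is the KL score.
print(scores)
```

Binning the runtime data once and reusing the histogram for every metric is what makes the multi-metric call cheaper than repeated single-metric calls.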

export_baseline

export_baseline() -> list[float]

Export the baseline as a normalized probability distribution.

Returns:

Type	Description
`list[float]`	A list of floats, one per bin, where each value is the fraction of baseline samples that fall into that bin. Values sum to 1.0.

fair_perf_ml.drift.base.CategoricalDataDrift

Bases: DataDriftDiscreteBase[str, dict[str, float]]

num_bins `property`

num_bins: int

The number of histogram bins derived from the baseline dataset.

__init__

__init__(baseline_data: Sequence[StringBound]) -> None

Initialize with a baseline dataset.

Parameters:

Name	Type	Description	Default
`baseline_data`	`Sequence[StringBound]`	The reference distribution. Any iterable whose elements implement `__str__`.	required

reset_baseline

reset_baseline(new_baseline: Sequence[StringBound]) -> None

Replace the baseline with a new dataset, recomputing the label distribution.

Parameters:

Name	Type	Description	Default
`new_baseline`	`Sequence[StringBound]`	The new reference distribution. Any iterable whose elements implement `__str__`.	required

compute_drift

compute_drift(runtime_data: Sequence[StringBound], drift_metric: DataDriftMetric) -> float

Compute a single drift score between runtime_data and the baseline.

Parameters:

Name	Type	Description	Default
`runtime_data`	`Sequence[StringBound]`	The data collected at runtime. Any iterable whose elements implement `__str__`.	required
`drift_metric`	`DataDriftMetric`	The divergence measure to use. Accepts a `DataDriftType` enum value or one of the strings `"JensenShannon"`, `"PopulationStabilityIndex"`, `"WassersteinDistance"`, `"KullbackLeibler"`.	required

Returns:

Type	Description
`float`	The drift score. Higher values indicate greater divergence from the baseline distribution.

compute_drift_multiple_criteria

compute_drift_multiple_criteria(runtime_data: Sequence[StringBound], drift_metrics: list[DataDriftMetric]) -> list[float]

Compute multiple drift scores against runtime_data in a single pass.

Parameters:

Name	Type	Description	Default
`runtime_data`	`Sequence[StringBound]`	The data collected at runtime. Any iterable whose elements implement `__str__`.	required
`drift_metrics`	`list[DataDriftMetric]`	A list of divergence measures to compute. Each entry accepts a `DataDriftType` enum value or a metric name string.	required

Returns:

Type	Description
`list[float]`	A list of drift scores in the same order as `drift_metrics`.

export_baseline

export_baseline() -> dict[str, float]

Export the baseline as a normalized label frequency distribution.

Returns:

Type	Description
`dict[str, float]`	A dict mapping each label (including the overflow bin) to its fraction of the baseline dataset. Values sum to 1.0.
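A normalized label distribution with an overflow bin can be sketched as follows. The key name `"__overflow__"`, the helper `label_fractions`, and the rule that labels not in the known set are counted in the overflow bin are assumptions for illustration; the library's actual key and handling are not specified here.

```python
from collections import Counter

OVERFLOW = "__overflow__"  # hypothetical key; the library's overflow label may differ

def label_fractions(values, known_labels):
    """Map each known label, plus an overflow bin, to its share of `values`.

    Elements are converted with str(); labels outside `known_labels`
    are counted in the overflow bin.
    """
    labels = [str(v) for v in values]
    counts = Counter(lbl if lbl in known_labels else OVERFLOW for lbl in labels)
    total = len(labels)
    dist = {lbl: counts.get(lbl, 0) / total for lbl in known_labels}
    dist[OVERFLOW] = counts.get(OVERFLOW, 0) / total
    return dist

known = ["a", "b", "c"]
baseline = label_fractions(["a", "a", "b", "c"], known)
runtime = label_fractions(["a", "b", "b", "d"], known)

print(baseline)  # {'a': 0.5, 'b': 0.25, 'c': 0.25, '__overflow__': 0.0}
print(runtime)   # {'a': 0.25, 'b': 0.5, 'c': 0.0, '__overflow__': 0.25}
```

Keeping the same keys in both dicts means a label seen only at runtime, such as `"d"`, still contributes to the drift score through the overflow bin.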

Streaming — continuous

fair_perf_ml.drift.streaming.StreamingContinuousDataDriftFlush

Bases: DataDriftStreamingBase[float, dict[str, list[float]], list[float]]

fair_perf_ml.drift.streaming.StreamingContinuousDataDriftDecay

Bases: DataDriftStreamingBase[float, dict[str, list[float]], list[float]]

Streaming — categorical

fair_perf_ml.drift.streaming.StreamingCategoricalDataDriftFlush

Bases: DataDriftStreamingBase[StringBound, dict[str, float], dict[str, float]]

fair_perf_ml.drift.streaming.StreamingCategoricalDataDriftDecay

Bases: DataDriftStreamingBase[StringBound, dict[str, float], dict[str, float]]

Exceptions

fair_perf_ml.drift.base.DataDriftParameterValidationError

Bases: Exception

Raised when a caller passes invalid parameters or data.
