Data Bias
Pre-training bias analysis. Import from fair_perf_ml.data_bias.
Supported metrics: Class Imbalance (CI), Difference in Proportion of Labels (DPL), KL Divergence, JS Divergence, Lp-Norm, Total Variation Distance (TVD), Kolmogorov–Smirnov (KS).
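The sketch below makes the supported metrics concrete for a binary outcome. It is written in pure Python and follows the definitions commonly used for pre-training bias metrics, for example in AWS SageMaker Clarify. Facet d is the segment selected by the feature criterion and facet a is everything else, and each facet's label distribution is (P(positive), P(negative)). The function names, the natural log, and p = 2 for the Lp-norm are assumptions. This is not this library's implementation, and its sign conventions may differ.

```python
import math

def label_distribution(labels: list[bool]) -> tuple[float, float]:
    """(P(positive), P(negative)) for one facet; assumes the facet is non-empty."""
    q = sum(labels) / len(labels)
    return (q, 1.0 - q)

def kl_divergence(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    # Natural log (an assumption); zero-probability terms in p contribute nothing.
    # Assumes q_i > 0 wherever p_i > 0, otherwise KL is infinite.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pretraining_metrics(in_facet_d: list[bool], positive: list[bool]) -> dict[str, float]:
    """Illustrative only, not the library's code.

    in_facet_d[i]: example i falls in the segmented facet d.
    positive[i]: example i's ground truth falls in the positive segment."""
    labels_d = [y for f, y in zip(in_facet_d, positive) if f]
    labels_a = [y for f, y in zip(in_facet_d, positive) if not f]
    n_a, n_d = len(labels_a), len(labels_d)
    p_a, p_d = label_distribution(labels_a), label_distribution(labels_d)
    mixture = tuple((x + y) / 2 for x, y in zip(p_a, p_d))
    abs_diffs = [abs(x - y) for x, y in zip(p_a, p_d)]
    return {
        "CI": (n_a - n_d) / (n_a + n_d),
        "DPL": p_a[0] - p_d[0],
        "KL": kl_divergence(p_a, p_d),
        "JS": 0.5 * (kl_divergence(p_a, mixture) + kl_divergence(p_d, mixture)),
        "LP": math.sqrt(sum(d * d for d in abs_diffs)),  # Lp-norm with p = 2 (assumed)
        "TVD": 0.5 * sum(abs_diffs),
        "KS": max(abs_diffs),
    }

metrics = pretraining_metrics(
    in_facet_d=[True, True, False, False, False, False],
    positive=[True, False, True, True, True, False],
)
```

Here facet a has 4 examples with 3 positives and facet d has 2 examples with 1 positive. CI is therefore (4 - 2) / 6 and DPL is 0.75 - 0.5.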
Batch analysis
fair_perf_ml.data_bias.core.data_bias_perform_analysis
data_bias_perform_analysis(feature: list[F] | NDArray, ground_truth: list[G] | NDArray, feature_label_or_threshold: F, ground_truth_label_or_threshold: G) -> AnalysisReport
Interface into the Rust implementation that ensures NumPy arrays are passed to the Rust function.
Args:
feature: list[str | float | int] | NDArray -> the feature data; passing a NumPy array is most efficient.
ground_truth: list[str | float | int] | NDArray -> the ground truth data; passing a NumPy array is most efficient.
feature_label_or_threshold: str | float | int -> segmentation parameter for the feature.
ground_truth_label_or_threshold: str | float | int -> segmentation parameter for the ground truth.
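The docstring describes two preparation steps: converting inputs to NumPy arrays before they reach Rust, and segmenting by a label or a threshold. A minimal sketch of those steps follows. The helper names to_array and segment are hypothetical. The rule that a string means a label (equality) and a number means a threshold (>=) is an assumption for illustration, not documented library behavior.

```python
import numpy as np

def to_array(values) -> np.ndarray:
    """Convert lists to a NumPy array once; existing arrays pass through without a copy."""
    return np.asarray(values)

def segment(values, label_or_threshold) -> np.ndarray:
    """Hypothetical helper: return a boolean mask selecting the segment of interest.

    Assumed rule: a str is treated as a label (equality match) and a
    number as a threshold (value >= threshold)."""
    arr = to_array(values)
    if isinstance(label_or_threshold, str):
        return arr == label_or_threshold
    return arr >= label_or_threshold

feature_mask = segment(["f", "m", "f", "m"], "f")
truth_mask = segment(np.array([0.9, 0.2, 0.4, 0.7]), 0.5)
```

Under a type-based rule like this one, an int parameter is ambiguous because it could mean class 1 or threshold 1. Explicit segmentation criteria avoid that kind of ambiguity.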
fair_perf_ml.data_bias.core.data_bias_perform_analysis_explicit_segmentation
data_bias_perform_analysis_explicit_segmentation(feature: BiasDataPayload[F], ground_truth: BiasDataPayload[G]) -> AnalysisReport
Method to provide explicit segmentation criteria for ad hoc data bias analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| feature | BiasDataPayload[F] | Feature data with its explicit segmentation criteria. | required |
| ground_truth | BiasDataPayload[G] | Ground truth data with its explicit segmentation criteria. | required |
returns: AnalysisReport
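The internal structure of BiasDataPayload is not documented here. The following is therefore only a hypothetical stand-in: ExplicitSegment and its fields are not library types. It shows why stating the criterion explicitly helps, because the same integer value can act as either a class label or a threshold.

```python
from dataclasses import dataclass
from typing import Generic, Literal, Sequence, TypeVar

T = TypeVar("T")

@dataclass
class ExplicitSegment(Generic[T]):
    """Hypothetical payload: the data plus an explicit segmentation rule."""
    data: Sequence[T]
    kind: Literal["label", "threshold"]
    value: T

    def mask(self) -> list[bool]:
        if self.kind == "label":
            # Label: membership is an equality match (__eq__).
            return [x == self.value for x in self.data]
        # Threshold: membership relies only on ordering (__ge__).
        return [x >= self.value for x in self.data]

classes = [0, 1, 2, 1]
as_label = ExplicitSegment(classes, "label", 1).mask()
as_threshold = ExplicitSegment(classes, "threshold", 1).mask()
```

With the value 1, the label rule selects only the examples equal to 1, while the threshold rule also selects the example with value 2.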
Runtime comparison
fair_perf_ml.data_bias.core.data_bias_runtime_comparison
data_bias_runtime_comparison(baseline: AnalysisReport, latest: AnalysisReport, threshold: float = 0.1) -> DriftReport
Compare the current runtime analysis result to the baseline to determine drift from the baseline. Metrics that exceed the provided threshold will be present in the drift report.
Args:
baseline: AnalysisReport -> the result of running the analysis on the baseline data.
latest: AnalysisReport -> the result of running the analysis on the current data, for comparison.
threshold: float = 0.1 -> the comparison threshold; defaults to 0.10 in the Rust module.
Returns: DriftReport -> the drift report, detailing which metrics have drifted and by how much.
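The comparison can be pictured with the following sketch. It assumes both reports are flat mappings from metric name to value, and that drift means an absolute change strictly greater than the threshold. The real AnalysisReport and DriftReport structures are not specified here, and neither is whether the library measures drift in absolute or relative terms. The function name runtime_comparison is hypothetical.

```python
def runtime_comparison(baseline: dict[str, float], latest: dict[str, float],
                       threshold: float = 0.1) -> dict[str, float]:
    """Hypothetical sketch: return {metric: absolute change} for every metric whose
    change from the baseline is strictly greater than threshold.
    Flat dict reports and the absolute-difference rule are assumptions."""
    drifted: dict[str, float] = {}
    for name, base_value in baseline.items():
        change = abs(latest[name] - base_value)
        if change > threshold:
            drifted[name] = change
    return drifted

baseline = {"CI": 0.10, "DPL": 0.05, "KS": 0.20}
latest = {"CI": 0.35, "DPL": 0.08, "KS": 0.20}
report = runtime_comparison(baseline, latest)
```

With these inputs only CI is reported. It moved by about 0.25, while DPL moved by 0.03 and KS did not move.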
fair_perf_ml.data_bias.core.data_bias_partial_runtime_comparison
data_bias_partial_runtime_comparison(baseline: AnalysisReport, latest: AnalysisReport, metrics: list[DataBiasDriftMetric], threshold: float = 0.1) -> DriftReport
Performs the same drift comparison as data_bias_runtime_comparison, but lets the user narrow the evaluation by explicitly specifying which metrics to evaluate for drift, rather than the full available suite.
Args:
baseline: AnalysisReport -> the result of running the analysis on the baseline data.
latest: AnalysisReport -> the result of running the analysis on the current data, for comparison.
metrics: list[DataBiasDriftMetric] -> the metrics to evaluate for drift.
threshold: float = 0.1 -> the comparison threshold; defaults to 0.10 in the Rust module.
Returns: DriftReport -> the drift report for the selected metrics.
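Narrowing the evaluation amounts to iterating over the requested metrics only. The sketch below assumes that reports are flat name-to-value mappings and that drift is an absolute change above the threshold; neither assumption is confirmed by the documentation above. The function name is hypothetical, and plain strings stand in for DataBiasDriftMetric values.

```python
from typing import Iterable

def partial_runtime_comparison(baseline: dict[str, float], latest: dict[str, float],
                               metrics: Iterable[str], threshold: float = 0.1) -> dict[str, float]:
    """Hypothetical sketch: only the metrics listed in `metrics` are checked.
    Strings stand in for DataBiasDriftMetric values (an assumption)."""
    drifted: dict[str, float] = {}
    for name in metrics:
        change = abs(latest[name] - baseline[name])  # KeyError if a metric is missing
        if change > threshold:
            drifted[name] = change
    return drifted

baseline = {"CI": 0.10, "DPL": 0.05, "KS": 0.20}
latest = {"CI": 0.35, "DPL": 0.08, "KS": 0.20}
ignored_ci = partial_runtime_comparison(baseline, latest, ["DPL", "KS"])
only_ci = partial_runtime_comparison(baseline, latest, ["CI"])
```

Excluding CI from the metric list hides its drift entirely. Restricting the list is useful for focusing on the metrics that matter to a given deployment, but it is not a substitute for the full check.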
Streaming
fair_perf_ml.data_bias.streaming.DataBiasStreaming
Container that holds state for long-running data bias monitoring sessions and computes data bias observability metrics on demand. Arbitrary types can be used as long as they implement __ge__ and __eq__, which define the ordering used for segmentation.
Wraps the internal core Rust logic.
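One way a container like this could keep constant memory is to store only counts of (in facet, positive outcome) pairs. The class below is a hypothetical sketch, not DataBiasStreaming itself. Segmentation is passed as plain predicates, only CI and DPL are computed, and the exception types (ValueError, RuntimeError) are assumptions.

```python
from collections import Counter
from typing import Callable, Generic, Iterable, TypeVar

F = TypeVar("F")
G = TypeVar("G")

class StreamingBiasSketch(Generic[F, G]):
    """Hypothetical sketch, not DataBiasStreaming: state is only counts of
    (in_facet, positive) pairs, so memory does not grow with the stream."""

    def __init__(self, in_facet: Callable[[F], bool], is_positive: Callable[[G], bool],
                 baseline_features: Iterable[F], baseline_truth: Iterable[G]) -> None:
        self.in_facet = in_facet
        self.is_positive = is_positive
        self.baseline: Counter = Counter()
        self.runtime: Counter = Counter()
        self._accumulate(self.baseline, baseline_features, baseline_truth)

    def _accumulate(self, counts: Counter, features: Iterable[F], truth: Iterable[G]) -> None:
        features, truth = list(features), list(truth)
        # Validate before counting so a bad batch leaves the state untouched.
        if len(features) != len(truth):
            raise ValueError("feature and ground truth batches differ in length")
        for f, g in zip(features, truth):
            counts[(self.in_facet(f), self.is_positive(g))] += 1

    def push(self, feature_value: F, ground_truth_value: G) -> None:
        self.runtime[(self.in_facet(feature_value), self.is_positive(ground_truth_value))] += 1

    def push_batch(self, feature_values: Iterable[F], ground_truth_values: Iterable[G]) -> None:
        self._accumulate(self.runtime, feature_values, ground_truth_values)

    def performance_snapshot(self) -> dict[str, float]:
        # Computed from runtime counts only, independent of the baseline.
        if not self.runtime:
            raise RuntimeError("no runtime data has been pushed")
        return self._metrics(self.runtime)

    @staticmethod
    def _metrics(counts: Counter) -> dict[str, float]:
        # Assumes both facets have been observed at least once.
        n_d = counts[(True, True)] + counts[(True, False)]
        n_a = counts[(False, True)] + counts[(False, False)]
        q_d = counts[(True, True)] / n_d
        q_a = counts[(False, True)] / n_a
        return {"CI": (n_a - n_d) / (n_a + n_d), "DPL": q_a - q_d}

stream = StreamingBiasSketch(lambda f: f == "f", lambda g: g >= 0.5,
                             ["f", "m", "m"], [0.9, 0.1, 0.6])
stream.push("f", 0.2)
stream.push_batch(["m", "m", "m"], [0.7, 0.8, 0.3])
snapshot = stream.performance_snapshot()
```

The runtime stream holds 1 facet-d example with no positives and 3 facet-a examples with 2 positives. This gives CI = 0.5 and DPL = 2/3.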
push
Push a single feature and ground truth example into the stream. Types should be consistent with those defined in the segmentation criteria.
args: feature_value: F, ground_truth_value: G
returns: None
push_batch
Push a batch of feature and ground truth examples into the stream. Types should be consistent with those defined in the segmentation criteria, and the two iterables must have the same length. If either invariant is broken, an exception is raised.
args: feature_value: Iterable[F], ground_truth_value: Iterable[G]
returns: None
reset_baseline
Reset the baseline state using the provided data. The segmentation criteria defined at object construction are reused.
args: feature_value: Iterable[F], ground_truth_value: Iterable[G]
returns: None
reset_baseline_and_segmentation_criteria
reset_baseline_and_segmentation_criteria(updated_feature_segmentation: BiasSegmentationProtocol[F], updated_ground_truth_segmentation: BiasSegmentationProtocol[G], feature_data: Sequence[F], ground_truth_data: Sequence[G]) -> None
Reset the baseline state and update the segmentation criteria. This can be useful when the distribution of the data shifts significantly.
args: updated_feature_segmentation: BiasSegmentationProtocol[F], updated_ground_truth_segmentation: BiasSegmentationProtocol[G], feature_data: Sequence[F], ground_truth_data: Sequence[G]
returns: None
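This reset takes raw feature_data and ground_truth_data alongside the new criteria. Suppose the stream stores aggregated counts rather than individual examples; that is an assumption about the internals. In that case the raw data is required, because counts bucketed under the old rule cannot be re-bucketed under a new one. The sketch below uses a hypothetical rebuild_baseline helper with plain predicates standing in for BiasSegmentationProtocol.

```python
from collections import Counter
from typing import Callable, Iterable, TypeVar

F = TypeVar("F")
G = TypeVar("G")

def rebuild_baseline(in_facet: Callable[[F], bool], is_positive: Callable[[G], bool],
                     feature_data: Iterable[F], ground_truth_data: Iterable[G]) -> Counter:
    """Hypothetical helper: recount (in_facet, positive) pairs from raw data
    under a new segmentation rule. Count-based state is an assumption."""
    features, truth = list(feature_data), list(ground_truth_data)
    if len(features) != len(truth):
        raise ValueError("feature and ground truth data differ in length")
    return Counter((in_facet(f), is_positive(g)) for f, g in zip(features, truth))

ages = [25, 40, 61, 33]
approved = [1, 0, 1, 1]
old = rebuild_baseline(lambda a: a >= 60, lambda y: y == 1, ages, approved)
new = rebuild_baseline(lambda a: a >= 35, lambda y: y == 1, ages, approved)
```

Under the old rule age >= 60, the ages 33 and 40 land in the same bucket. Without the raw values, there is no way to recover that only 40 crosses the new cutoff of 35.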
performance_snapshot
Generate a performance snapshot from the runtime data, irrespective of the baseline state. An exception is raised if no runtime data has been pushed into the stream.