aindo-anonymize techniques are classes that derive from BaseTechnique and define specific parameters and logic for anonymization.


Bases: ABC

Abstract base class for all anonymization techniques.

anonymize abstractmethod

anonymize(dataframe: DataFrame) -> DataFrame

Applies the anonymization technique to the given data.


Name Type Description Default
dataframe DataFrame

The input data to be anonymized.



Type Description

The anonymized version of the input data.

Single-column techniques are anonymization methods designed to operate on individual data columns. These techniques are implemented as classes that derive from BaseSingleColumnTechnique.


Bases: BaseTechnique, ABC

Abstract base class for anonymization techniques applied to a single column.

Subclasses should implement the anonymize_column method, which defines the logic for anonymizing a single column.


anonymize_column(col: Series) -> Series

Applies the anonymization technique to a single column.


Name Type Description Default
col Series

The input data to be anonymized.



Type Description

The anonymized version of the input data.


anonymize(dataframe: DataFrame) -> DataFrame

Applies the anonymization technique to a single-column dataframe.

This is analogous to calling anonymize_column() on a single Pandas Series. It is a convenience method shared across all types of anonymizers.


Name Type Description Default
dataframe DataFrame

The input data. Must have exactly one column.



Type Description

The anonymized version of the input data.



DataNulling(constant_value: Any = None)

Bases: BaseSingleColumnTechnique

Implements data nulling.

Data nulling replaces the original data with a None value (or a custom constant value).


Name Type Description
constant_value Any

The value that will replace the original data. Defaults to None.


    mask_length: int = 1,
    symbol: AnyStr = "*",
    starting_direction: StartingDirection = "left",

Bases: BaseSingleColumnTechnique, Generic[AnyStr]

Implements character masking.

Character masking involves replacing, usually partially, the characters of a data value with a constant symbol. Full masking is achieved by setting mask_length=-1.


Name Type Description
starting_direction StartingDirection

The direction in which masking starts. Default is "left".

mask_length int

The number of characters to mask. Set to -1 to mask the entire value. Defaults to 1.

symbol AnyStr

The symbol used for masking. Defaults to "*".
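The masking rule described above can be illustrated on a single string with the standard library. This is a minimal sketch, not the library's implementation: the helper name `mask_value` is hypothetical, and capping `mask_length` at the length of the value is an assumption.

```python
def mask_value(
    value: str,
    mask_length: int = 1,
    symbol: str = "*",
    starting_direction: str = "left",
) -> str:
    """Replace part (or all) of `value` with `symbol` (hypothetical helper)."""
    if mask_length < -1:
        raise ValueError("mask_length must be -1 or a non-negative integer")
    # -1 requests full masking; otherwise never mask more characters than exist
    # (the capping behaviour is an assumption of this sketch).
    n = len(value) if mask_length == -1 else min(mask_length, len(value))
    if starting_direction == "left":
        return symbol * n + value[n:]
    if starting_direction == "right":
        return value[: len(value) - n] + symbol * n
    raise ValueError("starting_direction must be 'left' or 'right'")


print(mask_value("4111111111111111", mask_length=12))            # ************1111
print(mask_value("secret", mask_length=2, starting_direction="right"))  # secr**
print(mask_value("secret", mask_length=-1))                      # ******
```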


    data_generator: MockingGeneratorMethods,
    seed: SeedType = None,
    faker_kwargs: dict[str, Any] | None = None,
    faker_generator_kwargs: dict[str, Any] | None = None,

Bases: BaseSingleColumnTechnique

Implements mocking.

Mocking generates realistic mock data for various fields such as names, addresses, emails, and more. It leverages the faker library to produce customizable, locale-aware fake data.


Name Type Description
data_generator MockingGeneratorMethods

Faker's generator method ("fake") used to generate data (e.g., name, email).

seed SeedType

A seed used to initialize the NumPy Generator.


Name Type Description Default
data_generator MockingGeneratorMethods

Faker's generator method ("fake") used to generate data (e.g., name, email).

faker_kwargs dict[str, Any] | None

Additional arguments passed to the main Faker object (proxy class).

faker_generator_kwargs dict[str, Any] | None

Additional arguments passed to Faker's generator method.



    key: str,
    salt: str | None = None,
    hash_name: str = "sha256",

Bases: BaseSingleColumnTechnique

Implements key-based hashing.

Data values are hashed using HMAC with a cryptographic key and the chosen hashing algorithm (defaults to SHA-256). The resulting hash is then encoded using Base64. The de-identified values always have a uniform length.


Name Type Description
key str

The cryptographic key used for hashing.

salt str | None

An optional salt that can be added to the value before hashing. Defaults to None.

hash_name str

The name of the hashing algorithm to use. Defaults to "sha256".
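The HMAC-then-Base64 scheme described above can be sketched with Python's standard `hmac`, `hashlib`, and `base64` modules. The helper name `key_hash` is hypothetical, and prepending the salt to the value is an assumption; the source only says the salt is added to the value before hashing.

```python
import base64
import hmac
from typing import Optional


def key_hash(
    value: str,
    key: str,
    salt: Optional[str] = None,
    hash_name: str = "sha256",
) -> str:
    """Return the Base64-encoded HMAC of `value` (hypothetical helper)."""
    # Assumption: the salt, when given, is prepended to the value.
    message = value if salt is None else salt + value
    digest = hmac.new(key.encode("utf-8"), message.encode("utf-8"), hash_name).digest()
    return base64.b64encode(digest).decode("ascii")


short = key_hash("a", key="my-secret-key")
long_ = key_hash("a much longer identifier", key="my-secret-key")
# SHA-256 digests are 32 bytes, so both encoded values have the same length.
print(len(short), len(long_))
```

Because the digest size depends only on the algorithm, inputs of any length map to outputs of the same length, which is the uniform-length property mentioned above.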


Swapping(alpha: float, **kwargs: SeedT)

Bases: BaseSingleColumnTechnique, Seeder, AlphaProbability

Implements swapping.

Swapping rearranges data by shuffling values, ensuring that individual values remain present but are generally not in their original position. The process is controlled by the alpha parameter, representing the probability of a row being swapped with another.


Name Type Description
alpha float

The perturbation intensity, a value in the range [0, 1].
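One way to picture swapping is to select each row with probability alpha and then shuffle the selected values among themselves. The sketch below follows that reading using only the standard library; the helper name `swap_values` and the exact selection rule are assumptions, not the library's implementation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def swap_values(values: Sequence[T], alpha: float, seed: Optional[int] = None) -> List[T]:
    """Shuffle a random subset of positions (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0, 1]")
    rng = random.Random(seed)
    # Assumption: each position is picked independently with probability alpha.
    picked = [i for i in range(len(values)) if rng.random() < alpha]
    targets = picked[:]
    rng.shuffle(targets)
    out = list(values)
    # Move each picked value to a (possibly different) picked position.
    for src, dst in zip(picked, targets):
        out[dst] = values[src]
    return out


ages = [10, 15, 13, 12, 23, 25, 28, 59, 60]
swapped = swap_values(ages, alpha=0.5, seed=0)
# The values are only rearranged, so the multiset is unchanged.
print(sorted(swapped) == sorted(ages))  # True
```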


Binning(bins: int | Sequence[int] | Sequence[float])

Bases: BaseSingleColumnTechnique

Implements binning for numerical columns.

Binning works by grouping numerical values into discrete bins, allowing for data generalization by replacing individual values with their corresponding bin ranges.


Name Type Description
bins int | Sequence[int] | Sequence[float]

The bin edges or number of bins to use.


An integer value for bins forms equal-width bins.

>>> ages = pd.Series([10, 15, 13, 12, 23, 25, 28, 59, 60])
>>> Binning(bins=3).anonymize_column(ages)
[(9.95, 26.667], (9.95, 26.667], (9.95, 26.667], ...
Categories (3, interval[float64, right]): [(9.95, 26.667] < (26.667, 43.333] < (43.333, 60.0]]

A list of ordered bin edges assigns each value to the interval it falls in.

>>> ages = pd.Series([10, 15, 13, 12, 23, 25, 28, 59, 60])
>>> Binning(bins=[0, 18, 35, 70]).anonymize_column(ages)
[(0, 18], (0, 18], (0, 18], (0, 18], (18, 35], ...
Categories (3, interval[int64, right]): [(0, 18] < (18, 35] < (35, 70]]


    alpha: float,
    sampling_mode: SamplingMode = "uniform",
    perturbation_range: tuple[NumericsT, NumericsT]
    | None = None,
    **kwargs: SeedT,

Bases: BasePerturbation, Generic[NumericsT]

Implements perturbation for numerical columns.

Perturbation consists of modifying each value based on the specified perturbation intensity (alpha) and replacement strategy. It supports two modes of replacement: uniform sampling and distribution-preserving sampling.


Name Type Description
alpha float

The perturbation intensity, a value in the range [0, 1].
- alpha=0: No perturbation; values remain unchanged.
- alpha=1: Maximum perturbation; values are fully replaced according to the specified sampling mode.

sampling_mode SamplingMode

The strategy used to sample replacement values.
- uniform: Values are perturbed with random values uniformly sampled from the range [min, max].
- weighted: Values are perturbed in a way that keeps the original distribution.

perturbation_range tuple[NumericsT, NumericsT] | None

A tuple[min, max] within which random values are sampled. If not set, the range is automatically computed as the minimum and maximum of the input data.
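The role of alpha in the uniform mode can be illustrated by blending each value with a uniformly sampled one. This is only a sketch: the linear blend, the helper name `perturb_numerical`, and the `seed` argument are assumptions chosen to match the documented endpoints (alpha=0 leaves values unchanged, alpha=1 fully replaces them). The weighted mode is not covered.

```python
import random
from typing import List, Optional, Sequence, Tuple


def perturb_numerical(
    values: Sequence[float],
    alpha: float,
    perturbation_range: Optional[Tuple[float, float]] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Blend each value with a uniform random draw (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0, 1]")
    rng = random.Random(seed)
    # If no range is given, fall back to the min and max of the input data.
    if perturbation_range is None:
        low, high = min(values), max(values)
    else:
        low, high = perturbation_range
    # Assumption: alpha linearly interpolates between original and random value.
    return [(1 - alpha) * v + alpha * rng.uniform(low, high) for v in values]


salaries = [30_000.0, 42_000.0, 55_000.0, 61_000.0]
print(perturb_numerical(salaries, alpha=0.0))          # unchanged
print(perturb_numerical(salaries, alpha=0.3, seed=0))  # mildly perturbed
```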


    alpha: float,
    sampling_mode: SamplingMode = "uniform",
    frequencies: dict[str, float] | None = None,
    **kwargs: SeedT,

Bases: BasePerturbation

Implements perturbation for categorical columns.

Perturbation consists of replacing values with randomized alternatives based on the specified sampling mode and perturbation intensity (alpha). It supports two modes of replacement: uniform sampling and distribution-preserving sampling.


Name Type Description
alpha float

The perturbation intensity, a value in the range [0, 1].
- alpha=0: No perturbation; values remain unchanged.
- alpha=1: Maximum perturbation; values are fully replaced according to the specified sampling mode.

sampling_mode SamplingMode

The strategy used to sample replacement values.
- uniform: Replaces values with others chosen uniformly at random.
- weighted: Replaces values based on their original distribution.

frequencies dict[str, float] | None

Optional mapping of unique values to their relative frequencies, used for weighted sampling mode. Automatically computed if not provided.
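The two sampling modes can be sketched as follows, assuming each value is replaced independently with probability alpha. The helper name `perturb_categorical`, the `seed` argument, and the per-value replacement rule are assumptions of this sketch rather than the library's documented behaviour.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def perturb_categorical(
    values: Sequence[str],
    alpha: float,
    sampling_mode: str = "uniform",
    frequencies: Optional[Dict[str, float]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Randomly replace categories (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0, 1]")
    if sampling_mode not in ("uniform", "weighted"):
        raise ValueError("sampling_mode must be 'uniform' or 'weighted'")
    rng = random.Random(seed)
    # Compute relative frequencies from the data when they are not provided.
    if frequencies is None:
        counts = Counter(values)
        frequencies = {cat: n / len(values) for cat, n in counts.items()}
    categories = list(frequencies)
    # weights=None makes random.choices sample uniformly.
    weights = [frequencies[c] for c in categories] if sampling_mode == "weighted" else None
    out = []
    for v in values:
        if rng.random() < alpha:  # assumption: independent per-value replacement
            out.append(rng.choices(categories, weights=weights, k=1)[0])
        else:
            out.append(v)
    return out


colors = ["red"] * 6 + ["green"] * 3 + ["blue"]
print(perturb_categorical(colors, alpha=0.5, sampling_mode="weighted", seed=0))
```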


    q: float | None = None,
    lower_value: float | None = None,
    upper_value: float | None = None,

Bases: BaseSingleColumnTechnique

Implements top/bottom coding for numerical columns.

This technique caps values above the (1 - q/2) quantile (top coding) and raises values below the (q/2) quantile (bottom coding). The threshold parameter q specifies the total proportion of extreme values to code (e.g., q=0.1 codes the top 5% and the bottom 5% of values).

Provide either the threshold q or the pair of quantile values (lower_value and upper_value), but not both. lower_value and upper_value must always be specified together.


Name Type Description
q float | None

Proportion controlling the extent of top/bottom coding, between 0 and 1.

lower_value float | None

Input data quantile value at q/2.

upper_value float | None

Input data quantile value at (1 - q/2).
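The clipping rule and the mutually exclusive parameters can be sketched with the standard library. The helper names `quantile` and `top_bottom_code` are hypothetical, and the linear-interpolation quantile is an assumption; the library may compute quantiles differently.

```python
import math
from typing import List, Optional, Sequence


def quantile(sorted_values: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between neighbours (assumed method)."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def top_bottom_code(
    values: Sequence[float],
    q: Optional[float] = None,
    lower_value: Optional[float] = None,
    upper_value: Optional[float] = None,
) -> List[float]:
    """Clip values to [lower_value, upper_value] (hypothetical helper)."""
    has_any_bound = lower_value is not None or upper_value is not None
    has_both_bounds = lower_value is not None and upper_value is not None
    if q is not None and has_any_bound:
        raise ValueError("provide either q or lower_value/upper_value, not both")
    if q is None and not has_both_bounds:
        raise ValueError("lower_value and upper_value must be given together")
    if q is not None:
        ordered = sorted(values)
        lower_value = quantile(ordered, q / 2)
        upper_value = quantile(ordered, 1 - q / 2)
    # Raise values below the lower bound and cap values above the upper bound.
    return [min(max(v, lower_value), upper_value) for v in values]


print(top_bottom_code([1, 50, 100], lower_value=5, upper_value=95))  # [5, 50, 95]
```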


    q: float | None = None,
    other_label: Any = "OTHER",
    rare_categories: list[Any] | None = None,

Bases: BaseSingleColumnTechnique

Implements top/bottom coding for categorical columns.

Categories that account for a share of the total data less than or equal to q are replaced with the other_label (e.g., q=0.01 corresponds to 1%).


Name Type Description
q float | None

A proportion controlling the extent of top/bottom coding, between 0 and 1.

other_label Any

The new category to replace rare categories with. Default is "OTHER".

rare_categories list[Any] | None

A list of rare categories to be replaced. This can be used instead of the q parameter to explicitly specify which categories should be replaced with other_label.
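Rare-category replacement can be sketched by counting categories and relabelling those whose share is at most q, or those listed explicitly. The helper name `code_rare_categories` is hypothetical, and treating q and rare_categories as alternatives follows the parameter description above; this is not the library's implementation.

```python
from collections import Counter
from typing import Any, List, Optional, Sequence


def code_rare_categories(
    values: Sequence[Any],
    q: Optional[float] = None,
    other_label: Any = "OTHER",
    rare_categories: Optional[List[Any]] = None,
) -> List[Any]:
    """Replace rare categories with `other_label` (hypothetical helper)."""
    if (q is None) == (rare_categories is None):
        raise ValueError("provide exactly one of q or rare_categories")
    if rare_categories is None:
        counts = Counter(values)
        total = len(values)
        # A category is rare when its share of the data is at most q.
        rare = {cat for cat, n in counts.items() if n / total <= q}
    else:
        rare = set(rare_categories)
    return [other_label if v in rare else v for v in values]


jobs = ["engineer"] * 98 + ["astronaut", "falconer"]
coded = code_rare_categories(jobs, q=0.01)
print(Counter(coded))  # engineer: 98, OTHER: 2
```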


StartingDirection module-attribute

StartingDirection = Literal['left', 'right']

SeedT module-attribute

SeedT = int | Generator | None

SamplingMode module-attribute

SamplingMode = Literal['uniform', 'weighted']

NumericsT module-attribute

NumericsT = TypeVar('NumericsT', int, float)

MockingGeneratorMethods module-attribute

MockingGeneratorMethods = Literal[