
Configuration

Config dataclass

Configuration for the high-level interface aindo.anonymize.AnonymizationPipeline.

from_dict classmethod

from_dict(value: dict[str, Any]) -> Config

Creates an instance of the class from a dictionary.

Parameters:

value (dict[str, Any], required): A dictionary whose keys are the attribute names of the class and whose values are the corresponding attribute values.

Returns:

Config: An instance of the class populated with the data from the dictionary.

to_dict

to_dict() -> dict[str, Any]

Converts the instance of the class into a dictionary.

Returns:

dict (dict[str, Any]): A dictionary whose keys are attribute names and whose values are the corresponding attribute values of the object.
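To illustrate the dictionary round trip that from_dict and to_dict describe, here is a minimal sketch built on hypothetical stand-in classes (StepSketch, ConfigSketch) that mirror the configuration schema documented below. They are not the library's Config implementation; with the library itself you would call Config.from_dict(value) and config.to_dict(). Whether the real to_dict returns exactly the input dictionary (for example, without filled-in defaults) is not stated here.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


# Hypothetical mirror of the documented schema; NOT the library's Config class.
@dataclass
class StepSketch:
    method: dict[str, Any]            # {"type": "<snake_case name>", **technique params}
    columns: list[str] | None = None  # None means "apply to all columns"


@dataclass
class ConfigSketch:
    steps: list[StepSketch] = field(default_factory=list)

    @classmethod
    def from_dict(cls, value: dict[str, Any]) -> ConfigSketch:
        # Build one StepSketch per plain-dict step, keeping the method params as-is.
        steps = [
            StepSketch(method=dict(s["method"]), columns=s.get("columns"))
            for s in value["steps"]
        ]
        return cls(steps=steps)

    def to_dict(self) -> dict[str, Any]:
        # asdict recursively converts nested dataclasses (and lists of them) to dicts.
        return asdict(self)


raw = {
    "steps": [
        {"method": {"type": "binning", "bins": 10}, "columns": ["column_bin"]},
        {"method": {"type": "identity"}, "columns": None},
    ]
}
cfg = ConfigSketch.from_dict(raw)
print(cfg.to_dict() == raw)  # True: the sketch's round trip preserves the dictionary
```

Using a nested dataclass per step keeps each step's method and columns typed, while asdict makes the reverse conversion a one-liner.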

Configuration Schema

The Config.from_dict method accepts a Python dictionary following the schema below.

steps (list[dict]): A list of anonymization steps.
steps[i].method (dict): Defines the anonymization technique and its parameters.
steps[i].method.type (str): The name of the anonymization technique, in snake_case.
steps[i].method.<param> (Any): Additional key-value pairs holding technique-specific parameters. See the list of anonymization techniques and their respective parameters.
steps[i].columns (list[str] | None): The names of the columns to which the anonymization technique applies. If set to None (null in JSON), the technique is applied to all columns. An empty list is not allowed.
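The structural rules in the schema can be checked before a dictionary is handed to Config.from_dict. The sketch below uses a hypothetical helper, validate_config, that enforces only the constraints listed above and does not check technique-specific parameters. It also assumes that a missing columns key behaves like None, which the schema does not state explicitly; the library's own validation may be stricter or report errors differently.

```python
from __future__ import annotations

from typing import Any


def validate_config(value: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a config dict; an empty list means OK.

    Hypothetical helper: it checks only the rules from the schema table and
    does not validate technique-specific parameters.
    """
    steps = value.get("steps")
    if not isinstance(steps, list):
        return ["'steps' must be a list"]
    errors: list[str] = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            errors.append(f"steps[{i}] must be a dict")
            continue
        method = step.get("method")
        if not isinstance(method, dict):
            errors.append(f"steps[{i}].method must be a dict")
        elif not isinstance(method.get("type"), str):
            errors.append(f"steps[{i}].method.type must be a string")
        # Assumption: a missing "columns" key is treated like None (all columns).
        columns = step.get("columns")
        if columns is None:
            continue
        if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
            errors.append(f"steps[{i}].columns must be a list of strings or None")
        elif not columns:
            errors.append(f"steps[{i}].columns must not be an empty list")
    return errors


ok = {"steps": [{"method": {"type": "identity"}, "columns": ["column_id_1"]}]}
bad = {"steps": [{"method": {"type": "binning", "bins": 10}, "columns": []}]}
print(validate_config(ok))   # []
print(validate_config(bad))  # ['steps[0].columns must not be an empty list']
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in a configuration at once.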

For a full configuration example, see config.json below. Some parameters are mutually exclusive, so not every technique-specific parameter is included in this example.
For a complete reference of technique-specific parameters, see the Techniques section of the API reference.

Full configuration example
config.json
{
  "steps": [
    {
      "method": {
        "type": "binning",
        "bins": 10
      },
      "columns": ["column_bin"]
    },
    {
      "method": {
        "type": "character_masking",
        "mask_length": 3,
        "symbol": "*",
        "starting_direction": "left"
      },
      "columns": ["column_mask"]
    },
    {
      "method": {
        "type": "data_nulling",
        "constant_value": "BLANK"
      },
      "columns": ["column_null"]
    },
    {
      "method": {
        "type": "key_hashing",
        "key": "my key",
        "salt": "my salt",
        "hash_name": "sha256"
      },
      "columns": ["column_key_hash"]
    },
    {
      "method": {
        "type": "identity"
      },
      "columns": ["column_id_1", "column_id_2"]
    },
    {
      "method": {
        "type": "mocking",
        "data_generator": "name"
      },
      "columns": ["column_mock"]
    },
    {
      "method": {
        "type": "perturbation_categorical",
        "alpha": 0.8,
        "sampling_mode": "uniform",
        "frequencies": [
          {"A": 0.5},
          {"B": 0.5}
        ],
        "seed": 42
      },
      "columns": ["column_pert_cat"]
    },
    {
      "method": {
        "type": "perturbation_numerical",
        "alpha": 0.8,
        "sampling_mode": "weighted",
        "perturbation_range": [1, 10],
        "seed": 42
      },
      "columns": ["column_per_num"]
    },
    {
      "method": {
        "type": "swapping",
        "alpha": 0.8,
        "seed": 42
      },
      "columns": ["column_swap"]
    },
    {
      "method": {
        "type": "top_bottom_coding_categorical",
        "q": 0.8,
        "other_label": "OTHER"
      },
      "columns": ["column_tbc_cat"]
    },
    {
      "method": {
        "type": "top_bottom_coding_numerical",
        "q": 0.3
      },
      "columns": ["column_tbc_num"]
    }
  ]
}
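To check which technique a configuration assigns to each column, the sketch below parses a trimmed copy of the example with Python's standard json module and maps every column to the technique types of the steps that target it. technique_by_column is a hypothetical inspection helper, not part of the library; it expands "columns": null to the column list you pass in, following the schema's description of None.

```python
from __future__ import annotations

import json
from typing import Any


def technique_by_column(config: dict[str, Any], all_columns: list[str]) -> dict[str, list[str]]:
    """Map each column to the technique types of the steps that target it.

    Hypothetical inspection helper: steps are read in the order they are listed,
    and columns=None (null in JSON) is expanded to every name in all_columns.
    """
    result: dict[str, list[str]] = {name: [] for name in all_columns}
    for step in config["steps"]:
        columns = step.get("columns")
        targets = all_columns if columns is None else columns
        for name in targets:
            result.setdefault(name, []).append(step["method"]["type"])
    return result


# A trimmed copy of config.json; with the real file you could use
#   with open("config.json") as f:
#       config = json.load(f)
# and then pass the resulting dict to Config.from_dict.
text = """
{"steps": [
  {"method": {"type": "binning", "bins": 10}, "columns": ["column_bin"]},
  {"method": {"type": "identity"}, "columns": ["column_id_1", "column_id_2"]}
]}
"""
config = json.loads(text)
summary = technique_by_column(config, ["column_bin", "column_id_1", "column_id_2", "other"])
print(summary)
# {'column_bin': ['binning'], 'column_id_1': ['identity'], 'column_id_2': ['identity'], 'other': []}
```

Columns that end up with an empty list (like "other" here) are not targeted by any step, which makes this a quick way to spot columns a configuration leaves untouched.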