Module auton_survival.preprocessing


class Imputer (cat_feat_strat='ignore', num_feat_strat='mean', remaining='drop')

A class to impute missing values in the input features.

Real-world datasets often have missing covariates. Imputation replaces the missing values so that downstream experiments can run. This class supports multiple strategies for imputing both categorical and numerical/continuous covariates.

For categorical features, the class allows:

  • replace: Replace all null values with a user-specified constant.
  • ignore: Keep all missing values as is.
  • mode: Replace null values with the most commonly occurring value.
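
As a rough illustration of what the three categorical strategies mean, the following sketch applies each one by hand with plain pandas. It does not call the Imputer class; the column name, the toy data and the "unknown" fill constant are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy categorical column with two missing entries (hypothetical data).
df = pd.DataFrame({"sex": ["F", "M", np.nan, "F", np.nan]})

# 'replace': fill every null with a user-specified constant.
replaced = df["sex"].fillna("unknown")

# 'mode': fill every null with the most frequent observed value.
most_common = df["sex"].mode().iloc[0]
mode_filled = df["sex"].fillna(most_common)

# 'ignore': leave the column untouched, nulls included.
ignored = df["sex"].copy()
```

With 'ignore', the nulls stay in the data, so any downstream step has to be able to handle them.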

For numerical/continuous features, the user can choose between the following strategies:

  • mean: Replace all missing values with the mean in the column.
  • median: Replace all missing values with the median in the column.
  • knn: Use a k Nearest Neighbour model to predict the missing value.
  • missforest: Use the MissForest model to predict the null values.
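
The two simplest numerical strategies can be sketched by hand with plain pandas, as below. This is only an illustration under assumed toy data, not the class's implementation; the model-based 'knn' and 'missforest' strategies are not shown.

```python
import numpy as np
import pandas as pd

# Toy numerical column with two missing entries (hypothetical data).
df = pd.DataFrame({"age": [30.0, np.nan, 40.0, 80.0, np.nan]})

# 'mean': fill nulls with the mean of the observed values.
mean_filled = df["age"].fillna(df["age"].mean())

# 'median': fill nulls with the median of the observed values.
median_filled = df["age"].fillna(df["age"].median())
```

The median is less sensitive to outliers than the mean, which is why the two fills differ on a skewed column like this one.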


cat_feat_strat : str
Strategy for imputing categorical features. One of 'replace', 'ignore', 'mode'. Default is ignore.
num_feat_strat : str
Strategy for imputing numerical/continuous features. One of 'mean', 'median', 'knn', 'missforest'. Default is mean.
remaining : str
Strategy for handling remaining columns. One of 'ignore', 'drop'. Default is drop.


def fit(self, data, cat_feats=None, num_feats=None, fill_value=-1, n_neighbors=5, **kwargs)
def transform(self, data)
def fit_transform(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)

Imputes dataset using imputation strategies.


data : pandas.DataFrame
The dataframe to be imputed.
cat_feats : list
List of categorical features.
num_feats : list
List of numerical/continuous features.
fill_value : int
Value to be filled if cat_feat_strat='replace'.
n_neighbors : int
Number of neighbors to be used if num_feat_strat='knn'.
**kwargs
Passed on.


Imputed dataset.

class Scaler (scaling_strategy='standard')

Scaler to rescale numerical features.

For scaling, the user can choose between the following strategies:

  • standard: Perform the standard scaling method.
  • minmax: Perform the minmax scaling method.
  • none: Do not perform scaling.
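
For reference, the sketch below computes both rescalings by hand, assuming the usual definitions: standard scaling subtracts the column mean and divides by the standard deviation, and minmax scaling maps the column onto [0, 1]. The use of the population standard deviation (ddof=0) is an assumption, and the data are hypothetical.

```python
import pandas as pd

# Toy numerical column (hypothetical data).
df = pd.DataFrame({"age": [20.0, 40.0, 60.0]})

# 'standard': zero mean and unit variance.
# ddof=0 (population standard deviation) is an assumption of this sketch.
standard = (df["age"] - df["age"].mean()) / df["age"].std(ddof=0)

# 'minmax': map the smallest value to 0 and the largest to 1.
col_min, col_max = df["age"].min(), df["age"].max()
minmax = (df["age"] - col_min) / (col_max - col_min)
```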


scaling_strategy : str
Strategy to use for scaling numerical/continuous data. One of 'standard', 'minmax', 'none'. Default is standard.


def fit(self, data, num_feats=None)

Fits scaler to dataset using scaling strategy.


data : pandas.DataFrame
Dataframe to be scaled.
num_feats : list
List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.


Fitted instance of scaler.

def transform(self, data)

Scales data using scaling strategy.


data : pandas.DataFrame
Dataframe to be scaled.
feats : list
List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.


Scaled dataset.
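
Keeping fit and transform separate lets the statistics learned on one dataset be reused on another, for example scaling a held-out set with the training set's mean and standard deviation. The class below is a hypothetical stand-in written for this sketch, not the library's Scaler; it only mirrors the fit/transform signatures listed above.

```python
import pandas as pd

class SketchStandardScaler:
    """Hypothetical stand-in illustrating fit/transform; not the library class."""

    def fit(self, data, num_feats=None):
        # With no feature list, treat every column as numerical.
        self.num_feats = list(data.columns) if num_feats is None else list(num_feats)
        self.means = data[self.num_feats].mean()
        self.stds = data[self.num_feats].std(ddof=0)
        return self  # fitted instance, so calls can be chained

    def transform(self, data):
        # Work on a copy so the caller's dataframe is left unchanged.
        out = data.copy()
        out[self.num_feats] = (data[self.num_feats] - self.means) / self.stds
        return out

train = pd.DataFrame({"age": [20.0, 40.0, 60.0], "site": ["a", "b", "a"]})
test = pd.DataFrame({"age": [40.0], "site": ["b"]})

# Statistics come from the training data only, then get applied to the test data.
scaler = SketchStandardScaler().fit(train, num_feats=["age"])
scaled_test = scaler.transform(test)
```

Here the test row has the training mean as its age, so it scales to zero, while the non-numerical column passes through untouched.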

def fit_transform(self, data, num_feats=[])

Fits a scaler and rescales a dataset using the chosen scaling strategy.


data : pandas.DataFrame
Dataframe to be scaled.
num_feats : list
List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.


Scaled dataset.

class Preprocessor (cat_feat_strat='ignore', num_feat_strat='mean', scaling_strategy='standard', one_hot=True, remaining='drop')

A composite transform involving both imputation and scaling.


cat_feat_strat : str
Strategy for imputing categorical features.
num_feat_strat : str
Strategy for imputing numerical/continuous features.
scaling_strategy : str
Strategy to use for scaling numerical/continuous data.
one_hot : bool
Whether to apply one hot encoding to the data.
remaining : str
Strategy for handling remaining columns.


def fit(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)

Fit imputer and scaler to dataset.

def transform(self, data)

Impute and scale the dataset.

def fit_transform(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)

Imputes and scales dataset.


data : pandas.DataFrame
The dataframe to be imputed and scaled.
cat_feats : list
List of categorical features.
num_feats : list
List of numerical/continuous features.
one_hot : bool
Indicating whether to perform one-hot encoding.
fill_value : int
Value to be filled if cat_feat_strat='replace'.
n_neighbors : int
Number of neighbors to be used if num_feat_strat='knn'.
**kwargs
Passed on.


pandas.DataFrame: Imputed and scaled dataset.
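
To make the composite transform concrete, the sketch below chains mean imputation, standard scaling and one-hot encoding by hand with plain pandas. It does not call the Preprocessor class; the order of the steps, the use of pd.get_dummies, the ddof=0 convention and the toy data are all assumptions of this sketch rather than confirmed library behaviour.

```python
import numpy as np
import pandas as pd

# Toy dataset with one numerical and one categorical column (hypothetical).
df = pd.DataFrame({
    "age": [30.0, np.nan, 40.0, 80.0],
    "sex": ["F", "M", "F", "M"],
})
cat_feats, num_feats = ["sex"], ["age"]

# 1. Impute numerical columns with their column means.
processed = df.copy()
processed[num_feats] = processed[num_feats].fillna(processed[num_feats].mean())

# 2. Standard-scale the numerical columns (population std, an assumption).
means = processed[num_feats].mean()
stds = processed[num_feats].std(ddof=0)
processed[num_feats] = (processed[num_feats] - means) / stds

# 3. One-hot encode the categorical columns.
processed = pd.get_dummies(processed, columns=cat_feats)
```

The result has no missing values, a zero-mean age column, and one indicator column per category of sex.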