Module `auton_survival.preprocessing`

Classes

class Imputer (cat_feat_strat='ignore', num_feat_strat='mean', remaining='drop')

A class to impute missing values in the input features.

Real-world datasets often have missing covariates. Imputation replaces the missing values so that downstream experiments can run. This class supports multiple strategies for imputing both categorical and numerical/continuous covariates.

For categorical features, the class allows:

replace: Replace all null values with a user-specified constant.
ignore: Keep all missing values as is.
mode: Replace null values with the most commonly occurring value.
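
The three categorical strategies can be illustrated with plain pandas. This is a minimal sketch of what each strategy means, not the library's own implementation; the `stage` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numerically coded categorical column with two missing entries.
df = pd.DataFrame({"stage": [1.0, 2.0, np.nan, 2.0, np.nan]})

# 'replace': fill every null with a user-chosen constant (here -1).
replaced = df["stage"].fillna(-1)

# 'ignore': leave the column untouched, nulls included.
ignored = df["stage"].copy()

# 'mode': fill nulls with the most frequent observed value.
most_common = df["stage"].mode().iloc[0]
mode_filled = df["stage"].fillna(most_common)
```

Note that 'ignore' leaves nulls in place, so any downstream step must be able to handle them.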

For numerical/continuous features, the user can choose between the following strategies:

mean: Replace all missing values with the mean in the column.
median: Replace all missing values with the median in the column.
knn: Use a k-nearest-neighbors model to predict each missing value.
missforest: Use the MissForest model to predict the null values.
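
The numerical strategies can be sketched as follows, assuming pandas for 'mean' and 'median' and scikit-learn's `KNNImputer` for 'knn'. Whether the class uses these exact calls internally is an assumption, and the `age` and `bmi` columns are hypothetical. 'missforest' is left out because it needs an additional package.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data: one missing age, fully observed bmi.
df = pd.DataFrame({
    "age": [40.0, 50.0, np.nan, 70.0],
    "bmi": [22.0, 24.0, 25.0, 30.0],
})

# 'mean': fill nulls with the mean of the observed values in the column.
mean_filled = df["age"].fillna(df["age"].mean())

# 'median': fill nulls with the median of the observed values in the column.
median_filled = df["age"].fillna(df["age"].median())

# 'knn': fill each null with the average of that feature over the
# n_neighbors rows closest in the remaining observed features.
knn = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(knn.fit_transform(df), columns=df.columns)
```

The median is less sensitive to outliers than the mean, while the KNN approach uses the other columns to produce a row-specific estimate.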

Parameters

cat_feat_strat : str: Strategy for imputing categorical features. One of 'replace', 'ignore', 'mode'. Default is 'ignore'.
num_feat_strat : str: Strategy for imputing numerical/continuous features. One of 'mean', 'median', 'knn', 'missforest'. Default is 'mean'.
remaining : str: Strategy for handling remaining columns. One of 'ignore', 'drop'. Default is 'drop'.

Methods

def fit(self, data, cat_feats=None, num_feats=None, fill_value=-1, n_neighbors=5, **kwargs)

def transform(self, data)

def fit_transform(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)

Imputes dataset using imputation strategies.

Parameters

data : pandas.DataFrame: The dataframe to be imputed.
cat_feats : list: List of categorical features.
num_feats : list: List of numerical/continuous features.
fill_value : int: Value to be filled if cat_feat_strat='replace'.
n_neighbors : int: Number of neighbors to be used if num_feat_strat='knn'.
**kwargs: Passed on.

Returns

pandas.DataFrame: Imputed dataset.

class Scaler (scaling_strategy='standard')

Scaler to rescale numerical features.

For scaling, the user can choose between the following strategies:

standard: Perform the standard scaling method.
minmax: Perform the minmax scaling method.
none: Do not perform scaling.
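
The scaling strategies can be written out with pandas under their conventional definitions. This is a sketch, not the library's code: it assumes 'standard' means zero mean and unit variance computed with the population standard deviation (`ddof=0`, the convention scikit-learn's `StandardScaler` follows), and the `age` values are hypothetical.

```python
import pandas as pd

# Hypothetical numerical column.
x = pd.Series([2.0, 4.0, 6.0, 8.0], name="age")

# 'standard': subtract the mean and divide by the standard deviation.
standard = (x - x.mean()) / x.std(ddof=0)

# 'minmax': map the observed range linearly onto [0, 1].
minmax = (x - x.min()) / (x.max() - x.min())

# 'none': return the values unchanged.
unscaled = x.copy()
```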

Parameters

scaling_strategy : str: Strategy to use for scaling numerical/continuous data. One of 'standard', 'minmax', 'none'. Default is 'standard'.

Methods

def fit(self, data, num_feats=None)

Fits scaler to dataset using scaling strategy.

Parameters

data : pandas.DataFrame: Dataframe to be scaled.
num_feats : list: List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.

Returns

Fitted instance of scaler.

def transform(self, data)

Scales data using scaling strategy.

Parameters

data : pandas.DataFrame: Dataframe to be scaled.
feats : list: List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.

Returns

pandas.DataFrame: Scaled dataset.

def fit_transform(self, data, num_feats=[])

Fits a scaler and rescales a dataset using the configured scaling strategy.

Parameters

data : pandas.DataFrame: Dataframe to be scaled.
num_feats : list: List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.

Returns

pandas.DataFrame: Scaled dataset.
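
A separate fit step is useful when the scaling statistics should come from one dataset and be applied to another, for example from a training split to a held-out split. The sketch below shows that idea in plain pandas for the 'standard' strategy. It does not call this class, and the `train` and `test` frames are hypothetical.

```python
import pandas as pd

# Hypothetical training and held-out splits.
train = pd.DataFrame({"age": [30.0, 50.0, 70.0]})
test = pd.DataFrame({"age": [50.0, 90.0]})

# "Fit": learn the statistics from the training split only.
mu = train["age"].mean()
sigma = train["age"].std(ddof=0)

# "Transform": apply the training statistics to the held-out split,
# so no information from the held-out data leaks into the scaling.
test_scaled = (test["age"] - mu) / sigma
```

Calling fit_transform on each split separately would instead scale each split by its own statistics, which makes the two splits incomparable.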

class Preprocessor (cat_feat_strat='ignore', num_feat_strat='mean', scaling_strategy='standard', one_hot=True, remaining='drop')

A composite transform that combines imputation and scaling, with optional one-hot encoding of the data.

Parameters

cat_feat_strat : str: Strategy for imputing categorical features.
num_feat_strat : str: Strategy for imputing numerical/continuous features.
scaling_strategy : str: Strategy to use for scaling numerical/continuous data.
one_hot : bool: Whether to apply one hot encoding to the data.
remaining : str: Strategy for handling remaining columns.

Methods

def fit(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)

Fit imputer and scaler to dataset.

def transform(self, data)

Impute and scale the dataset.

def fit_transform(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)

Imputes and scales dataset.

Parameters

data : pandas.DataFrame: The dataframe to be imputed.
cat_feats : list: List of categorical features.
num_feats : list: List of numerical/continuous features.
one_hot : bool: Indicating whether to perform one-hot encoding.
fill_value : int: Value to be filled if cat_feat_strat='replace'.
n_neighbors : int: Number of neighbors to be used if num_feat_strat='knn'.
**kwargs: Passed on.

Returns

pandas.DataFrame: Imputed and scaled dataset.
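
The composite transform can be pictured as three steps applied in sequence. The sketch below reproduces them in plain pandas for the 'mean' imputation and 'standard' scaling strategies. The step order, the toy columns, and the use of `pd.get_dummies` are assumptions made for illustration, not a description of the class internals.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing numerical value.
df = pd.DataFrame({
    "age": [40.0, np.nan, 60.0, 80.0],
    "sex": ["F", "M", "F", "M"],
})
cat_feats, num_feats = ["sex"], ["age"]

out = df.copy()

# 1. Impute numerical columns with the column mean.
out[num_feats] = out[num_feats].fillna(out[num_feats].mean())

# 2. Standard-scale numerical columns (zero mean, unit variance, ddof=0).
out[num_feats] = (out[num_feats] - out[num_feats].mean()) / out[num_feats].std(ddof=0)

# 3. One-hot encode the categorical columns.
out = pd.get_dummies(out, columns=cat_feats)
```

The result has one scaled `age` column and one indicator column per category of `sex`, with no missing values left.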
