Module auton_survival.preprocessing
Classes
class Imputer (cat_feat_strat='ignore', num_feat_strat='mean', remaining='drop')
-
A class to impute missing values in the input features.
Real-world datasets often have missing covariates. Imputation replaces the missing values so that downstream experiments can run. This class supports multiple strategies for imputing both categorical and numerical/continuous covariates.
For categorical features, the class allows:
- replace: Replace all null values with a user-specified constant.
- ignore: Keep all missing values as is.
- mode: Replace null values with the most commonly occurring value.
For numerical/continuous features, the user can choose between the following strategies:
- mean: Replace all missing values with the mean in the column.
- median: Replace all missing values with the median in the column.
- knn: Use a k Nearest Neighbour model to predict the missing value.
- missforest: Use the MissForest model to predict the null values.
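As a rough illustration of what the simpler strategies compute, the sketch below fills missing entries in a single column held as a plain Python list, with None marking a missing value. It is not the library's implementation: the helper names impute_categorical and impute_numerical are hypothetical, and the model-based strategies (knn, missforest) are left out.

```python
from statistics import mean, median, mode


def impute_categorical(column, strategy="ignore", fill_value=-1):
    """Fill None entries in a categorical column (hypothetical helper)."""
    if strategy == "ignore":
        return list(column)  # missing values are kept as is
    observed = [v for v in column if v is not None]
    if strategy == "replace":
        filler = fill_value  # user-specified constant
    elif strategy == "mode":
        filler = mode(observed)  # most frequently observed value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [filler if v is None else v for v in column]


def impute_numerical(column, strategy="mean"):
    """Fill None entries in a numerical column (hypothetical helper)."""
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        filler = mean(observed)
    elif strategy == "median":
        filler = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [filler if v is None else v for v in column]


colors = ["red", None, "blue", "red"]
ages = [30, None, 50, 40]
print(impute_categorical(colors, "mode"))
print(impute_categorical(colors, "replace", fill_value=-1))
print(impute_numerical(ages, "mean"))
```

Note that the filler is always computed from the observed values only, so the missing entries do not influence the statistic used to replace them.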
Parameters
cat_feat_strat
:str
- Strategy for imputing categorical features. One of 'replace', 'ignore', 'mode'. Default is 'ignore'.
num_feat_strat
:str
- Strategy for imputing numerical/continuous features. One of 'mean', 'median', 'knn', 'missforest'. Default is 'mean'.
remaining
:str
- Strategy for handling remaining columns. One of 'ignore', 'drop'. Default is 'drop'.
Methods
def fit(self, data, cat_feats=None, num_feats=None, fill_value=-1, n_neighbors=5, **kwargs)
def transform(self, data)
def fit_transform(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)
-
Imputes dataset using imputation strategies.
Parameters
data
:pandas.DataFrame
- The dataframe to be imputed.
cat_feats
:list
- List of categorical features.
num_feats
:list
- List of numerical/continuous features.
fill_value
:int
- Value to be filled if cat_feat_strat='replace'.
n_neighbors
:int
- Number of neighbors to be used if num_feat_strat='knn'.
**kwargs
- Passed on.
Returns
pandas.DataFrame
- Imputed dataset.
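To give a feel for what n_neighbors controls, here is a simplified sketch of nearest-neighbour imputation on rows of numbers, with None marking a missing value. The function knn_impute is hypothetical, and the distance metric and unweighted averaging are assumptions made for illustration; the library's knn strategy may differ in both.

```python
import math


def knn_impute(rows, n_neighbors=2):
    """Fill None entries with the mean of that column over the nearest rows."""

    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which column j is observed.
            donors = [
                (distance(row, other), other[j])
                for k, other in enumerate(rows)
                if k != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:n_neighbors]]
            if nearest:  # leave the gap if no donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],
]
print(knn_impute(rows, n_neighbors=2))
```

With n_neighbors=2 the distant third row is ignored when filling the last row; raising n_neighbors to 3 pulls it into the average.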
class Scaler (scaling_strategy='standard')
-
Scaler to rescale numerical features.
For scaling, the user can choose between the following strategies:
- standard: Perform the standard scaling method.
- minmax: Perform the minmax scaling method.
- none: Do not perform scaling.
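The two scaling strategies can be sketched on a single column as follows. This is a minimal illustration, not the library's code: the helper names are hypothetical, and the use of the population standard deviation for standard scaling is an assumption about the convention.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Shift to zero mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the column is not constant
    return [(v - mu) / sigma for v in values]


def minmax_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


values = [2.0, 4.0, 6.0]
print(standard_scale(values))
print(minmax_scale(values))
```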
Parameters
scaling_strategy
:str
- Strategy to use for scaling numerical/continuous data. One of 'standard', 'minmax', 'none'. Default is 'standard'.
Methods
def fit(self, data, num_feats=None)
-
Fits scaler to dataset using scaling strategy.
Parameters
data
:pandas.DataFrame
- Dataframe to be scaled.
num_feats
:list
- List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.
Returns
Fitted instance of scaler.
def transform(self, data)
-
Scales data using scaling strategy.
Parameters
data
:pandas.DataFrame
- Dataframe to be scaled.
feats
:list
- List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.
Returns
pandas.DataFrame
- Scaled dataset.
def fit_transform(self, data, num_feats=[])
-
Fits the scaler and rescales the dataset using the configured scaling strategy.
Parameters
data
:pandas.DataFrame
- Dataframe to be scaled.
num_feats
:list
- List of numerical/continuous features to be scaled. NOTE: if left empty, all features are interpreted as numerical.
Returns
pandas.DataFrame
- Scaled dataset.
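The reason for offering fit and transform separately, rather than only fit_transform, is that the scaling statistics can be learned on one dataset and reused on another. The sketch below shows the pattern with a hypothetical stand-in class, TinyStandardScaler, operating on a plain list; it is not the Scaler class itself.

```python
from statistics import mean, pstdev


class TinyStandardScaler:
    """Hypothetical stand-in showing the fit/transform split."""

    def fit(self, values):
        # Learn the statistics once, from the training data only.
        self.mean_ = mean(values)
        self.std_ = pstdev(values)
        return self  # returning the fitted instance allows chaining

    def transform(self, values):
        # Reuse the stored statistics on any data, including unseen data.
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]

scaler = TinyStandardScaler().fit(train)
scaled_test = scaler.transform(test)
print(scaled_test)
```

Here the held-out values are scaled with the training mean and standard deviation, so a value of 40.0 can land outside the range seen during fitting.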
class Preprocessor (cat_feat_strat='ignore', num_feat_strat='mean', scaling_strategy='standard', one_hot=True, remaining='drop')
-
A composite transform that combines imputation and scaling, with optional one-hot encoding.
Parameters
cat_feat_strat
:str
- Strategy for imputing categorical features.
num_feat_strat
:str
- Strategy for imputing numerical/continuous features.
scaling_strategy
:str
- Strategy to use for scaling numerical/continuous data.
one_hot
:bool
- Whether to apply one hot encoding to the data.
remaining
:str
- Strategy for handling remaining columns.
Methods
def fit(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)
-
Fit imputer and scaler to dataset.
def transform(self, data)
-
Impute and scale the dataset.
def fit_transform(self, data, cat_feats, num_feats, fill_value=-1, n_neighbors=5, **kwargs)
-
Imputes and scales dataset.
Parameters
data
:pandas.DataFrame
- The dataframe to be imputed.
cat_feats
:list
- List of categorical features.
num_feats
:list
- List of numerical/continuous features.
one_hot
:bool
- Indicating whether to perform one-hot encoding.
fill_value
:int
- Value to be filled if cat_feat_strat='replace'.
n_neighbors
:int
- Number of neighbors to be used if num_feat_strat='knn'.
**kwargs
- Passed on.
Returns
pandas.DataFrame
- Imputed and scaled dataset.
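To make the composite transform concrete, the following sketch chains mode/mean imputation, standard scaling and one-hot encoding on a small list of dict records. The function preprocess is hypothetical, and the order of the steps, the one-hot column naming and the use of the population standard deviation are assumptions made for illustration rather than a description of how Preprocessor is implemented.

```python
from statistics import mean, mode, pstdev


def preprocess(rows, cat_feats, num_feats):
    """Impute, scale, then one-hot encode a list of dict records."""
    out = [dict(r) for r in rows]

    # 1. Impute: mode for categorical features, mean for numerical ones.
    for feat in cat_feats:
        filler = mode([r[feat] for r in rows if r[feat] is not None])
        for r in out:
            if r[feat] is None:
                r[feat] = filler
    for feat in num_feats:
        filler = mean([r[feat] for r in rows if r[feat] is not None])
        for r in out:
            if r[feat] is None:
                r[feat] = filler

    # 2. Scale numerical features to zero mean and unit variance.
    for feat in num_feats:
        col = [r[feat] for r in out]
        mu, sigma = mean(col), pstdev(col)
        for r in out:
            r[feat] = (r[feat] - mu) / sigma

    # 3. One-hot encode categorical features into indicator columns.
    for feat in cat_feats:
        levels = sorted({r[feat] for r in out})
        for r in out:
            value = r.pop(feat)
            for level in levels:
                r[f"{feat}_{level}"] = 1 if value == level else 0
    return out


rows = [
    {"sex": "F", "age": 30.0},
    {"sex": None, "age": 50.0},
    {"sex": "M", "age": None},
    {"sex": "F", "age": 40.0},
]
processed = preprocess(rows, cat_feats=["sex"], num_feats=["age"])
for record in processed:
    print(record)
```

Imputing before scaling means the scaling statistics are computed on a complete column, which is one reason to bundle the two steps in a single transform.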