Module auton_survival.phenotyping
Utilities to phenotype individuals based on similar survival characteristics.
Functions
def random()

random() -> x in the interval [0, 1).
Classes
class Phenotyper (random_seed=0)

Base class for all phenotyping methods.
class IntersectionalPhenotyper (cat_vars=None, num_vars=None, num_vars_quantiles=(0, 0.5, 1.0), random_seed=0)

A phenotyper that phenotypes by performing an exhaustive Cartesian product on a prespecified set of categorical and numerical variables.
Parameters
cat_vars
:list of python str(s), default=None
 List of column names of categorical variables to phenotype on.
num_vars
:list of python str(s), default=None
 List of column names of continuous variables to phenotype on.
num_vars_quantiles
:tuple of floats, default=(0, .5, 1.0)
 A tuple of quantiles as floats (inclusive of 0 and 1) used to discretize continuous variables into equal-sized bins.
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
phenotypes
:list
 List of lists containing all possible combinations of specified categorical and numerical variable values.
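The exhaustive Cartesian product described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the toy rows, the helper names `quantile` and `bin_index`, and the choice of right-closed bins are assumptions made purely for illustration.

```python
from bisect import bisect_left
from itertools import product


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bin_index(value, edges):
    """Index of the quantile bin containing value (bins are right-closed).

    Values outside the fitted range fall into the first or last bin.
    """
    return bisect_left(edges[1:-1], value)


# Hypothetical toy data: one categorical and one continuous covariate.
rows = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 71},
    {"sex": "F", "age": 58},
    {"sex": "M", "age": 45},
]
cat_vars, num_vars, num_vars_quantiles = ["sex"], ["age"], (0, 0.5, 1.0)

# "Fit": collect the categorical levels and the quantile bin edges.
levels = {c: sorted({r[c] for r in rows}) for c in cat_vars}
edges = {}
for n in num_vars:
    vals = sorted(r[n] for r in rows)
    edges[n] = [quantile(vals, q) for q in num_vars_quantiles]

# All possible intersectional groups: Cartesian product of levels and bins.
n_bins = len(num_vars_quantiles) - 1
phenotypes = list(product(*[levels[c] for c in cat_vars],
                          *[range(n_bins) for _ in num_vars]))

# "Predict": map every row to the group it falls into.
assigned = [
    tuple(r[c] for c in cat_vars)
    + tuple(bin_index(r[n], edges[n]) for n in num_vars)
    for r in rows
]
```

With the default quantiles (0, 0.5, 1.0), each continuous variable is split at its median, so one binary categorical variable and one continuous variable yield four candidate groups.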
Methods
def fit(self, features)

Fit the phenotyper by finding all possible intersectional groups on a passed set of features.
Parameters
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
Trained instance of intersectional phenotyper.
def predict(self, features)

Phenotype out of sample test data.
Parameters
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
 A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.
def fit_predict(self, features)

Fit and perform phenotyping on a given dataset.
Parameters
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
 A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.
class ClusteringPhenotyper (clustering_method='kmeans', dim_red_method=None, random_seed=0, **kwargs)

Phenotyper that performs dimensionality reduction followed by clustering. Learned clusters are considered phenotypes and used to group samples based on similarity in the covariate space.
Parameters
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
clustering_method
:str, default='kmeans'
 The clustering method applied for phenotyping. Options include:
 kmeans: KMeans Clustering
 dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
 gmm: Gaussian Mixture
 hierarchical: Agglomerative Clustering
dim_red_method
:str, default=None
 The dimensionality reduction method applied. Options include:
 pca: Principal Component Analysis
 kpca: Kernel Principal Component Analysis
 nnmf: Non-Negative Matrix Factorization
 None: dimensionality reduction is not applied.
random_seed
:int, default=0
 Controls the randomness and reproducibility of called functions.
kwargs
:dict
 Additional arguments for dimensionality reduction and clustering. Please include dictionary key and item pairs specified by the following scikit-learn modules:
 pca: sklearn.decomposition.PCA
 nnmf: sklearn.decomposition.NMF
 kpca: sklearn.decomposition.KernelPCA
 kmeans: sklearn.cluster.KMeans
 dbscan: sklearn.cluster.DBSCAN
 gmm: sklearn.mixture.GaussianMixture
 hierarchical: sklearn.cluster.AgglomerativeClustering
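As a quick reference, the option names listed above can be written as a lookup table that shows which scikit-learn classes to consult when choosing keyword arguments. This is only a sketch: the helper `kwargs_reference` and the two dictionaries are hypothetical and not part of auton_survival; only the name-to-class pairs come from the list above.

```python
# Option name -> scikit-learn class, as listed in the kwargs description.
DIM_RED_CLASSES = {
    "pca": "sklearn.decomposition.PCA",
    "nnmf": "sklearn.decomposition.NMF",
    "kpca": "sklearn.decomposition.KernelPCA",
}
CLUSTERING_CLASSES = {
    "kmeans": "sklearn.cluster.KMeans",
    "dbscan": "sklearn.cluster.DBSCAN",
    "gmm": "sklearn.mixture.GaussianMixture",
    "hierarchical": "sklearn.cluster.AgglomerativeClustering",
}


def kwargs_reference(clustering_method="kmeans", dim_red_method=None):
    """Hypothetical helper: classes whose arguments `kwargs` may carry."""
    if clustering_method not in CLUSTERING_CLASSES:
        raise ValueError(f"Unknown clustering method: {clustering_method!r}")
    if dim_red_method is not None and dim_red_method not in DIM_RED_CLASSES:
        raise ValueError(f"Unknown dim. reduction method: {dim_red_method!r}")
    classes = []
    if dim_red_method is not None:
        # Dimensionality reduction runs first, so list it first.
        classes.append(DIM_RED_CLASSES[dim_red_method])
    classes.append(CLUSTERING_CLASSES[clustering_method])
    return classes
```

For example, `kwargs_reference("gmm", "pca")` points to the PCA and GaussianMixture class documentation for the accepted keyword arguments.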
Methods
def fit(self, features)

Perform dimensionality reduction and train an instance of the clustering algorithm.
Parameters
features
:pd.DataFrame
 a pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
Trained instance of clustering phenotyper.
def predict_proba(self, features)

Perform dimensionality reduction and clustering, and estimate the probability of sample association to learned clusters, or subgroups.
Parameters
features
:pd.DataFrame
 a pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
 a numpy array of the probability estimates of sample association to learned subgroups.
def predict(self, features)

Perform dimensionality reduction and clustering, and extract the phenogroups that maximize the probability estimates of sample association to specific learned clusters, or subgroups.
Parameters
features
:pd.DataFrame
 a pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
 a numpy array of phenogroup labels
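The relationship between the probability estimates and the returned labels can be sketched as a row-wise argmax. This is an illustration with made-up probabilities, not the library's code; the function name `hard_labels` is hypothetical.

```python
def hard_labels(proba):
    """Return, for each sample, the index of its most probable cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


# Hypothetical probability estimates: 3 samples, 3 learned clusters.
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
labels = hard_labels(proba)
```

With NumPy arrays, the equivalent operation is `np.argmax(proba, axis=1)`.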
def fit_predict(self, features)

Fit and perform phenotyping on a given dataset.
Parameters
features
:pd.DataFrame
 a pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array
 a numpy array of the probability estimates of sample association to learned clusters.
class SurvivalVirtualTwinsPhenotyper (cf_method='dcph', phenotyping_method='rfr', cf_hyperparams=None, random_seed=0, **phenotyper_hyperparams)

Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
Methods
def fit(self, features, outcomes, interventions, horizons, metric)

Fit a counterfactual model and regress the difference of the estimated counterfactual Restricted Mean Survival Time using a Random Forest regressor.
Parameters
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions
:np.array
 Boolean numpy array of treatment indicators. True means individual was assigned a specific treatment.
horizons
:int or float or list
 Event horizons at which to evaluate model performance.
metric
:str, default='ibs'
 Metric used to evaluate model performance and tune hyperparameters. Options include:
 'auc': Dynamic area under the ROC curve
 'brs': Brier Score
 'ibs': Integrated Brier Score
 'ctd': Concordance Index
Returns
Trained instance of Survival Virtual Twins Phenotyper.
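The regression target, the difference in counterfactual Restricted Mean Survival Time (RMST), can be illustrated for a single individual. The RMST at a horizon is the area under the survival curve from 0 to that horizon. The sketch below assumes a right-continuous step survival curve and uses made-up survival probabilities; the function `rmst` is hypothetical and does not reproduce how the library computes this quantity.

```python
def rmst(times, survival, horizon):
    """Area under a step survival curve on [0, horizon].

    `times` must be increasing; `survival[i]` is S(t) on
    [times[i], times[i + 1]), and S(t) = 1 before times[0].
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, survival):
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last partial interval, up to the horizon.
    return area + prev_s * (horizon - prev_t)


# Hypothetical counterfactual survival curves for one individual.
times = [1.0, 2.0, 3.0]
surv_treated = [0.9, 0.8, 0.7]
surv_control = [0.8, 0.6, 0.4]
horizon = 3.0

rmst_treated = rmst(times, surv_treated, horizon)
rmst_control = rmst(times, surv_control, horizon)

# Per-individual target that the second-stage regressor is fit on.
rmst_difference = rmst_treated - rmst_control
```

A positive difference indicates that the individual is estimated to survive longer, on average, under treatment than under control within the horizon.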
def predict_proba(self, features)

Estimate the probability that the Restricted Mean Survival Time under the treatment group is greater than that under the control group.
Parameters
features
:pd.DataFrame
 a pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array
 a numpy array of the phenogroup probabilities in the format [control_group, treated_group].
def predict(self, features)

Extract phenogroups that maximize probability estimates.
Parameters
features
:pd.DataFrame
 a pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array
 a numpy array of the phenogroup labels
def fit_predict(self, features, outcomes, interventions, horizon)

Fit and perform phenotyping on a given dataset.
Parameters
features
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes
:pd.DataFrame
 A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions
:np.array
 Boolean numpy array of treatment indicators. True means individual was assigned a specific treatment.
horizon
:np.float
 The event horizon at which to compute the counterfactual RMST for regression.
Returns
np.array
 a numpy array of the phenogroup labels.