Module `auton_survival.phenotyping`

Utilities to phenotype individuals based on similar survival characteristics.

Functions

def random(): random() -> x in the interval [0, 1).

Classes

class Phenotyper (random_seed=0)

Base class for all phenotyping methods.

class IntersectionalPhenotyper (cat_vars=None, num_vars=None, num_vars_quantiles=(0, 0.5, 1.0), random_seed=0)

A phenotyper that forms groups by taking an exhaustive Cartesian product over a prespecified set of categorical and numerical variables.

Parameters

cat_vars : list of python str(s), default=None: List of column names of categorical variables to phenotype on.
num_vars : list of python str(s), default=None: List of column names of continuous variables to phenotype on.
num_vars_quantiles : tuple of floats, default=(0, .5, 1.0): A tuple of quantiles as floats (inclusive of 0 and 1) used to discretize continuous variables into equal-sized bins.
features : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns as covariates.
phenotypes : list: List of lists containing all possible combinations of specified categorical and numerical variable values.
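
As a rough illustration of the Cartesian-product idea, the sketch below uses only the standard library rather than the class itself. The toy rows, the `bin_label` helper, and the `var:value & var:value` label format are assumptions made for illustration and may differ from the strings the library actually produces.

```python
from bisect import bisect_left
from itertools import product
from statistics import median

# Hypothetical toy data: one categorical and one numerical covariate.
rows = [
    {"sex": "F", "age": 30},
    {"sex": "M", "age": 45},
    {"sex": "F", "age": 60},
    {"sex": "M", "age": 75},
]

def bin_label(value, edges):
    """Label of the quantile bin containing `value` (hypothetical helper)."""
    # bisect_left returns the index of the first edge >= value; clamp it so
    # the minimum falls in the first bin and the maximum in the last one.
    i = min(max(bisect_left(edges, value), 1), len(edges) - 1)
    return f"({edges[i - 1]}, {edges[i]}]"

# Bin edges at the 0, 0.5 and 1.0 quantiles of the toy `age` column.
ages = [r["age"] for r in rows]
age_edges = [min(ages), median(ages), max(ages)]

# Every level each variable can take.
sex_levels = [f"sex:{s}" for s in sorted({r["sex"] for r in rows})]
age_levels = [f"age:({lo}, {hi}]" for lo, hi in zip(age_edges, age_edges[1:])]

# The exhaustive Cartesian product enumerates all intersectional groups.
phenotypes = [" & ".join(combo) for combo in product(sex_levels, age_levels)]

# Assign each row to the group matching its own values.
assigned = [
    f"sex:{r['sex']} & age:{bin_label(r['age'], age_edges)}" for r in rows
]
```

With two categorical levels and two quantile bins, the product yields four candidate groups, and every row maps to exactly one of them.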

Methods

def fit(self, features)

Fit the phenotyper by finding all possible intersectional groups on a passed set of features.

Parameters

features : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

Trained instance of intersectional phenotyper.

def predict(self, features)

Phenotype out-of-sample test data.

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: a numpy array of strings that define subgroups from all possible combinations of specified categorical and numerical variables.

def fit_predict(self, features)

Fit and perform phenotyping on a given dataset.

Parameters

features : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: A numpy array of strings that define subgroups from all possible combinations of specified categorical and numerical variables.

class ClusteringPhenotyper (clustering_method='kmeans', dim_red_method=None, random_seed=0, **kwargs)

Phenotyper that performs dimensionality reduction followed by clustering. Learned clusters are considered phenotypes and used to group samples based on similarity in the covariate space.

Parameters

features : pd.DataFrame

A pandas dataframe with rows corresponding to individual samples and columns as covariates.

clustering_method : str, default='kmeans'

The clustering method applied for phenotyping. Options include:

kmeans: K-Means Clustering
dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
gmm: Gaussian Mixture
hierarchical: Agglomerative Clustering

dim_red_method : str, default=None

The dimensionality reduction method applied. Options include:

pca : Principal Component Analysis
kpca : Kernel Principal Component Analysis
nnmf : Non-Negative Matrix Factorization
None : dimensionality reduction is not applied.

random_seed : int, default=0

Controls the randomness and reproducibility of called functions.

kwargs : dict

Additional arguments for dimensionality reduction and clustering. Pass keyword arguments accepted by the scikit-learn class that corresponds to the chosen method:

pca : sklearn.decomposition.PCA
nnmf : sklearn.decomposition.NMF
kpca : sklearn.decomposition.KernelPCA
kmeans : sklearn.cluster.KMeans
dbscan : sklearn.cluster.DBSCAN
gmm : sklearn.mixture.GaussianMixture
hierarchical : sklearn.cluster.AgglomerativeClustering
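
To make the two-stage design concrete, here is a minimal sketch that calls scikit-learn directly, pairing the `pca` and `kmeans` options. It is not the class's own implementation; the synthetic `features` array and the chosen values of `n_components` and `n_clusters` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical covariates: two well-separated groups, 20 samples x 5 features each.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 5)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 5)),
])

# Stage 1: dimensionality reduction (the 'pca' option).
reducer = PCA(n_components=2, random_state=0)
reduced = reducer.fit_transform(features)

# Stage 2: clustering in the reduced space (the 'kmeans' option).
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
clusterer.fit(reduced)

# New samples must pass through the already-fitted reducer before clustering.
new_samples = rng.normal(loc=5.0, scale=0.5, size=(3, 5))
new_labels = clusterer.predict(reducer.transform(new_samples))
```

Arguments such as `n_components` and `n_clusters` are the kind of values that would be supplied through `kwargs`.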

Methods

def fit(self, features)

Perform dimensionality reduction and train an instance of the clustering algorithm.

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

Trained instance of clustering phenotyper.

def predict_proba(self, features)

Perform dimensionality reduction and clustering, then estimate the probability that each sample belongs to each learned cluster (subgroup).

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: a numpy array of the probability estimates of sample association to learned subgroups.

def predict(self, features)

Perform dimensionality reduction and clustering, then assign each sample to the phenogroup (learned cluster) with the highest estimated membership probability.

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: a numpy array of phenogroup labels.
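
The relationship between membership probabilities and hard labels can be sketched with scikit-learn's `GaussianMixture` (the `gmm` option), which exposes soft assignments directly. The synthetic `samples` array is hypothetical, and taking the arg-max of the probabilities is an assumption about how labels are derived, based on the description above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical data: two well-separated groups of 30 two-dimensional samples.
samples = np.vstack([
    rng.normal(loc=-3.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(30, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(samples)

# Soft assignment: one row per sample, one column per learned cluster.
proba = gmm.predict_proba(samples)

# Hard assignment: the cluster with the highest membership probability.
labels = np.argmax(proba, axis=1)
```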

def fit_predict(self, features)

Fit and perform phenotyping on a given dataset.

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: a numpy array of the probability estimates of sample association to learned clusters.

class SurvivalVirtualTwinsPhenotyper (cf_method='dcph', phenotyping_method='rfr', cf_hyperparams=None, random_seed=0, **phenotyper_hyperparams)

Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
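
For readers unfamiliar with the quantity being regressed, the Restricted Mean Survival Time (RMST) at a horizon is the area under the survival curve from time 0 to that horizon. The sketch below computes it numerically for two hypothetical exponential survival curves; the `rmst` helper and the hazard rates are assumptions for illustration, whereas in the phenotyper the curves would come from the counterfactual model.

```python
import numpy as np

def rmst(times, survival, horizon, n_grid=1001):
    """Area under a survival curve on [0, horizon] (hypothetical helper)."""
    grid = np.linspace(0.0, horizon, n_grid)
    # Linearly interpolate the curve onto an even grid.
    surv_on_grid = np.interp(grid, times, survival)
    # Trapezoidal rule: bin width times the average height of each bin.
    widths = np.diff(grid)
    heights = (surv_on_grid[:-1] + surv_on_grid[1:]) / 2.0
    return float(np.sum(widths * heights))

times = np.linspace(0.0, 10.0, 201)

# Hypothetical survival curves for one individual under each arm.
surv_control = np.exp(-0.3 * times)
surv_treated = np.exp(-0.1 * times)

horizon = 5.0
rmst_control = rmst(times, surv_control, horizon)
rmst_treated = rmst(times, surv_treated, horizon)

# The per-individual quantity that is subsequently regressed on the covariates.
rmst_difference = rmst_treated - rmst_control
```

A positive `rmst_difference` means the individual is expected to spend more event-free time within the horizon under treatment than under control.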

Methods

def fit(self, features, outcomes, interventions, horizons, metric)

Fit a counterfactual model and regress the difference of the estimated counterfactual Restricted Mean Survival Time using a Random Forest regressor.

Parameters

features : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions : np.array: Boolean numpy array of treatment indicators. True means individual was assigned a specific treatment.
horizons : int or float or list: Event-horizons at which to evaluate model performance.
metric : str, default='ibs': Metric used to evaluate model performance and tune hyperparameters. Options include:
- 'auc' : Dynamic area under the ROC curve
- 'brs' : Brier Score
- 'ibs' : Integrated Brier Score
- 'ctd' : Concordance Index

Returns

Trained instance of Survival Virtual Twins Phenotyper.
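
The second stage of this procedure, regressing the RMST difference on the covariates, can be sketched with scikit-learn's `RandomForestRegressor`. This is only an illustration under stated assumptions: the per-sample RMST arrays below are synthetic stand-ins for the counterfactual model's estimates, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))

# Hypothetical RMST estimates under each arm. By construction, treatment
# helps when the first covariate is positive and harms otherwise.
rmst_control = 3.0 + 0.1 * features[:, 1]
rmst_treated = rmst_control + np.where(features[:, 0] > 0, 1.0, -1.0)

# Regression target: estimated gain in RMST from treatment.
target = rmst_treated - rmst_control

regressor = RandomForestRegressor(n_estimators=50, random_state=0)
regressor.fit(features, target)

# Predicted treatment gain for each individual, from covariates alone.
predicted_gain = regressor.predict(features)
```

Once fitted, the regressor needs only covariates, so it can score individuals whose outcomes and treatment assignments are unknown.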

def predict_proba(self, features)

Estimate the probability that the Restricted Mean Survival Time under treatment is greater than that under control.

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: a numpy array of the phenogroup probabilities in the format [control_group, treated_group].
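
One simple way to obtain a two-column array in this layout from a regressed RMST difference is min-max scaling, sketched below. Whether the library uses exactly this rescaling is an assumption, and the `predicted_gain` values are made up for illustration.

```python
import numpy as np

# Hypothetical predicted RMST gains (treated minus control) for 4 samples.
predicted_gain = np.array([-1.0, 0.0, 0.5, 3.0])

# Assumed rescaling: map the smallest gain to 0 and the largest to 1.
span = predicted_gain.max() - predicted_gain.min()
score_treated = (predicted_gain - predicted_gain.min()) / span

# Columns follow the documented order: [control_group, treated_group].
proba = np.column_stack([1.0 - score_treated, score_treated])

# Hard phenogroup labels: the column with the larger score.
labels = np.argmax(proba, axis=1)
```

Note that scores produced this way are relative to the batch being scored, so they should be read as a ranking rather than as calibrated probabilities.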

def predict(self, features)

Extract phenogroups that maximize probability estimates.

Parameters

features : pd.DataFrame: a pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array: a numpy array of the phenogroup labels

def fit_predict(self, features, outcomes, interventions, horizon)

Fit and perform phenotyping on a given dataset.

Parameters

features : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes : pd.DataFrame: A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions : np.array: Boolean numpy array of treatment indicators. True means individual was assigned a specific treatment.
horizon : float: The event horizon at which to compute the counterfactual RMST for regression.

Returns

np.array: a numpy array of the phenogroup labels.
