Module auton_survival.phenotyping
Utilities to phenotype individuals based on similar survival characteristics.
Functions
def random()-
random() -> x in the interval [0, 1).
Classes
class Phenotyper (random_seed=0)-
Base class for all phenotyping methods.
class IntersectionalPhenotyper (cat_vars=None, num_vars=None, num_vars_quantiles=(0, 0.5, 1.0), random_seed=0)-
A phenotyper that phenotypes by performing an exhaustive Cartesian product on a prespecified set of categorical and numerical variables.
Parameters
cat_vars:list of python str(s), default=None- List of column names of categorical variables to phenotype on.
num_vars:list of python str(s), default=None- List of column names of continuous variables to phenotype on.
num_vars_quantiles:tuple of floats, default=(0, .5, 1.0)- A tuple of quantiles as floats (inclusive of 0 and 1) used to discretize continuous variables into equal-sized bins.
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
phenotypes:list- List of lists containing all possible combinations of specified categorical and numerical variable values.
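To make the Cartesian product concrete, the sketch below enumerates every intersectional group for one categorical and one continuous variable using pandas, NumPy and `itertools.product`. It is not the library's implementation: the helper `enumerate_phenotypes` and the `"var:value"` label format are hypothetical, and the quantile edges are simply computed from the passed features.

```python
from itertools import product

import numpy as np
import pandas as pd


def enumerate_phenotypes(features, cat_vars, num_vars, quantiles=(0, 0.5, 1.0)):
    """Hypothetical helper: list every intersectional group of the given variables."""
    levels = []
    for var in cat_vars:
        # Each observed level of a categorical variable is one option.
        levels.append([f"{var}:{v}" for v in sorted(features[var].unique())])
    for var in num_vars:
        # Quantile edges; (0, 0.5, 1.0) gives the minimum, median and maximum.
        edges = np.quantile(features[var], quantiles)
        levels.append([f"{var}:{lo:g} to {hi:g}" for lo, hi in zip(edges[:-1], edges[1:])])
    # Cartesian product: one list per possible intersectional group.
    return [list(combo) for combo in product(*levels)]


df = pd.DataFrame({"sex": ["F", "M", "F", "M"], "age": [30.0, 40.0, 50.0, 60.0]})
phenotypes = enumerate_phenotypes(df, cat_vars=["sex"], num_vars=["age"])
print(len(phenotypes))  # 2 sexes x 2 age bins = 4
```

The number of groups is the product of the level counts, so it grows quickly as variables or quantiles are added.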
Methods
def fit(self, features)-
Fit the phenotyper by finding all possible intersectional groups on a passed set of features.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
Trained instance of intersectional phenotyper.
def predict(self, features)-
Phenotype out-of-sample test data.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:- A numpy array of strings that identify each sample's subgroup among all possible combinations of the specified categorical and numerical variables.
def fit_predict(self, features)-
Fit and perform phenotyping on a given dataset.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:- A numpy array of strings that identify each sample's subgroup among all possible combinations of the specified categorical and numerical variables.
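The split between `fit` and `predict` matters for out-of-sample data: bin edges for continuous variables should be learned once on training data and then reused. The following is a minimal sketch under that assumption; `ToyIntersectionalPhenotyper`, its right-closed binning, the clipping of out-of-range values into the end bins and the `" & "` label format are all hypothetical choices, not the behavior of `auton_survival`.

```python
import numpy as np
import pandas as pd


class ToyIntersectionalPhenotyper:
    """Hypothetical, simplified stand-in used only to illustrate fit/predict."""

    def __init__(self, cat_vars, num_vars, quantiles=(0, 0.5, 1.0)):
        self.cat_vars = cat_vars
        self.num_vars = num_vars
        self.quantiles = quantiles

    def fit(self, features):
        # Learn quantile bin edges from the training data only.
        self.edges_ = {v: np.quantile(features[v], self.quantiles) for v in self.num_vars}
        return self

    def predict(self, features):
        labels = []
        for _, row in features.iterrows():
            parts = [f"{v}:{row[v]}" for v in self.cat_vars]
            for v in self.num_vars:
                edges = self.edges_[v]
                # Right-closed bins; values outside the training range fall
                # into the first or last bin.
                i = int(np.searchsorted(edges[1:-1], row[v], side="left"))
                parts.append(f"{v}:{edges[i]:g} to {edges[i + 1]:g}")
            labels.append(" & ".join(parts))
        return np.array(labels)

    def fit_predict(self, features):
        return self.fit(features).predict(features)


train = pd.DataFrame({"sex": ["F", "M", "F", "M"], "age": [30.0, 40.0, 50.0, 60.0]})
test = pd.DataFrame({"sex": ["M", "F"], "age": [25.0, 58.0]})
groups = ToyIntersectionalPhenotyper(["sex"], ["age"]).fit(train).predict(test)
print(groups)  # ['sex:M & age:30 to 45' 'sex:F & age:45 to 60']
```

The test sample aged 25 lies below the training minimum and is placed in the lowest bin by this sketch; the real phenotyper may treat such values differently.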
class ClusteringPhenotyper (clustering_method='kmeans', dim_red_method=None, random_seed=0, **kwargs)-
Phenotyper that performs dimensionality reduction followed by clustering. Learned clusters are considered phenotypes and used to group samples based on similarity in the covariate space.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
clustering_method:str, default='kmeans'-
The clustering method applied for phenotyping. Options include:
- kmeans: K-Means Clustering
- dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- gmm: Gaussian Mixture
- hierarchical: Agglomerative Clustering
dim_red_method:str, default=None-
The dimensionality reduction method applied. Options include:
- pca: Principal Component Analysis
- kpca: Kernel Principal Component Analysis
- nnmf: Non-Negative Matrix Factorization
- None: dimensionality reduction is not applied.
random_seed:int, default=0- Controls the randomness, and hence the reproducibility, of the called functions.
kwargs:dict-
Additional arguments for dimensionality reduction and clustering. Please include dictionary key-value pairs accepted by the following scikit-learn classes:
- pca: sklearn.decomposition.PCA
- nnmf: sklearn.decomposition.NMF
- kpca: sklearn.decomposition.KernelPCA
- kmeans: sklearn.cluster.KMeans
- dbscan: sklearn.cluster.DBSCAN
- gmm: sklearn.mixture.GaussianMixture
- hierarchical: sklearn.cluster.AgglomerativeClustering
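The two-stage approach can be sketched directly with scikit-learn, here mirroring the `pca` and `gmm` options on synthetic data. This only illustrates the technique; how `ClusteringPhenotyper` routes `**kwargs` such as `n_components` to each scikit-learn class is not shown here and should be checked against the library source.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic covariates: two well-separated groups of 50 samples in 5 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 5)),
               rng.normal(6.0, 1.0, size=(50, 5))])

# Step 1: dimensionality reduction (analogous to dim_red_method='pca').
reducer = PCA(n_components=2, random_state=0)
Z = reducer.fit_transform(X)

# Step 2: clustering in the reduced space (analogous to clustering_method='gmm').
clusterer = GaussianMixture(n_components=2, random_state=0)
labels = clusterer.fit_predict(Z)
print(Z.shape)  # (100, 2)
```

Swapping in another option changes only the two constructor calls, with two caveats from scikit-learn itself: `NMF` requires non-negative inputs, and `DBSCAN` and `AgglomerativeClustering` provide `fit_predict` but no `predict` for new samples.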
Methods
def fit(self, features)-
Perform dimensionality reduction and train an instance of the clustering algorithm.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
Trained instance of clustering phenotyper.
def predict_proba(self, features)-
Perform dimensionality reduction and clustering, and estimate the probability of each sample's association with the learned clusters, or subgroups.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:- A numpy array of the probability estimates of sample association to learned subgroups.
def predict(self, features)-
Perform dimensionality reduction and clustering, and extract the phenogroups that maximize the probability estimates of sample association to specific learned clusters, or subgroups.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:- A numpy array of phenogroup labels.
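For a clustering model with soft assignments, such as a Gaussian mixture, `predict` amounts to an argmax over the columns returned by `predict_proba`. The sketch below assumes the `pca` + `gmm` configuration and uses a scikit-learn `Pipeline` as a stand-in for the phenotyper, so the reducer and clusterer fit on training data are reused on new samples. It does not cover how the library derives probabilities for hard clustering methods such as `kmeans` or `dbscan`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
                     rng.normal(5.0, 1.0, size=(50, 4))])
X_test = np.array([[0.0, 0.0, 0.0, 0.0],
                   [5.0, 5.0, 5.0, 5.0]])

# Reducer and clusterer are fit once on training data and reused on new samples.
model = make_pipeline(PCA(n_components=2, random_state=0),
                      GaussianMixture(n_components=2, random_state=0))
model.fit(X_train)

# Soft assignment: one row per sample, one column per learned cluster.
proba = model.predict_proba(X_test)
# Hard assignment: the phenogroup with the highest probability.
labels = np.argmax(proba, axis=1)
```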
def fit_predict(self, features)-
Fit and perform phenotyping on a given dataset.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array- A numpy array of the probability estimates of sample association to learned clusters.
class SurvivalVirtualTwinsPhenotyper (cf_method='dcph', phenotyping_method='rfr', cf_hyperparams=None, random_seed=0, **phenotyper_hyperparams)-
Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
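The quantity being regressed can be illustrated in a few lines. The sketch assumes counterfactual survival curves are already available on a common time grid; here they are synthetic exponential curves standing in for the Deep Cox Proportional Hazards estimates. The RMST up to a horizon is the area under each survival curve, computed with the trapezoidal rule, and a scikit-learn `RandomForestRegressor` is fit to the treated-minus-control difference. The helper `rmst` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rmst(survival, times):
    """Hypothetical helper: area under each survival curve (one per row) over
    `times` via the trapezoidal rule; `times` should run from 0 to the horizon."""
    widths = np.diff(times)
    return ((survival[:, 1:] + survival[:, :-1]) / 2 * widths).sum(axis=1)


rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 3))
horizon = 10.0
times = np.linspace(0.0, horizon, 201)

# Stand-ins for counterfactual model output: exponential survival curves whose
# hazard depends on the covariates; treatment lowers the hazard more as X[:, 1] grows.
hazard_control = 0.1 * np.exp(X[:, 0])
hazard_treated = hazard_control * np.exp(-X[:, 1])
surv_control = np.exp(-np.outer(hazard_control, times))
surv_treated = np.exp(-np.outer(hazard_treated, times))

# Difference of counterfactual RMSTs (treated minus control) per individual.
rmst_diff = rmst(surv_treated, times) - rmst(surv_control, times)

# Regress the difference on the covariates.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, rmst_diff)
```

Calling `forest.predict` on new covariates then estimates each individual's RMST benefit from treatment, without fitting a survival model to them.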
Methods
def fit(self, features, outcomes, interventions, horizons, metric)-
Fit a counterfactual model and regress the difference of the estimated counterfactual Restricted Mean Survival Time using a Random Forest regressor.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions:np.array- Boolean numpy array of treatment indicators. True means the individual was assigned the treatment.
horizons:int or float or list- Event horizons at which to evaluate model performance.
metric:str, default='ibs'- Metric used to evaluate model performance and tune hyperparameters. Options include:
- 'auc': Dynamic area under the ROC curve
- 'brs': Brier Score
- 'ibs': Integrated Brier Score
- 'ctd': Concordance Index
Returns
Trained instance of Survival Virtual Twins Phenotyper.
def predict_proba(self, features)-
Estimate the probability that the Restricted Mean Survival Time under treatment is greater than that under control.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array- A numpy array of the phenogroup probabilities in the format [control_group, treated_group].
def predict(self, features)-
Extract phenogroups that maximize probability estimates.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array- A numpy array of the phenogroup labels.
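The docstrings do not state how a regressed RMST difference becomes a probability. One possible mapping, shown here purely as a hypothetical illustration and not as the library's method, is the fraction of trees in the random forest that predict a positive difference (treatment beneficial); `predict` then takes the argmax over the `[control_group, treated_group]` columns. The helper `predict_proba` and the synthetic target are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
# Synthetic regression target standing in for the RMST difference
# (treated minus control): treatment helps only when X[:, 0] > 0.5.
rmst_diff = X[:, 0] - 0.5
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, rmst_diff)


def predict_proba(forest, features):
    """Hypothetical mapping: share of trees predicting a positive RMST difference."""
    votes = np.stack([tree.predict(features) > 0 for tree in forest.estimators_])
    p_treated = votes.mean(axis=0)
    # Columns follow the documented [control_group, treated_group] order.
    return np.column_stack([1.0 - p_treated, p_treated])


X_new = np.array([[0.9, 0.5], [0.1, 0.5]])
proba = predict_proba(forest, X_new)
labels = proba.argmax(axis=1)  # 0 = control_group, 1 = treated_group
print(labels)  # [1 0]
```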
def fit_predict(self, features, outcomes, interventions, horizon)-
Fit and perform phenotyping on a given dataset.
Parameters
features:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes:pd.DataFrame- A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions:np.array- Boolean numpy array of treatment indicators. True means the individual was assigned the treatment.
horizon:float- The event horizon at which to compute the counterfactual RMST for regression.
Returns
np.array- A numpy array of the phenogroup labels.