Module auton_survival.phenotyping
Utilities to phenotype individuals based on similar survival characteristics.
Functions
- def random()
random() -> x in the interval [0, 1). 
Classes
- class Phenotyper (random_seed=0)
Base class for all phenotyping methods. 
- class IntersectionalPhenotyper (cat_vars=None, num_vars=None, num_vars_quantiles=(0, 0.5, 1.0), random_seed=0)
A phenotyper that assigns phenotypes by taking an exhaustive Cartesian product over a prespecified set of categorical and numerical variables.
Parameters
- cat_vars: list of python str(s), default=None
- List of column names of categorical variables to phenotype on.
- num_vars: list of python str(s), default=None
- List of column names of continuous variables to phenotype on.
- num_vars_quantiles: tuple of floats, default=(0, .5, 1.0)
- A tuple of quantiles as floats (inclusive of 0 and 1) used to discretize continuous variables into equal-sized bins.
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
- phenotypes: list
- List of lists containing all possible combinations of specified categorical and numerical variable values.
Methods
- def fit(self, features)
Fit the phenotyper by finding all possible intersectional groups on a passed set of features.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- Trained instance of intersectional phenotyper.
- def predict(self, features)
Phenotype out-of-sample test data.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.
 
- def fit_predict(self, features)
Fit and perform phenotyping on a given dataset.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.
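As a rough illustration of the idea behind IntersectionalPhenotyper, the sketch below uses only the Python standard library rather than the class itself. It discretizes one numerical variable at the quantiles (0, 0.5, 1.0), enumerates every combination of categorical value and numerical bin, and then labels each row with its group. The toy rows, the helper names quantile_edges and bin_label, the floor-based quantile rule, and the label format are all assumptions made for this example and are not taken from the library.

```python
from itertools import product

def quantile_edges(values, quantiles):
    """Cut points of `values` at the given quantiles (floor-based rank, a simplification)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[int(q * last)] for q in quantiles]

def bin_label(value, edges):
    """Name of the first bin whose upper edge is >= value."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if value <= hi:
            return f"{lo}-{hi}"
    return f"{edges[-2]}-{edges[-1]}"

# Hypothetical toy data: one categorical and one numerical covariate.
rows = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 51},
    {"sex": "F", "age": 67},
    {"sex": "M", "age": 45},
]

# "Fit": find the bin edges and enumerate every intersectional group.
edges = quantile_edges([r["age"] for r in rows], (0, 0.5, 1.0))
age_bins = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
sexes = sorted({r["sex"] for r in rows})
phenotypes = list(product(sexes, age_bins))

# "Predict": describe each row by the group it falls into.
labels = [f"sex:{r['sex']} & age:{bin_label(r['age'], edges)}" for r in rows]
```

With the class itself, the corresponding call would presumably be IntersectionalPhenotyper(cat_vars=['sex'], num_vars=['age']).fit_predict(features) on a dataframe holding those two columns.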
 
 
- class ClusteringPhenotyper (clustering_method='kmeans', dim_red_method=None, random_seed=0, **kwargs)
Phenotyper that performs dimensionality reduction followed by clustering. Learned clusters are considered phenotypes and used to group samples based on similarity in the covariate space.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
- clustering_method: str, default='kmeans'
- The clustering method applied for phenotyping. Options include:
  - kmeans: K-Means Clustering
  - dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  - gmm: Gaussian Mixture
  - hierarchical: Agglomerative Clustering
 
- dim_red_method: str, default=None
- The dimensionality reduction method applied. Options include:
  - pca: Principal Component Analysis
  - kpca: Kernel Principal Component Analysis
  - nnmf: Non-Negative Matrix Factorization
  - None: dimensionality reduction is not applied.
 
- random_seed: int, default=0
- Controls the randomness and reproducibility of called functions.
- kwargs: dict
- Additional arguments for dimensionality reduction and clustering. Please include dictionary key and item pairs as specified by the following scikit-learn classes:
  - pca: sklearn.decomposition.PCA
  - nnmf: sklearn.decomposition.NMF
  - kpca: sklearn.decomposition.KernelPCA
  - kmeans: sklearn.cluster.KMeans
  - dbscan: sklearn.cluster.DBSCAN
  - gmm: sklearn.mixture.GaussianMixture
  - hierarchical: sklearn.cluster.AgglomerativeClustering
 
Methods
- def fit(self, features)
Perform dimensionality reduction and train an instance of the clustering algorithm.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- Trained instance of clustering phenotyper.
- def predict_proba(self, features)
Perform dimensionality reduction and clustering, and estimate the probabilities of sample association with the learned clusters, or subgroups.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array of the probability estimates of sample association with the learned subgroups.
 
- def predict(self, features)
Perform dimensionality reduction and clustering, and extract the phenogroups that maximize the probability estimates of sample association with specific learned clusters, or subgroups.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array of phenogroup labels.
 
- def fit_predict(self, features)
Fit and perform phenotyping on a given dataset.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array of the probability estimates of sample association to learned clusters.
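The relationship between predict_proba and predict can be illustrated without the library. The sketch below assumes two already-learned cluster centres and converts distances into soft assignments with a softmax over negative Euclidean distance. That conversion is an assumption chosen for illustration and is not necessarily what ClusteringPhenotyper does for any given clustering method. The hard label is then simply the cluster with the highest probability.

```python
import math

def soft_assign(point, centroids):
    """Softmax over negative Euclidean distances from `point` to each centroid."""
    weights = [math.exp(-math.dist(point, c)) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]

def hard_assign(probs):
    """Index of the most probable cluster (argmax)."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical cluster centres and samples in a 2-D (already reduced) space.
centroids = [(0.0, 0.0), (5.0, 5.0)]
samples = [(0.5, 0.2), (4.8, 5.1), (2.0, 2.0)]

proba = [soft_assign(s, centroids) for s in samples]   # analogous to predict_proba
labels = [hard_assign(p) for p in proba]               # analogous to predict
```

Each row of proba sums to one, and labels holds one cluster index per sample.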
 
 
- class SurvivalVirtualTwinsPhenotyper (cf_method='dcph', phenotyping_method='rfr', cf_hyperparams=None, random_seed=0, **phenotyper_hyperparams)
Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
Methods
- def fit(self, features, outcomes, interventions, horizons, metric)
Fit a counterfactual model and regress the difference of the estimated counterfactual Restricted Mean Survival Time using a Random Forest regressor.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
- outcomes: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
- interventions: np.array
- Boolean numpy array of treatment indicators. True means the individual was assigned a specific treatment.
- horizons: int or float or list
- Event-horizons at which to evaluate model performance.
- metric: str, default='ibs'
- Metric used to evaluate model performance and tune hyperparameters. Options include:
  - 'auc': Dynamic area under the ROC curve
  - 'brs': Brier Score
  - 'ibs': Integrated Brier Score
  - 'ctd': Concordance Index
Returns
- Trained instance of Survival Virtual Twins Phenotyper.
- def predict_proba(self, features)
Estimate the probability that the Restricted Mean Survival Time under the treatment group is greater than that under the control group.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array of the phenogroup probabilities in the format [control_group, treated_group].
 
- def predict(self, features)
Extract phenogroups that maximize probability estimates.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
- np.array
- A numpy array of the phenogroup labels.
 
- def fit_predict(self, features, outcomes, interventions, horizon)
Fit and perform phenotyping on a given dataset.
Parameters
- features: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
- outcomes: pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
- interventions: np.array
- Boolean numpy array of treatment indicators. True means the individual was assigned a specific treatment.
- horizon: float
- The event horizon at which to compute the counterfactual RMST for regression.
Returns
- np.array
- A numpy array of the phenogroup labels.
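
The quantity that the Random Forest regressor is described as fitting, the difference in counterfactual Restricted Mean Survival Time (RMST), can be sketched in plain Python. The RMST up to a horizon is the area under the survival curve on [0, horizon]. The example below stands in two exponential survival curves for the counterfactual model's predictions for a single individual. The curves, their hazard rates, the helper name rmst, and the trapezoid-rule integration are all assumptions made for illustration and are not the library's implementation.

```python
import math

def rmst(survival_fn, horizon, steps=1000):
    """Area under S(t) on [0, horizon], approximated with the trapezoid rule."""
    dt = horizon / steps
    area = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        area += 0.5 * (survival_fn(t0) + survival_fn(t1)) * dt
    return area

# Hypothetical counterfactual survival curves for one individual.
def surv_control(t):
    return math.exp(-0.10 * t)   # assumed hazard rate under control

def surv_treated(t):
    return math.exp(-0.05 * t)   # assumed (lower) hazard rate under treatment

horizon = 5.0
# Regression target for this individual: RMST gained under treatment.
rmst_gain = rmst(surv_treated, horizon) - rmst(surv_control, horizon)
```

A positive rmst_gain means the individual is expected to survive longer, within the horizon, under treatment than under control.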