Module auton_survival.phenotyping
Utilities to phenotype individuals based on similar survival characteristics.
Functions
def random()
-
random() -> x in the interval [0, 1).
Classes
class Phenotyper (random_seed=0)
-
Base class for all phenotyping methods.
class IntersectionalPhenotyper (cat_vars=None, num_vars=None, num_vars_quantiles=(0, 0.5, 1.0), random_seed=0)
-
A phenotyper that phenotypes by performing an exhaustive Cartesian product on a prespecified set of categorical and numerical variables.
Parameters
cat_vars
:list of python str(s)
, default=None
- List of column names of categorical variables to phenotype on.
num_vars
:list of python str(s)
, default=None
- List of column names of continuous variables to phenotype on.
num_vars_quantiles
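:tuple of floats
, default=(0, 0.5, 1.0)
- A tuple of quantiles as floats (inclusive of 0 and 1) used to discretize continuous variables into bins whose edges are the corresponding quantiles of each variable. With the default, each variable is split at its median into two bins with roughly equal numbers of samples.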
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
phenotypes
:list
- List of lists containing all possible combinations of specified categorical and numerical variable values.
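The `phenotypes` attribute described above can be illustrated with a minimal sketch. It assumes a pandas DataFrame input and represents each numerical bin as a `(low, high)` tuple of quantile cut points; the helper name `enumerate_intersectional_groups` and its output format are hypothetical and need not match the library's internals.

```python
import itertools

import numpy as np
import pandas as pd


def enumerate_intersectional_groups(df, cat_vars=None, num_vars=None,
                                    quantiles=(0, 0.5, 1.0)):
    """Hypothetical helper: list every combination of categorical levels
    and quantile bins, one inner list per intersectional group."""
    cat_vars = cat_vars or []
    num_vars = num_vars or []
    per_variable_values = []
    for var in cat_vars:
        # Each observed level of a categorical variable is its own group.
        per_variable_values.append(sorted(df[var].unique().tolist()))
    for var in num_vars:
        # Cut points at the requested quantiles define contiguous bins,
        # represented here as (low, high) tuples.
        edges = np.quantile(df[var].to_numpy(dtype=float), quantiles)
        per_variable_values.append(
            [(float(lo), float(hi)) for lo, hi in zip(edges[:-1], edges[1:])])
    # Exhaustive Cartesian product across all variables.
    return [list(combo) for combo in itertools.product(*per_variable_values)]


df = pd.DataFrame({"sex": ["F", "M", "F", "M"],
                   "age": [20.0, 30.0, 40.0, 50.0]})
groups = enumerate_intersectional_groups(df, cat_vars=["sex"], num_vars=["age"])
print(len(groups))  # 4 (2 sexes x 2 age bins)
print(groups[0])    # ['F', (20.0, 35.0)]
```

Because the groups form a full Cartesian product, their count is the product of the per-variable counts, so it grows quickly as variables or quantile cut points are added.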
Methods
def fit(self, features)
-
Fit the phenotyper by finding all possible intersectional groups on a passed set of features.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
Trained instance of intersectional phenotyper.
def predict(self, features)
-
Phenotype out-of-sample test data.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
- A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.
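One way `predict` could assign out-of-sample rows to intersectional groups is sketched below. It assumes that numerical bin edges are computed on the training data and reused at prediction time; the helper `label_intersectional_groups` and the `var:value & ...` label format are hypothetical, and the library's actual strings may differ.

```python
import numpy as np
import pandas as pd


def label_intersectional_groups(train, test, cat_vars, num_vars,
                                quantiles=(0, 0.5, 1.0)):
    """Hypothetical helper: name the intersectional group of each test row.

    Bin edges are computed on `train` only and reused for `test`, so
    out-of-sample rows are binned consistently with the fitted groups.
    """
    edges = {var: np.quantile(train[var].to_numpy(dtype=float), quantiles)
             for var in num_vars}
    labels = []
    for _, row in test.iterrows():
        parts = [f"{var}:{row[var]}" for var in cat_vars]
        for var in num_vars:
            e = edges[var]
            # Locate the bin; values outside the training range are clipped
            # into the first or last bin.
            idx = int(np.clip(np.searchsorted(e, row[var], side="right") - 1,
                              0, len(e) - 2))
            parts.append(f"{var}:{e[idx]:g} to {e[idx + 1]:g}")
        labels.append(" & ".join(parts))
    return np.array(labels)


train = pd.DataFrame({"sex": ["F", "M", "F", "M"],
                      "age": [20.0, 30.0, 40.0, 50.0]})
test = pd.DataFrame({"sex": ["M", "F"], "age": [25.0, 60.0]})
print(label_intersectional_groups(train, test, ["sex"], ["age"]))
```

Reusing training edges matters: recomputing quantiles on the test set would shift the bin boundaries and make labels incomparable across datasets.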
def fit_predict(self, features)
-
Fit and perform phenotyping on a given dataset.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
- A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.
class ClusteringPhenotyper (clustering_method='kmeans', dim_red_method=None, random_seed=0, **kwargs)
-
Phenotyper that performs dimensionality reduction followed by clustering. Learned clusters are considered phenotypes and used to group samples based on similarity in the covariate space.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
clustering_method
:str
, default='kmeans'
-
The clustering method applied for phenotyping. Options include:
- 'kmeans' : K-Means Clustering
- 'dbscan' : Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- 'gmm' : Gaussian Mixture
- 'hierarchical' : Agglomerative Clustering
dim_red_method
:str
, default=None
-
The dimensionality reduction method applied. Options include:
- 'pca' : Principal Component Analysis
- 'kpca' : Kernel Principal Component Analysis
- 'nnmf' : Non-Negative Matrix Factorization
- None : dimensionality reduction is not applied.
random_seed
:int
, default=0
- Seed that controls the randomness and reproducibility of the called functions.
kwargs
:dict
-
Additional arguments for dimensionality reduction and clustering. Include key-value pairs accepted by the corresponding scikit-learn class:
- 'pca' : sklearn.decomposition.PCA
- 'nnmf' : sklearn.decomposition.NMF
- 'kpca' : sklearn.decomposition.KernelPCA
- 'kmeans' : sklearn.cluster.KMeans
- 'dbscan' : sklearn.cluster.DBSCAN
- 'gmm' : sklearn.mixture.GaussianMixture
- 'hierarchical' : sklearn.cluster.AgglomerativeClustering
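The reduce-then-cluster idea can be sketched in plain NumPy, here assuming PCA computed by SVD followed by k-means computed with Lloyd's algorithm (seeded by farthest-first traversal). This illustrates the technique only; the library delegates to the scikit-learn classes listed above, where settings such as `n_components` (PCA) or `n_clusters` (KMeans) would be supplied through `kwargs`. The function name `pca_then_kmeans` is hypothetical.

```python
import numpy as np


def pca_then_kmeans(X, n_components=2, n_clusters=2, n_iter=100, random_seed=0):
    """Hypothetical sketch: project X onto its top principal components,
    then cluster the projection with k-means."""
    X = np.asarray(X, dtype=float)
    # PCA via SVD of the mean-centered data.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:n_components].T
    # Farthest-first seeding: start from a random sample, then repeatedly
    # add the sample farthest from the centers chosen so far.
    rng = np.random.default_rng(random_seed)
    centers = [Z[rng.integers(len(Z))]]
    while len(centers) < n_clusters:
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    # Lloyd's iterations: assign to nearest center, then recompute means.
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return Z, centers, labels


X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
              [10.0, 10.0, 10.0], [10.1, 10.0, 10.0], [10.0, 10.1, 10.0]])
Z, centers, labels = pca_then_kmeans(X, n_components=2, n_clusters=2)
```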
Methods
def fit(self, features)
-
Perform dimensionality reduction and train an instance of the clustering algorithm.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
Trained instance of clustering phenotyper.
def predict_proba(self, features)
-
Perform dimensionality reduction and clustering, and estimate the probability of each sample's association with the learned clusters, or subgroups.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
- A numpy array of the probability estimates of sample association to learned subgroups.
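For a Gaussian mixture, scikit-learn's `GaussianMixture.predict_proba` returns posterior membership probabilities directly. For hard-clustering methods such as k-means, one hypothetical way to obtain comparable scores is a softmax over negative squared distances to the cluster centers; the sketch below assumes that scheme, which is not necessarily what the library does. The `temperature` parameter is likewise an illustrative assumption.

```python
import numpy as np


def soft_assignments(Z, centers, temperature=1.0):
    """Hypothetical sketch: turn distances to cluster centers into
    per-sample association probabilities (rows sum to 1)."""
    Z = np.asarray(Z, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dists / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


centers = np.array([[0.0, 0.0], [4.0, 0.0]])
Z = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 0.0]])
proba = soft_assignments(Z, centers)
labels = proba.argmax(axis=1)  # hard phenogroup labels: most probable cluster
```

Taking the argmax of these scores yields the phenogroup that maximizes the probability estimate, which is what `predict` returns.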
def predict(self, features)
-
Perform dimensionality reduction and clustering, and assign each sample to the phenogroup, or learned cluster, with the highest estimated probability of association.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array:
- A numpy array of phenogroup labels.
def fit_predict(self, features)
-
Fit and perform phenotyping on a given dataset.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array
- A numpy array of the probability estimates of sample association to learned clusters.
class SurvivalVirtualTwinsPhenotyper (cf_method='dcph', phenotyping_method='rfr', cf_hyperparams=None, random_seed=0, **phenotyper_hyperparams)
-
Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
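The restricted mean survival time (RMST) up to a horizon is the area under the survival curve S(t) from 0 to that horizon. The sketch below computes it for a step-function survival curve and forms the treated-minus-control difference that serves as a regression target. The input format (sorted jump times with the survival level held after each jump), the sign convention, and the example curves are assumptions for illustration, not the library's internals.

```python
import numpy as np


def rmst(times, survival, horizon):
    """Area under a step survival curve S(t) from t = 0 to `horizon`.

    `times` are increasing event times; `survival[i]` is S(t) on
    [times[i], times[i + 1]). S(t) is taken to be 1 before the first time.
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    keep = times < horizon
    # Interval boundaries: 0, every jump before the horizon, then the horizon.
    grid = np.concatenate(([0.0], times[keep], [float(horizon)]))
    # Survival level held on each interval, starting from S(0) = 1.
    levels = np.concatenate(([1.0], survival[keep]))
    return float(np.sum(levels * np.diff(grid)))


# Hypothetical counterfactual curves for one individual, horizon = 6.
rmst_control = rmst([2, 5], [0.5, 0.25], horizon=6)   # 3.75
rmst_treated = rmst([2, 5], [0.75, 0.5], horizon=6)   # 4.75
effect = rmst_treated - rmst_control                  # 1.0
```

A positive difference means the model expects more event-free time within the horizon under treatment than under control.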
Methods
def fit(self, features, outcomes, interventions, horizons, metric)
-
Fit a counterfactual model and regress the difference of the estimated counterfactual Restricted Mean Survival Time using a Random Forest regressor.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions
:np.array
- Boolean numpy array of treatment indicators. True means the individual was assigned a specific treatment.
horizons
:int or float or list
- Event horizons at which to evaluate model performance.
metric
:str
, default='ibs'
- Metric used to evaluate model performance and tune hyperparameters. Options include:
  - 'auc' : Dynamic area under the ROC curve
  - 'brs' : Brier Score
  - 'ibs' : Integrated Brier Score
  - 'ctd' : Concordance Index
Returns
Trained instance of Survival Virtual Twins Phenotyper.
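The `metric` options differ in which direction counts as better: Brier-type scores ('brs', 'ibs') are errors, so lower is better, while time-dependent AUC ('auc') and the concordance index ('ctd') measure ranking quality, so higher is better. The hypothetical helper below shows how a tuning loop might respect that when choosing among candidate settings; the helper and the setting names are illustrative, not part of the library.

```python
# Whether a lower score is better for each metric option.
LOWER_IS_BETTER = {"brs": True, "ibs": True, "auc": False, "ctd": False}


def select_best(scores, metric="ibs"):
    """Hypothetical sketch: pick the setting with the best validation score.

    `scores` maps a setting name to its score under `metric`.
    """
    if metric not in LOWER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric!r}")
    pick = min if LOWER_IS_BETTER[metric] else max
    return pick(scores, key=scores.get)


best = select_best({"lr=1e-3": 0.18, "lr=1e-4": 0.21}, metric="ibs")
```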
def predict_proba(self, features)
-
Estimate the probability that the Restricted Mean Survival Time under the treatment group is greater than that under the control group.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array
- A numpy array of the phenogroup probabilities in the format [control_group, treated_group].
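One way to read "probability that the treated RMST exceeds the control RMST" from a random forest is the share of trees that predict a positive RMST difference. The sketch below assumes that scheme and takes per-tree predictions as a plain array (with scikit-learn's `RandomForestRegressor`, these could be gathered from the fitted `estimators_`); it is an illustration, not the library's method, and the helper name is hypothetical.

```python
import numpy as np


def phenogroup_proba(per_tree_effects):
    """Hypothetical sketch: share of ensemble members predicting a benefit.

    `per_tree_effects` has shape (n_samples, n_trees); each entry is one
    tree's predicted RMST difference (treated minus control).
    Returns an (n_samples, 2) array laid out as [control_group, treated_group].
    """
    effects = np.asarray(per_tree_effects, dtype=float)
    p_treated = (effects > 0).mean(axis=1)
    return np.column_stack([1.0 - p_treated, p_treated])


per_tree = np.array([
    [0.5, 1.2, -0.1, 0.8],    # 3 of 4 trees predict a benefit
    [-0.4, -1.0, 0.2, -0.3],  # 1 of 4 trees predicts a benefit
])
proba = phenogroup_proba(per_tree)  # [[0.25, 0.75], [0.75, 0.25]]
labels = proba.argmax(axis=1)       # [1, 0]: 1 = treated group, 0 = control
```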
def predict(self, features)
-
Extract phenogroups that maximize probability estimates.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
Returns
np.array
- A numpy array of the phenogroup labels.
def fit_predict(self, features, outcomes, interventions, horizon)
-
Fit and perform phenotyping on a given dataset.
Parameters
features
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes
:pd.DataFrame
- A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions
:np.array
- Boolean numpy array of treatment indicators. True means the individual was assigned a specific treatment.
horizon
:np.float
- The event horizon at which to compute the counterfactual Restricted Mean Survival Time (RMST) for regression.
Returns
np.array
- A numpy array of the phenogroup labels.