Module auton_survival.datasets
Utility functions to load standard datasets to train and evaluate the Deep Survival Machines models.
Functions
def increase_censoring(e, t, p, random_seed=0)
def load_support()
-
Helper function to load and preprocess the SUPPORT dataset. The SUPPORT dataset comes from the Vanderbilt University study to estimate survival for seriously ill hospitalized adults [1]. Refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc for the original data source.
References
[1]: Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.
def load_synthetic_cf_phenotyping()
def load_dataset(dataset='SUPPORT', **kwargs)
-
Helper function to load datasets to test Survival Analysis models. Currently implemented datasets include:
SUPPORT: This dataset comes from the Vanderbilt University study to estimate survival for seriously ill hospitalized adults [1]. (Refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc for the original data source.)
PBC: The Primary Biliary Cirrhosis dataset [2] is a well-known dataset for evaluating survival analysis models with time-dependent covariates.
FRAMINGHAM: This dataset is a subset of 4,434 participants of the well-known, ongoing Framingham Heart Study [3], which studies the epidemiology of hypertensive and arteriosclerotic cardiovascular disease. It is a popular dataset for longitudinal survival analysis with time-dependent covariates.
SYNTHETIC: This is a non-linear censored dataset for counterfactual time-to-event phenotyping. Introduced in [4], the dataset is generated such that the treatment effect is heterogeneous conditioned on the covariates.
References
[1]: Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.
[2]: Fleming, Thomas R., and David P. Harrington. Counting processes and survival analysis. Vol. 169. John Wiley & Sons, 2011.
[3]: Dawber, Thomas R., Gilcin F. Meadors, and Felix E. Moore Jr. "Epidemiological approaches to heart disease: the Framingham Study." American Journal of Public Health and the Nations Health 41.3 (1951).
[4]: Nagpal, C., Goswami, M., Dufendach, K., and Dubrawski, A. "Counterfactual phenotyping for censored Time-to-Events" (2022).
Parameters
dataset
:str
- The choice of dataset to load. Currently implemented are 'SUPPORT', 'PBC', 'FRAMINGHAM' and 'SYNTHETIC'.
**kwargs
:dict
- Dataset specific keyword arguments.
Returns
tuple
:(np.ndarray, np.ndarray, np.ndarray)
- A tuple of the form (x, t, e), where x are the input covariates, t the event times and e the censoring indicators.
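The returned (x, t, e) tuple can be inspected before model fitting. The following is a minimal sketch, not part of the module: the helper summarize_survival_data is hypothetical, the toy lists stand in for the arrays that load_dataset returns, and it assumes that e == 1 marks an observed event and e == 0 a censored record.

```python
def summarize_survival_data(x, t, e):
    """Summarize an (x, t, e) tuple of covariates, times and indicators.

    Hypothetical helper for illustration only. It uses built-ins, so it
    accepts NumPy arrays as well as plain Python sequences.
    """
    n = len(t)
    if len(x) != n or len(e) != n:
        raise ValueError("x, t and e must have the same number of rows")
    # Assumption: an indicator of 1 is an observed event, 0 is censored.
    n_events = sum(1 for flag in e if flag == 1)
    return {
        "n_samples": n,
        "n_events": n_events,
        "censoring_rate": 1.0 - n_events / n,
        "max_time": max(t),
    }


# Toy stand-in for the tuple returned by the loader. With the package
# installed, the real call would be:
#   from auton_survival.datasets import load_dataset
#   x, t, e = load_dataset('SUPPORT')
x = [[63.0, 1.0], [71.0, 0.0], [55.0, 1.0], [48.0, 0.0]]  # covariates
t = [12.0, 30.0, 7.0, 45.0]                               # event times
e = [1, 0, 1, 0]                                          # indicators

summary = summarize_survival_data(x, t, e)
print(summary)
```

Checking the number of rows and the censoring rate up front helps catch a mismatched or heavily censored dataset before training.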