Module `auton_survival.datasets`

Utility functions to load standard datasets to train and evaluate the Deep Survival Machines models.

Functions

def increase_censoring(e, t, p, random_seed=0)

def load_support()

Helper function to load and preprocess the SUPPORT dataset. The SUPPORT dataset comes from the Vanderbilt University study to estimate survival for seriously ill hospitalized adults [1]. Please refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc for the original data source.

References

[1]: Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.


def load_synthetic_cf_phenotyping()


def load_dataset(dataset='SUPPORT', **kwargs)

Helper function to load datasets to test Survival Analysis models. Currently implemented datasets include:

SUPPORT: This dataset comes from the Vanderbilt University study to estimate survival for seriously ill hospitalized adults [1]. (Refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc for the original data source.)

PBC: The primary biliary cirrhosis dataset [2] is a well-known dataset for evaluating survival analysis models with time-dependent covariates.

FRAMINGHAM: This dataset is a subset of 4,434 participants of the well-known, ongoing Framingham Heart Study [3], which studies the epidemiology of hypertensive and arteriosclerotic cardiovascular disease. It is a popular dataset for longitudinal survival analysis with time-dependent covariates.

SYNTHETIC: This is a non-linear censored dataset for counterfactual time-to-event phenotyping. Introduced in [4], the dataset is generated such that the treatment effect is heterogeneous conditioned on the covariates.

References

[1]: Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.

[2]: Fleming, Thomas R., and David P. Harrington. Counting processes and survival analysis. Vol. 169. John Wiley & Sons, 2011.

[3]: Dawber, Thomas R., Gilcin F. Meadors, and Felix E. Moore Jr. "Epidemiological approaches to heart disease: the Framingham Study." American Journal of Public Health and the Nations Health 41.3 (1951).

[4]: Nagpal, C., Goswami M., Dufendach K., and Artur Dubrawski. "Counterfactual phenotyping for censored Time-to-Events" (2022).

Parameters

dataset : str: The choice of dataset to load. Currently implemented are 'SUPPORT', 'PBC', 'FRAMINGHAM' and 'SYNTHETIC'.
**kwargs : dict: Dataset specific keyword arguments.

Returns

tuple : (np.ndarray, np.ndarray, np.ndarray): A tuple of the form $(x, t, e)$, where $x$ are the input covariates, $t$ the event times and $e$ the censoring indicators.
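
As a rough illustration of working with a tuple of the documented $(x, t, e)$ form, the sketch below computes a few summary statistics. The arrays are made-up stand-ins, not real data from any of the datasets above, and the convention that $e = 1$ marks an observed event and $e = 0$ a censored one is an assumption here, not something this page states.

```python
import numpy as np

# Made-up stand-in for the documented (x, t, e) tuple; NOT real data.
x = np.array([[63.0, 1.0],
              [71.0, 0.0],
              [55.0, 1.0],
              [80.0, 0.0]])          # covariates, one row per subject
t = np.array([5.0, 12.0, 7.0, 3.0])  # event or censoring times
e = np.array([1, 0, 1, 0])           # assumed: 1 = event observed, 0 = censored

# The three arrays are expected to be aligned on the first axis.
assert x.shape[0] == t.shape[0] == e.shape[0]

observed = e == 1                               # boolean mask of observed events
censoring_rate = 1.0 - observed.mean()          # fraction of censored subjects
median_observed_time = np.median(t[observed])   # median time among observed events

print(f"subjects: {x.shape[0]}, covariates: {x.shape[1]}")
print(f"censoring rate: {censoring_rate:.2f}")
print(f"median time among observed events: {median_observed_time:.1f}")
```

A check like this can be useful before training, since a very high censoring rate changes how evaluation metrics should be interpreted.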
