Data Sparsity Function

The data sparsity function computes the ratio of invalid values in a window to the total number of values in that window. See n-valid for details on how valid and invalid values are determined. The function can be coupled with the SlidingWindow abstraction to compute the data sparsity feature of a time series. It is defined as:

\[ \text{sparsity} = \frac{N_{invalid}}{N_{total}} \]

where \(N_{invalid}\) is the number of invalid values in a window \(W\) and \(N_{total}\) is the total number of values in \(W\).
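
As a worked instance of the formula, consider a hypothetical window of five values, two of which are NaN, and assume that a value counts as invalid exactly when it is NaN:

\[ W = [1.0,\ \text{NaN},\ 3.0,\ \text{NaN},\ 5.0], \quad N_{invalid} = 2, \quad N_{total} = 5, \quad \text{sparsity} = \frac{2}{5} = 0.4 \]

A window with no invalid values has sparsity 0, and a window containing only invalid values has sparsity 1.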

Compute the data sparsity of the array x.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ndarray | The array to compute the data sparsity of. | required |
| where | Callable[[Union[int, float, int_, float_]], Union[bool, bool_]] | A function that takes a value and returns True if the value is valid and False otherwise. The default is lambda x: not np.isnan(x), i.e. a measurement is valid if it is not a NaN value. | lambda x: not numpy.isnan(x) |

Returns:

| Type | Description |
| --- | --- |
| Union[float, float_] | The data sparsity of measurements in x. |

Raises:

| Type | Description |
| --- | --- |
| DivideByZeroError | If x is empty. |
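
The behaviour described above can be illustrated with a minimal pure-Python sketch. It is not AutonFeat's implementation: the name `sparsity_sketch` is hypothetical, a plain list stands in for the ndarray so that the sketch has no dependencies, and Python's built-in `ZeroDivisionError` stands in for the error raised on empty input.

```python
import math
from typing import Callable, Sequence


def sparsity_sketch(
    x: Sequence[float],
    where: Callable[[float], bool] = lambda v: not math.isnan(v),
) -> float:
    """Return the fraction of values in x that `where` marks as invalid.

    Hypothetical helper for illustration; not part of AutonFeat.
    """
    n_total = len(x)
    # Count the values for which the validity predicate returns False.
    n_invalid = sum(1 for v in x if not where(v))
    # Raises ZeroDivisionError when x is empty (n_total == 0).
    return n_invalid / n_total


window = [1.0, float("nan"), 3.0, float("nan"), 5.0]

# Default predicate: only NaN counts as invalid, so 2 of 5 values are invalid.
default_sparsity = sparsity_sketch(window)

# Custom predicate: a value is valid only if it is not NaN and is at most 3.
# Both NaNs and 5.0 fail it, so 3 of 5 values are invalid.
custom_sparsity = sparsity_sketch(
    window, where=lambda v: not math.isnan(v) and v <= 3.0
)

print(default_sparsity, custom_sparsity)
```

Passing a stricter predicate through `where` raises the sparsity, because more values are classified as invalid while the window length stays the same.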

Examples

import numpy as np
import autonfeat as aft
import autonfeat.functional as F

# Random data
n_samples = 100
x = np.random.rand(n_samples)

# Create sliding window
ws = 10
ss = 10
window = aft.SlidingWindow(window_size=ws, step_size=ss)

# Get featurizer
featurizer = window.use(F.data_sparsity_tf)

# Get features
features = featurizer(x)

# Print features
print(features)
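
Note that np.random.rand produces no NaN values, so under the default validity rule every window in the example above should have a sparsity of 0. The following dependency-free sketch shows the per-window computation on data that does contain missing values. It mimics non-overlapping windows by hand rather than calling AutonFeat, and the variable names and the choice to skip a trailing partial window are assumptions made for illustration.

```python
import math

# Ten samples with three missing readings (NaN) at indices 1, 2 and 7.
nan = float("nan")
x = [0.5, nan, nan, 0.1, 0.9, 0.3, 0.7, nan, 0.2, 0.4]

window_size = 5
step_size = 5  # equal to window_size, so windows do not overlap

features = []
# Only full windows are used; how AutonFeat treats a trailing
# partial window is not assumed here.
for start in range(0, len(x) - window_size + 1, step_size):
    window = x[start:start + window_size]
    # A value is treated as invalid when it is NaN.
    n_invalid = sum(1 for v in window if math.isnan(v))
    features.append(n_invalid / len(window))

print(features)
```

The first window contains two NaN values out of five and the second contains one, so the two features differ even though both windows have the same length.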