Data Validation
- unite_toolbox.utils.data_validation.validate_array(arr: ArrayLike) → numpy.ndarray
Validate array.
Validates the dimensions of the input and converts it into a type suitable for use in the UNITE toolbox.
Parameters
- arr : array_like
Array-like object that can be converted into a 2D NumPy array
Returns
- arr : numpy.ndarray
Array of shape (n_samples, d_features)
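As an illustration of the kind of check described above, here is a minimal sketch. It is not the toolbox's implementation: the name validate_array_sketch is hypothetical, and promoting 1D input to a single-feature column is an assumption the documentation does not confirm.

```python
import numpy as np


def validate_array_sketch(arr):
    """Hypothetical stand-in for validate_array, for illustration only."""
    out = np.asarray(arr)
    if out.ndim == 1:
        # Assumption: a 1D input holds n_samples of a single feature.
        out = out.reshape(-1, 1)
    if out.ndim != 2:
        raise ValueError(f"expected a 1D or 2D input, got {out.ndim} dimensions")
    return out


column = validate_array_sketch([1.0, 2.0, 3.0])
table = validate_array_sketch([[1.0, 2.0], [3.0, 4.0]])
print(column.shape, table.shape)
```

Either way, downstream code can rely on a consistent (n_samples, d_features) layout.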
- unite_toolbox.utils.data_validation.find_repeats(data: numpy.ndarray) → numpy.ndarray
Find repeats.
Returns a boolean mask over the rows of data, where True marks a repeated row.
Parameters
- data : array_like
2D array-like of shape (n_samples, d_features)
Returns
- mask : numpy.ndarray
Boolean array of shape (n_samples,)
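One way to build such a mask with plain NumPy is sketched below. The name find_repeats_sketch is hypothetical, and the sketch assumes that every occurrence of a duplicated row is flagged, including the first; the documentation does not state which convention the toolbox uses.

```python
import numpy as np


def find_repeats_sketch(data):
    """Hypothetical stand-in for find_repeats, for illustration only."""
    data = np.asarray(data)
    # inverse maps each row to its unique row; counts says how often that row occurs
    _, inverse, counts = np.unique(
        data, axis=0, return_inverse=True, return_counts=True
    )
    # ravel() keeps the indexing 1D across NumPy versions
    return counts[inverse.ravel()] > 1


data = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])
mask = find_repeats_sketch(data)
print(mask)  # [ True False  True False]
```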
- unite_toolbox.utils.data_validation.add_noise_to_data(data: numpy.ndarray) → numpy.ndarray
Add noise to repeated rows in data.
Adds Gaussian noise only to the repeated rows of a 2D array. The scale of the noise is one order of magnitude below the order of magnitude of the standard deviation of each column in the data. This was empirically determined to be adequate for distance-based measures.
Parameters
- data : array_like
2D array-like of shape (n_samples, d_features)
Returns
- noisy_data : numpy.ndarray
2D array of shape (n_samples, d_features)
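The noise rule above can be read as scale = 10^(floor(log10(std)) - 1) per column. The sketch below follows that reading; it is an interpretation, not the toolbox's code. The name add_noise_sketch and the seed parameter are hypothetical, and the sketch assumes every column has a non-zero standard deviation.

```python
import numpy as np


def add_noise_sketch(data, seed=None):
    """Hypothetical stand-in for add_noise_to_data, for illustration only."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)

    # Flag every row that occurs more than once (assumed convention).
    _, inverse, counts = np.unique(
        data, axis=0, return_inverse=True, return_counts=True
    )
    mask = counts[inverse.ravel()] > 1

    # One order of magnitude below the order of magnitude of each column's std.
    # Assumes std > 0 for every column; a constant column would need a guard.
    std = data.std(axis=0)
    scale = 10.0 ** (np.floor(np.log10(std)) - 1)

    noisy = data.copy()
    noisy[mask] += rng.normal(0.0, scale, size=(int(mask.sum()), data.shape[1]))
    return noisy


data = np.array([[1.0, 10.0], [1.0, 10.0], [2.0, 30.0], [3.0, 50.0]])
noisy = add_noise_sketch(data, seed=0)
print(noisy)
```

Rows that were already unique pass through unchanged, so only the tied samples are perturbed.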
- unite_toolbox.utils.data_validation.validate_data_kld(a: numpy.ndarray, b: numpy.ndarray) → tuple[numpy.ndarray]
Validate data for kNN-based KLD.
Eliminates repeated values from a and from the joint array a-b so that a distance-based calculation of KLD, or any other method that requires only unique values in two arrays, can be performed.
The code finds the repeats in a, deletes them, and adds back one copy of each value that was repeated, so that little data is lost. A similar procedure is applied to b, which is first concatenated with a so that no value in a is repeated in b.
Parameters
- a : numpy.ndarray
2D array of shape (n_samples, d_features)
- b : numpy.ndarray
2D array of shape (m_samples, d_features)
Returns
- p : numpy.ndarray
2D array of shape (<=n_samples, d_features)
- q : numpy.ndarray
2D array of shape (<=m_samples, d_features)
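The procedure described above can be sketched as follows. This is one reading of the description, not the toolbox's implementation: the names dedupe_rows and validate_data_kld_sketch are hypothetical, rows of b that also occur in a are assumed to be dropped from q, and the row order of the real function's output may differ.

```python
import numpy as np


def dedupe_rows(x):
    """Keep the first occurrence of each row, preserving the original order."""
    _, first_idx = np.unique(x, axis=0, return_index=True)
    return x[np.sort(first_idx)]


def validate_data_kld_sketch(a, b):
    """Hypothetical stand-in for validate_data_kld, for illustration only."""
    a = np.asarray(a)
    b = np.asarray(b)

    # Step 1: every distinct row of a is kept exactly once.
    p = dedupe_rows(a)

    # Step 2: stack p on top of b and keep first occurrences only. A row of b
    # that already appears in p has its first occurrence inside p, so it is
    # excluded from q (assumed behaviour).
    joint = np.vstack((p, b))
    _, first_idx = np.unique(joint, axis=0, return_index=True)
    keep = np.sort(first_idx)
    q = joint[keep[keep >= len(p)]]
    return p, q


a = np.array([[1, 2], [1, 2], [3, 4]])
b = np.array([[3, 4], [5, 6], [5, 6], [7, 8]])
p, q = validate_data_kld_sketch(a, b)
print(p.shape, q.shape)  # (2, 2) (2, 2)
```

In this example p keeps [1, 2] and [3, 4], while q keeps [5, 6] and [7, 8], so no row appears twice across the two outputs.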