Different authors have proposed equations to calculate entropy, KL divergence and
mutual information using k-nearest neighbors (k-NN). In principle, one could
estimate the density \(p(x)\) at \(x_i\) using calc_knn_density
and then form a resubstitution estimate of the desired quantity, but this is not recommended
for k-NN-based methods. See more in the tutorial: Non-parametric Density Estimation.
See calc_knn_entropy, calc_knn_kld and calc_knn_mutual_information for details
specific to each estimator, as well as their sources.
The choice of distance deserves a brief note. The distance between two points \(x\) and
\(y\) is given by a \(p\)-norm with \(p \geq 1\) as follows:
\[ d_p(x, y) = \lVert x - y \rVert_p = \left( \sum_{j=1}^{d} |x_j - y_j|^p \right)^{1/p} \]
with \(p=2\) suggested for estimating density, entropy and KL divergence. For mutual
information, Kraskov et al. propose the use of \(p=\infty\), the infinity (maximum) norm.
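As a small illustration of the \(p\)-norm distance described above, the following sketch computes it with NumPy. The helper name `p_norm_distance` is hypothetical and is not part of the library documented here; it only shows how \(p=2\) and \(p=\infty\) differ on the same pair of points.

```python
import numpy as np


def p_norm_distance(x, y, p=2.0):
    """Distance between points x and y under the p-norm (p >= 1 or np.inf).

    Hypothetical helper for illustration only.
    """
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        # Infinity (maximum) norm: the largest coordinate-wise difference.
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))


x, y = [0.0, 0.0], [3.0, 4.0]
d_euclid = p_norm_distance(x, y, p=2)      # Euclidean distance
d_max = p_norm_distance(x, y, p=np.inf)    # maximum-norm distance
d_manhattan = p_norm_distance(x, y, p=1)   # sum of absolute differences
```

The maximum norm never exceeds the Euclidean norm, which in turn never exceeds the 1-norm, so the choice of \(p\) changes which points count as "nearest".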
Calculate the probability density of x using k-NN.
Calculates the probability density of every point in x based on
the proximity to the data using k-nearest neighbors and the equation
proposed by Wang et al. (2009). Note: x and data must have the
same number of d_features.
10.1109/TIT.2009.2016060
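To make the idea concrete, here is a minimal sketch of a k-NN density estimate. It assumes the classic form \(\hat{p}(x) = k / (N \, c_d \, r_k(x)^d)\), where \(r_k(x)\) is the Euclidean distance from \(x\) to its k-th nearest neighbor in the data, \(N\) is the number of data points and \(c_d\) is the volume of the unit ball in \(d\) dimensions. The function name `knn_density_sketch` and its signature are hypothetical. This is not the implementation of calc_knn_density, whose exact normalization may differ.

```python
import math

import numpy as np


def knn_density_sketch(x, data, k=5):
    """Classic k-NN density estimate (illustrative sketch, not the library code).

    x    : array of shape (n_queries, d_features)
    data : array of shape (n_samples, d_features)
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    if x.shape[1] != d:
        raise ValueError("x and data must have the same number of features")
    # Pairwise Euclidean distances, shape (n_queries, n_samples).
    dists = np.linalg.norm(x[:, None, :] - data[None, :, :], axis=2)
    # Distance from each query point to its k-th nearest data point.
    r_k = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # Volume of the unit ball in d dimensions (Euclidean norm).
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return k / (n * unit_ball * r_k ** d)


rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 1))
density_at_zero = knn_density_sketch(np.array([[0.0]]), data, k=50)
```

The brute-force distance matrix keeps the sketch short; for large data sets a spatial index such as a k-d tree would be the usual choice.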
Calculate the (joint) entropy of the input n-d array.
Calculates the (joint) entropy of the input data [in nats] using
the Kozachenko and Leonenko (KL) (1987) estimator, an approach
based on k-nearest neighbors (k-NN). By default, the Euclidean norm
distance (p-norm = 2) is used to calculate distances.
http://mi.mathnet.ru/ppi797
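The sketch below shows one common way to write the Kozachenko-Leonenko estimator, assuming the form \(\hat{H} = \psi(N) - \psi(k) + \log c_d + \frac{d}{N} \sum_i \log r_k(i)\), where \(r_k(i)\) is the Euclidean distance from sample \(i\) to its k-th nearest neighbor (excluding itself) and \(c_d\) is the volume of the unit ball. The names `kl_entropy_sketch` and `digamma` are hypothetical and this is not the code of calc_knn_entropy. The small `digamma` helper only avoids an extra dependency; `scipy.special.digamma` would normally be used instead.

```python
import math

import numpy as np


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    x = float(x)
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f / 252))


def kl_entropy_sketch(data, k=3):
    """Kozachenko-Leonenko entropy estimate in nats (illustrative sketch).

    data : array of shape (n_samples, d_features), assumed free of duplicates.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    r_k = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # Log-volume of the unit ball in d dimensions (Euclidean norm).
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return digamma(n) - digamma(k) + log_unit_ball + d * float(np.mean(np.log(r_k)))


rng = np.random.default_rng(0)
sample = rng.normal(size=(1000, 1))
h_est = kl_entropy_sketch(sample, k=3)
h_true = 0.5 * math.log(2 * math.pi * math.e)  # entropy of a standard normal
```

Comparing `h_est` with `h_true` gives a quick sanity check: for a standard normal sample of this size the two should be close, though not identical.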
Calculate the Kullback-Leibler divergence between two arrays of data.
Calculates the Kullback-Leibler divergence (relative entropy) between
two data sets (p and q) [in nats] using the estimation method proposed
by Wang et al. (2009). Both p and q are n-d arrays where d >= 1 which
means they can have multiple features. Typically p represents the true
distribution while q is the approximate distribution. A different
number of total samples in p and q is acceptable.
10.1109/TIT.2009.2016060
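As an illustration, the following sketch assumes the k-NN divergence estimator in the form \(\hat{D}(p \| q) = \frac{d}{n} \sum_i \log \frac{\nu_k(i)}{\rho_k(i)} + \log \frac{m}{n-1}\), where \(\rho_k(i)\) is the distance from the i-th sample of p to its k-th nearest neighbor among the other samples of p, \(\nu_k(i)\) is its distance to the k-th nearest neighbor in q, and \(n\) and \(m\) are the sample counts of p and q. The names `knn_kld_sketch` and `_kth_nn_distance` are hypothetical and this is not the code of calc_knn_kld.

```python
import numpy as np


def _kth_nn_distance(queries, refs, k, exclude_self=False):
    """Euclidean distance from each query to its k-th nearest reference point."""
    dists = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=2)
    if exclude_self:
        # Only valid when queries and refs are the same array (square matrix).
        np.fill_diagonal(dists, np.inf)
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def knn_kld_sketch(p, q, k=5):
    """k-NN estimate of KL(p || q) in nats (illustrative sketch).

    p : array of shape (n, d_features), samples from the "true" distribution
    q : array of shape (m, d_features), samples from the approximation
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n, d = p.shape
    m = q.shape[0]
    rho = _kth_nn_distance(p, p, k, exclude_self=True)  # within p
    nu = _kth_nn_distance(p, q, k)                      # from p into q
    return d * float(np.mean(np.log(nu / rho))) + float(np.log(m / (n - 1)))


rng = np.random.default_rng(0)
p_samples = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))
q_samples = rng.normal(loc=1.0, scale=1.0, size=(1500, 1))  # different sample count
kld_est = knn_kld_sketch(p_samples, q_samples, k=5)
kld_true = 0.5  # KL between N(0, 1) and N(1, 1)
```

Note that the two sample sets have different sizes; the \(\log \frac{m}{n-1}\) term accounts for that difference.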
Calculate mutual information between x and y using k-NN.
Estimates the mutual information between x and y [in nats] using
the estimation method proposed by Kraskov et al. (2004). Both x and y
can have d >= 1 which means they can have multiple features. By default,
the maximum norm (p-norm = ∞) and fifteen neighbors (k = 15) are used.
10.1103/PhysRevE.69.066138
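The sketch below follows the first estimator described by Kraskov et al. (2004), assuming the form \(\hat{I}(X;Y) = \psi(k) + \psi(N) - \langle \psi(n_x + 1) + \psi(n_y + 1) \rangle\). Here \(\epsilon_i\) is the maximum-norm distance from sample \(i\) to its k-th nearest neighbor in the joint space, and \(n_x\), \(n_y\) count the other samples strictly closer than \(\epsilon_i\) in each marginal space. The names `ksg_mi_sketch`, `_max_norm_dists` and `digamma` are hypothetical and this is not the code of calc_knn_mutual_information. The `digamma` helper stands in for `scipy.special.digamma` to keep the example dependency-free.

```python
import math

import numpy as np


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    x = float(x)
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f / 252))


def _max_norm_dists(a):
    """Pairwise maximum-norm distances between the rows of a."""
    return np.abs(a[:, None, :] - a[None, :, :]).max(axis=2)


def ksg_mi_sketch(x, y, k=15):
    """k-NN mutual information estimate in nats (illustrative sketch).

    x : array of shape (n_samples, dx_features)
    y : array of shape (n_samples, dy_features)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    dx = _max_norm_dists(x)
    dy = _max_norm_dists(y)
    dz = np.maximum(dx, dy)       # maximum norm in the joint (x, y) space
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbor
    eps = np.partition(dz, k - 1, axis=1)[:, k - 1]
    # Count samples strictly within eps in each marginal; subtract the point itself.
    n_x = (dx < eps[:, None]).sum(axis=1) - 1
    n_y = (dy < eps[:, None]).sum(axis=1) - 1
    avg = np.mean([digamma(a + 1) + digamma(b + 1) for a, b in zip(n_x, n_y)])
    return digamma(k) + digamma(n) - float(avg)


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = 0.8 * x + 0.6 * rng.normal(size=(1000, 1))  # correlation 0.8 with x
mi_est = ksg_mi_sketch(x, y, k=15)
mi_true = -0.5 * math.log(1 - 0.8 ** 2)         # MI of a bivariate normal
```

Because distances are computed per marginal and then combined with a maximum, the same code handles x and y with more than one feature each.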